22 Paired t-test
A paired t-test is used to estimate whether the means of two related measurements are significantly different from one another.
Examples of paired study designs are:
- measurements collected before and after an intervention in an experimental study
- twins, husbands and wives, brothers and sisters, pairs of eyes
- matched cases and controls
- a cross-over trial in which each patient has two measurements on the variable, one while taking active treatment and one while taking placebo
When we have finished this Chapter, we should be able to:
22.1 Research question
The dataset weight contains the birth and discharge weight of 25 newborns. We might ask if the mean difference of the weight in birth and in discharge equals to zero or not. If the differences between the pairs of measurements are normally distributed, a paired t-test is the most appropriate statistical test.
22.2 Packages we need
We need to load the following packages:
22.3 Preparing the data
We import the data weight in R:
library(readxl)
weight <- read_excel(here("data", "weight.xlsx"))
We calculate the differences using the mutate()
function:
weight <- weight |>
mutate(dif_weight = discharge_weight - birth_weight)
We inspect the data:
glimpse(weight)
Rows: 25
Columns: 3
$ birth_weight <dbl> 3250, 2680, 2960, 3420, 3210, 2740, 3250, 3170, 2970,…
$ discharge_weight <dbl> 3220, 2640, 2940, 3350, 3140, 2730, 3220, 3150, 2890,…
$ dif_weight <dbl> -30, -40, -20, -70, -70, -10, -30, -20, -80, -103, -8…
22.4 Assumptions
22.5 Explore the characteristics of distribution of differences
The distribution of the differences can be explored with appropriate plots and summary statistics.
Graph
We can explore the distribution of differences visually for symmetry with a density plot (a smoothed version of the histogram):
weight |>
ggplot(aes(x = dif_weight)) +
geom_density(fill = "#76B7B2", color="black", alpha = 0.2) +
geom_vline(aes(xintercept=mean(dif_weight)),
color="blue", linetype="dashed", size=1.4) +
geom_vline(aes(xintercept=median(dif_weight)),
color="red", linetype="dashed", size=1.2) +
labs(x = "Weight difference") +
theme_minimal() +
theme(plot.title.position = "plot")
The above figure shows that the data are following an approximately symmetrical distribution. Note that the arithmetic mean (blue vertical dashed line) is very close to the median (red vertical dashed line) of the data.
Summary statistics
Summary statistics can also be calculated for the variables.
We can utilize the across
function to obtain the results across the three variables simultaneously:
summary_weight <- weight |>
dplyr::summarise(across(
.cols = c(dif_weight, birth_weight, discharge_weight),
.fns = list(
n = ~n(),
na = ~sum(is.na(.)),
min = ~min(., na.rm = TRUE),
q1 = ~quantile(., 0.25, na.rm = TRUE),
median = ~quantile(., 0.5, na.rm = TRUE),
q3 = ~quantile(., 0.75, na.rm = TRUE),
max = ~max(., na.rm = TRUE),
mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE),
skewness = ~EnvStats::skewness(., na.rm = TRUE),
kurtosis= ~EnvStats::kurtosis(., na.rm = TRUE)
),
.names = "{col}_{fn}")
)
# present the results
summary_weight <- summary_weight |>
mutate(across(everything(), \(x) round(x, 2))) |> # round to 3 decimal places
pivot_longer(1:33, names_to = "Stats", values_to = "Values") # long format
summary_weight|>
head(n = 11L)
# A tibble: 11 × 2
Stats Values
<chr> <dbl>
1 dif_weight_n 25
2 dif_weight_na 0
3 dif_weight_min -103
4 dif_weight_q1 -70
5 dif_weight_median -40
6 dif_weight_q3 -20
7 dif_weight_max 30
8 dif_weight_mean -39.6
9 dif_weight_sd 32.3
10 dif_weight_skewness 0.16
11 dif_weight_kurtosis -0.08
weight |>
dlookr::describe(dif_weight, birth_weight, discharge_weight) %>%
select(described_variables, n, na, mean, sd, p25, p50, p75, skewness, kurtosis) |>
ungroup() |>
print(width = 100)
Registered S3 methods overwritten by 'dlookr':
method from
plot.transform scales
print.transform scales
# A tibble: 3 × 10
described_variables n na mean sd p25 p50 p75 skewness
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 dif_weight 25 0 -39.6 32.3 -70 -40 -20 0.157
2 birth_weight 25 0 3076. 248. 2960 3150 3210 -0.291
3 discharge_weight 25 0 3036. 248. 2880 3100 3220 -0.219
kurtosis
<dbl>
1 -0.0752
2 -0.935
3 -1.02
As it was previously mentioned, the mean of the differences (-39.64) is close to median (-40). Moreover, both the skewness and the kurtosis are approximately zero indicating a symmetric and mesokurtic distribution for the weight differences.
Normality test
Additionally, we can check the statistical test for normality of the differences.
weight |>
shapiro_test(dif_weight)
# A tibble: 1 × 3
variable statistic p
<chr> <dbl> <dbl>
1 dif_weight 0.974 0.742
The Shapiro-Wilk test suggests that the weight differences are normally distributed (p=0.74 > 0.05).
22.6 Run the paired t-test
We will perform a paired t-test to test the null hypothesis that the mean differences of weight equals to zero. Under this \(H_o\), the test statistic is:
\[ t = \frac{\bar d}{SE_{\bar d}} \]
where \(\bar d\) is the mean of the differences, \(SE_{\bar d} = s_d/ \sqrt n\) is the estimate of standard error and n is the number of pairs.
Under the null hypothesis, the t statistic follows the t-distribution with n-1 degrees of freedom (d.f.).
The 95% confidence interval of the mean of the differences is:
\[ 95\% \ CI = \bar d \pm t_{n-1;a/2} * SE_{\bar d} \]
In R:
First, we calculate the mean of the differences:
mean_dif <- mean(weight$dif_weight, na.rm = TRUE)
mean_dif
[1] -39.64
Second, we find the standard error of the mean of differences:
[1] 6.453134
Therefore, the t statistic is:
t <- mean_dif / st_error
t
[1] -6.142752
The corresponding probability for this value can be calculated as the cumulative probability P(T ≤ t) for n-1 degrees of freedom. Then, the p-value is 2*P(T ≤ t) because we are doing a two-tailed test.
P <- pt(t, df = 24)
p_value <- 2*P
p_value
[1] 2.401139e-06
The 95% confidence interval of the mean of differences is:
lower_CI <- mean_dif + qt(0.025, df = 24)*st_error
lower_CI
[1] -52.95861
upper_CI <- mean_dif - qt(0.025, df = 24)*st_error
upper_CI
[1] -26.32139
Note that the 95% confidence interval (-53.0 to -26.3) doesn’t include the null hypothesized value of zero.
Next, we present R functions to carry out all the tasks on our behalf.
Our data are in a wide format. However, we are going to use only the dif_weight
variable, inside the t.test()
:
t.test(weight$dif_weight)
One Sample t-test
data: weight$dif_weight
t = -6.1428, df = 24, p-value = 2.401e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-52.95861 -26.32139
sample estimates:
mean of x
-39.64
t.test(weight$discharge_weight, weight$birth_weight, paired = T)
Paired t-test
data: weight$discharge_weight and weight$birth_weight
t = -6.1428, df = 24, p-value = 2.401e-06
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-52.95861 -26.32139
sample estimates:
mean difference
-39.64
weight |>
t_test(dif_weight ~ 1, detailed = T)
# A tibble: 1 × 12
estimate .y. group1 group2 n statistic p df conf.low conf.high
* <dbl> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -39.6 dif_w… 1 null … 25 -6.14 2.40e-6 24 -53.0 -26.3
# ℹ 2 more variables: method <chr>, alternative <chr>
22.6.1 Present the results
Summary table
Show the code
tb1 <- weight |>
mutate(id = row_number()) |>
select(-dif_weight) |>
pivot_longer(!id, names_to = "group", values_to = "weights")
tb1 |>
tbl_summary(by = group, include = -id,
label = list(weights ~ "weights (g)"),
statistic = weights ~ "{mean} ({sd})") %>%
add_difference(test = weights ~ "paired.t.test", group = id,
estimate_fun = weights ~ function(x) style_sigfig(x, digits = 3)) |>
modify_table_body(~.x |> dplyr::relocate(stat_2, .before = stat_1)) |>
modify_table_body(~.x |> mutate(estimate = -estimate,
ci = "-53.0, -26.3"))
Characteristic | discharge_weight, N = 251 | birth_weight, N = 251 | Difference2 | 95% CI2,3 | p-value2 |
---|---|---|---|---|---|
weights (g) | 3,036 (248) | 3,076 (248) | -39.6 | -53.0, -26.3 | <0.001 |
1 Mean (SD) | |||||
2 Paired t-test | |||||
3 CI = Confidence Interval |
Report the results
Effect sizes were labelled following Cohen's (1988) recommendations.
The Paired t-test testing the difference between weight$discharge_weight and
weight$birth_weight (mean difference = -39.64) suggests that the effect is
negative, statistically significant, and large (difference = -39.64, 95% CI
[-52.96, -26.32], t(24) = -6.14, p < .001; Cohen's d = -1.23, 95% CI [-1.74,
-0.70])
We can use the above information to write up a final report:
There was a significant reduction of 39.6 g in weight after the discharge (mean change = -39.6 g, sd = 32.31, 95% CI: [-52.96, -26.32]; p-value <0.001).
1 sd for the change is useful information for meta-analytic techniques (see Cochrane Handbook for Systematic Reviews of Interventions)
We concluded that the result is statistical significant (reject \(H_o\)). However, is this reduction of clinical importance?