# 26 One-way Repeated Measures ANOVA

The one-way repeated measures analysis of variance (also known as a within-subjects ANOVA) is an extension of the paired t-test. It assesses whether the means of three or more related groups differ significantly, for example when the same participants are measured at three or more time points.

## 26.1 Research question and Hypothesis Testing

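Participants followed a special diet in which they used margarine for 12 weeks. Their blood total cholesterol (TCH; in mmol/L) was measured before the diet (week 0), after 6 weeks and after 12 weeks. The research question is whether mean TCH differs between these three time points.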
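The hypotheses for this design can be written out explicitly. The symbols \(\mu_{0}\), \(\mu_{6}\) and \(\mu_{12}\) are notation introduced here for illustration; they stand for the population mean TCH at week 0, week 6 and week 12.

\[
H_0:\ \mu_{0} = \mu_{6} = \mu_{12}
\qquad \text{versus} \qquad
H_1:\ \text{at least one of } \mu_{0},\ \mu_{6},\ \mu_{12} \text{ differs from the others}
\]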

## 26.2 Packages we need

We need to load the following packages:
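The package list itself is not shown, so the block below is a reconstruction inferred from the functions called later in the chapter, not the original list. In particular, `superb` as the source of `MauchlySphericityTest()` and `ggrain` as the source of `geom_rain()` are assumptions.

```r
# Assumed package set, inferred from the functions used in this chapter.
library(here)      # here(): build the path to the data file
library(readxl)    # read_excel()
library(dplyr)     # glimpse(), mutate(), group_by(), summarise(), select()
library(tidyr)     # pivot_longer()
library(ggplot2)   # ggplot(), labs(), theme(), facet_wrap()
library(ggrain)    # geom_rain() (assumed source)
library(ggsci)     # scale_fill_jco(), scale_color_jco()
library(ggpubr)    # ggqqplot()
library(rstatix)   # shapiro_test(), anova_test(), get_anova_table()
library(superb)    # MauchlySphericityTest() (assumed source)
# EnvStats and dlookr are called with the `::` prefix below,
# so they only need to be installed, not attached.
```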

## 26.3 Preparing the data

We import the data *cholesterol* in R:

```
library(here)    # provides here() for building the file path
library(readxl)  # provides read_excel()

dat_TCH <- read_excel(here("data", "cholesterol.xlsx"))
```

We inspect the data and the type of variables:

`glimpse(dat_TCH)`

```
Rows: 18
Columns: 3
$ week0 <dbl> 6.42, 6.76, 6.56, 4.80, 8.43, 7.49, 8.05, 5.05, 5.77, 3.91, 6.7…
$ week6 <dbl> 5.83, 6.20, 5.83, 4.27, 7.71, 7.12, 7.25, 4.63, 5.31, 3.70, 6.1…
$ week12 <dbl> 5.75, 6.13, 5.71, 4.15, 7.67, 7.05, 7.10, 4.67, 5.33, 3.66, 5.9…
```

## 26.4 Assumptions

**A. Explore the characteristics of distribution for each time point and check for normality**

The distributions can be explored visually with appropriate plots. Additionally, summary statistics and significance tests to check for normality (e.g., Shapiro-Wilk test) can be used.

**Graphs**

We can visualize the distribution of `cholesterol` for the three time points. First, we reshape the data from wide to long format:

```
dat_TCH_long <- dat_TCH |>
  mutate(id = row_number()) |>   # one id per participant (row)
  pivot_longer(cols = -id, names_to = "time", values_to = "cholesterol") |>
  # keep the time points in chronological order
  mutate(time = factor(time, levels = c("week0", "week6", "week12")))
```

```
ggplot(dat_TCH_long, aes(x = time, y = cholesterol, fill = time)) +
  geom_rain(likert = TRUE, seed = 123, point.args = list(alpha = 0.3)) +
  # theme_prism(base_size = 14, base_line_size = 0.4, palette = "office") +
  labs(title = "Grouped Raincloud Plot: Cholesterol by time point") +
  scale_fill_jco() +
  theme(legend.position = "none")
```

```
ggqqplot(dat_TCH_long, "cholesterol", color = "time", conf.int = FALSE) +
  # theme_prism(base_size = 14, base_line_size = 0.4, palette = "office") +
  scale_color_jco() +
  facet_wrap(~ time) +
  theme(legend.position = "none")
```

The figures above show that the distributions are roughly symmetric at each time point, so the assumption of normality is reasonable.

**Summary statistics**

The `cholesterol` summary statistics for each time point are:

```
cholesterol_summary <- dat_TCH_long |>
  group_by(time) |>
  dplyr::summarise(
    n = n(),
    na = sum(is.na(cholesterol)),
    min = min(cholesterol, na.rm = TRUE),
    q1 = quantile(cholesterol, 0.25, na.rm = TRUE),
    median = quantile(cholesterol, 0.5, na.rm = TRUE),
    q3 = quantile(cholesterol, 0.75, na.rm = TRUE),
    max = max(cholesterol, na.rm = TRUE),
    mean = mean(cholesterol, na.rm = TRUE),
    sd = sd(cholesterol, na.rm = TRUE),
    skewness = EnvStats::skewness(cholesterol, na.rm = TRUE),
    kurtosis = EnvStats::kurtosis(cholesterol, na.rm = TRUE)
  ) |>
  ungroup()

cholesterol_summary
```

```
# A tibble: 3 × 12
  time      n    na   min    q1 median    q3   max  mean    sd skewness kurtosis
  <fct> <int> <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 week0    18     0  3.91  5.74   6.5   7.22  8.43  6.41  1.19   -0.311   -0.248
2 week6    18     0  3.7   5.18   5.83  6.73  7.71  5.84  1.12   -0.171   -0.708
3 week…    18     0  3.66  5.21   5.73  6.69  7.67  5.78  1.10   -0.188   -0.561
```

```
dat_TCH_long |>
  group_by(time) |>
  dlookr::describe(cholesterol) |>
  select(described_variables, time, n, mean, sd, p25, p50, p75, skewness, kurtosis) |>
  ungroup() |>
  print(width = 100)
```

```
# A tibble: 3 × 10
  described_variables time       n  mean    sd   p25   p50   p75 skewness kurtosis
  <chr>               <fct>  <int> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 cholesterol         week0     18  6.41  1.19  5.74  6.5   7.22   -0.311   -0.248
2 cholesterol         week6     18  5.84  1.12  5.18  5.83  6.73   -0.171   -0.708
3 cholesterol         week12    18  5.78  1.10  5.21  5.73  6.69   -0.188   -0.561
```

The means are close to the medians, and the standard deviations are similar across time points. Moreover, both skewness and (excess) kurtosis fall within the acceptable range of [-1, 1], indicating approximately normal distributions at all time points.

**Normality test**

```
dat_TCH_long |>
  group_by(time) |>
  shapiro_test(cholesterol) |>
  ungroup()
```

```
# A tibble: 3 × 4
  time   variable    statistic     p
  <fct>  <chr>           <dbl> <dbl>
1 week0  cholesterol     0.982 0.967
2 week6  cholesterol     0.977 0.912
3 week12 cholesterol     0.977 0.918
```

The Shapiro-Wilk tests give no evidence against normality of `cholesterol` at any time point (p > 0.05 in all three cases).

In our example, the data at each time point are approximately normally distributed, so a repeated measures ANOVA can be performed.

**B. Sphericity test for equality of variances of the differences**

In addition, the assumption of sphericity must be met for the results of the repeated measures ANOVA to be interpreted accurately. Sphericity means that the variances of the differences between all pairs of time points are equal. This assumption is usually checked with **Mauchly’s sphericity test**, whose null hypothesis states that these variances are equal.

`MauchlySphericityTest(dat_TCH)`

`[1] 0.0004439621`

The value returned is the p-value of the test. Here the assumption of sphericity has not been met (p < 0.001), so we have to correct the degrees of freedom in the repeated measures ANOVA.
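To make the sphericity assumption concrete, the sketch below computes the variance of the differences for every pair of time points and then runs a Mauchly test. It uses an invented six-subject toy data set, not the cholesterol data, and base R's `stats::mauchly.test()` on a multivariate linear model, which is not the function used in this chapter, so treat it as an illustration only.

```r
# Invented toy data (NOT the cholesterol data): 6 subjects, 3 time points.
toy <- data.frame(
  week0  = c(6.1, 5.4, 7.2, 4.9, 6.8, 5.9),
  week6  = c(5.6, 5.0, 6.5, 4.6, 6.1, 5.5),
  week12 = c(5.5, 4.8, 6.4, 4.5, 6.2, 5.3)
)

# Variance of the differences for each pair of time points.
# Under sphericity these variances are equal in the population.
pairs <- combn(names(toy), 2)
diff_var <- apply(pairs, 2, function(p) var(toy[[p[1]]] - toy[[p[2]]]))
names(diff_var) <- apply(pairs, 2, paste, collapse = " - ")
diff_var

# Base R Mauchly test: fit an intercept-only multivariate model,
# then test sphericity of the within-subject contrasts.
fit <- lm(as.matrix(toy) ~ 1)
mauchly <- mauchly.test(fit, X = ~1)
mauchly$p.value
```

Clearly unequal variances in `diff_var` are the kind of pattern that a significant Mauchly test reflects.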

## 26.5 Run the one-way repeated ANOVA test

First, let’s run the repeated ANOVA test without any correction:

```
repeated_anova <- dat_TCH_long |>
  anova_test(dv = cholesterol, wid = id, within = time)

repeated_anova[1]
```

```
$ANOVA
  Effect DFn DFd       F        p p<.05   ges
1   time   2  34 212.321 6.17e-20     * 0.061
```
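The F statistic and its degrees of freedom can be reproduced by hand from sums of squares. The following is a minimal sketch on an invented six-subject toy data set (not the cholesterol data); the variable names are chosen here for illustration. It partitions the total variation into time, subject and error components, which is the usual decomposition for an uncorrected one-way repeated measures ANOVA.

```r
# Invented toy data (NOT the cholesterol data): rows = subjects, columns = time points.
toy <- data.frame(
  week0  = c(6.1, 5.4, 7.2, 4.9, 6.8, 5.9),
  week6  = c(5.6, 5.0, 6.5, 4.6, 6.1, 5.5),
  week12 = c(5.5, 4.8, 6.4, 4.5, 6.2, 5.3)
)
y <- as.matrix(toy)
n <- nrow(y)            # number of subjects
k <- ncol(y)            # number of time points
grand_mean <- mean(y)

# Partition the total sum of squares.
ss_total   <- sum((y - grand_mean)^2)
ss_time    <- n * sum((colMeans(y) - grand_mean)^2)   # between time points
ss_subject <- k * sum((rowMeans(y) - grand_mean)^2)   # between subjects
ss_error   <- ss_total - ss_time - ss_subject         # time-by-subject residual

# Degrees of freedom: DFn = k - 1, DFd = (k - 1)(n - 1).
df_time  <- k - 1
df_error <- (k - 1) * (n - 1)

f_value <- (ss_time / df_time) / (ss_error / df_error)
p_value <- pf(f_value, df_time, df_error, lower.tail = FALSE)

c(F = f_value, DFn = df_time, DFd = df_error, p = p_value)
```

With 18 participants and 3 time points, the same formulas give DFn = 3 − 1 = 2 and DFd = (3 − 1)(18 − 1) = 34, matching the table above.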

However, because sphericity is violated, we must adjust the results of the repeated measures ANOVA. The correction multiplies the degrees of freedom DFn = 2 and DFd = 34 by a quantity *ε* (epsilon), which measures how far the data depart from sphericity. For *k* repeated measurements, *ε* ranges from 1/(*k* − 1) to 1, where 1 indicates no departure from sphericity; with three time points the lower bound is 0.5. Two methods are commonly used to estimate *ε*:

- the correction of Greenhouse-Geisser (GGe).

- the correction of Huynh-Feldt (HFe).

In R, we can calculate both GGe and HFe, as follows:

`repeated_anova[3]`

```
$`Sphericity Corrections`
  Effect   GGe   DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF] p[HF]<.05
1   time 0.618 1.24, 21 3.89e-13         * 0.642 1.28, 21.82 1.44e-13         *
```
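To see where a Greenhouse-Geisser value comes from, the sketch below computes the standard Greenhouse-Geisser estimate from the covariance matrix of the repeated measures. The helper `gg_epsilon()` is a hypothetical function written for this illustration, and the toy data are invented; no claim is made that `anova_test()` computes GGe in exactly this way internally.

```r
# Hypothetical helper: Greenhouse-Geisser epsilon from a k x k covariance
# matrix S of the repeated measures.
gg_epsilon <- function(S) {
  k <- ncol(S)
  # Orthonormal within-subject contrasts: (k - 1) rows, each summing to zero.
  C <- t(contr.helmert(k))
  C <- C / sqrt(rowSums(C^2))
  V <- C %*% S %*% t(C)               # covariance of the contrasts
  # epsilon = tr(V)^2 / ((k - 1) * tr(V %*% V)); sum(V^2) equals tr(V %*% V)
  # because V is symmetric.
  sum(diag(V))^2 / ((k - 1) * sum(V^2))
}

# Invented toy data (NOT the cholesterol data).
toy <- data.frame(
  week0  = c(6.1, 5.4, 7.2, 4.9, 6.8, 5.9),
  week6  = c(5.6, 5.0, 6.5, 4.6, 6.1, 5.5),
  week12 = c(5.5, 4.8, 6.4, 4.5, 6.2, 5.3)
)
eps_toy <- gg_epsilon(cov(toy))
eps_toy

# Sanity check: a compound-symmetric covariance matrix satisfies sphericity
# exactly, so epsilon is 1 (up to floating-point error).
eps_cs <- gg_epsilon(diag(3) + 0.5)
eps_cs
```

For three time points the estimate always lies between 0.5 and 1, and it moves towards 0.5 as the departure from sphericity grows.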

The general recommendation is to use the Greenhouse-Geisser correction when GGe is less than 0.75; otherwise, we should use the Huynh-Feldt correction HFe.

As the GGe value is less than 0.75, we use the Greenhouse-Geisser adjustment of 0.618. The corrected degrees of freedom are:

\(DF[GG]_n = 2 \times 0.618 = 1.236 \approx 1.24\)

and

\(DF[GG]_d = 34 \times 0.618 = 21.012 \approx 21\).

The new p-value (p[GG]) is reported next to the corrected degrees of freedom (DF[GG]). In this example, p < 0.001, so there is evidence that mean cholesterol differs between at least two time points.
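The corrected p-value can be checked with the F distribution in base R. This sketch plugs in the rounded values printed above (F = 212.321, GGe = 0.618), so the result should be close to the reported p[GG] but may differ slightly because of rounding.

```r
# Values taken from the ANOVA output above (rounded as printed).
f_value <- 212.321
df_n    <- 2
df_d    <- 34
gge     <- 0.618   # Greenhouse-Geisser epsilon

# Corrected degrees of freedom: multiply both by epsilon.
df_n_gg <- gge * df_n
df_d_gg <- gge * df_d

# Upper-tail probability of the F distribution, before and after correction.
p_uncorrected <- pf(f_value, df_n, df_d, lower.tail = FALSE)
p_gg          <- pf(f_value, df_n_gg, df_d_gg, lower.tail = FALSE)

c(DFn_GG = df_n_gg, DFd_GG = df_d_gg, p_uncorrected = p_uncorrected, p_GG = p_gg)
```

The F statistic itself is unchanged by the correction; only the reference distribution, and therefore the p-value, changes.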

When we use the `get_anova_table()` function to extract the ANOVA table, the Greenhouse-Geisser sphericity correction is applied automatically if the assumption of sphericity is violated.

`get_anova_table(repeated_anova)`

```
ANOVA Table (type III tests)
  Effect  DFn DFd       F        p p<.05   ges
1   time 1.24  21 212.321 3.89e-13     * 0.061
```