32 McNemar’s test

McNemar’s test (also known as the paired or matched chi-square test) is used to determine whether the proportions of a dichotomous dependent variable differ between two related groups. It can be considered the analogue of the paired-samples t-test for a dichotomous rather than a continuous dependent variable. The test is used to analyze pretest-posttest study designs (observing a categorical outcome more than once in the same patient), and it is also commonly employed in matched-pairs and case-control studies.

When we have finished this chapter, we should be able to:

Learning objectives

Apply hypothesis testing
Investigate a change in proportion for paired data using the McNemar’s test
Interpret the results

32.1 Research question and Hypothesis Testing

We consider the data in the asthma dataset. The dataset contains data from a survey of 86 children with asthma who attended a camp to learn how to self-manage their asthmatic episodes. The children were asked whether they knew (yes or no) how to manage their asthmatic episodes appropriately at both the start and the completion of the camp.

The research question is: was there a significant change in children’s knowledge of asthma management between the beginning and completion of the health camp?

Null hypothesis and alternative hypothesis

$H_{0}$ : There was no change in children’s knowledge of asthma management between the beginning and completion of the health camp
$H_{1}$ : There was a change in children’s knowledge of asthma management between the beginning and completion of the health camp
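The same hypotheses can be written in terms of proportions. The symbols below are notation introduced here for illustration only: $\pi_{\text{begin}}$ and $\pi_{\text{end}}$ denote the population proportions of children who know how to manage their asthma at each time point, and $\pi_{\text{no,yes}}$, $\pi_{\text{yes,no}}$ denote the proportions of children who change their answer in each direction. Because the children who answer "yes" at both time points contribute equally to both marginal proportions, the marginal proportions are equal exactly when the two change proportions are equal:

$$
\begin{aligned}
H_{0} &: \pi_{\text{begin}} = \pi_{\text{end}} \quad\Longleftrightarrow\quad \pi_{\text{no,yes}} = \pi_{\text{yes,no}} \\
H_{1} &: \pi_{\text{begin}} \neq \pi_{\text{end}} \quad\Longleftrightarrow\quad \pi_{\text{no,yes}} \neq \pi_{\text{yes,no}}
\end{aligned}
$$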

32.2 Packages we need

We need to load the following packages:

library(rstatix)
library(janitor)
library(modelsummary)
library(exact2x2)
library(here)
library(tidyverse)

32.3 Preparing the data

We import the asthma data into R:

library(readxl)
asthma <- read_excel(here("data", "asthma.xlsx"))

	know_begin	know_end
1	yes	yes
2	no	no
3	yes	no
4	no	no
5	no	no
6	no	no
7	yes	yes
8	no	yes
9	no	yes
10	yes	yes

Figure 32.1: Table with data from “asthma” file.

We inspect the data and the type of variables:

glimpse(asthma)

Rows: 86
Columns: 2
$ know_begin <chr> "yes", "no", "yes", "no", "no", "no", "yes", "no", "no", "y…
$ know_end   <chr> "yes", "no", "no", "no", "no", "no", "yes", "yes", "yes", "…

The asthma dataset includes 86 children with asthma (rows) and 2 columns, know_begin and know_end, both of type character (<chr>). The dichotomous dependent variable, asthma knowledge (yes/no), is therefore measured at two time points: know_begin and know_end.

Both measurements, know_begin and know_end, should be converted to factors (<fct>) using the convert_as_factor() function from the rstatix package as follows:

asthma <- asthma %>%
  convert_as_factor(know_begin, know_end)

glimpse(asthma)

Rows: 86
Columns: 2
$ know_begin <fct> yes, no, yes, no, no, no, yes, no, no, yes, no, no, yes, ye…
$ know_end   <fct> yes, no, no, no, no, no, yes, yes, yes, yes, yes, no, yes, …

32.4 Contingency table

We can obtain the cross-tabulation table of the two measurements for the children’s knowledge of asthma:

tb3 <- table(know_begin = asthma$know_begin, know_end = asthma$know_end)
tb3

          know_end
know_begin no yes
       no  27  29
       yes  6  24

Important

There is a basic difference between this table and the more common two-way table. In this case, the count represents the number of pairs, not the number of individuals.
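To make the idea of "counts of pairs" concrete, here is a minimal sketch that rebuilds one row per pair of answers (begin, end) from the four cell counts printed above. The object names counts, pairs_df and tb_rebuilt are hypothetical, and the rebuilt rows are not in the same order as in the original asthma file; only the counts are taken from the table.

```r
# Cell counts typed in from the printed table tb3 (know_begin x know_end).
counts <- data.frame(
  know_begin = c("no", "no", "yes", "yes"),
  know_end   = c("no", "yes", "no", "yes"),
  n          = c(27, 29, 6, 24)
)

# Repeat each combination n times: one row per pair of answers.
pairs_df <- counts[rep(seq_len(nrow(counts)), counts$n),
                   c("know_begin", "know_end")]
rownames(pairs_df) <- NULL

# Number of pairs (one pair per child).
nrow(pairs_df)

# Cross-tabulating the rebuilt pairs gives back the same four counts.
tb_rebuilt <- table(know_begin = pairs_df$know_begin,
                    know_end   = pairs_df$know_end)
tb_rebuilt
```

This can also serve as a stand-in dataset for trying the commands of this chapter when the asthma.xlsx file is not at hand.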

We want to compare the proportion of children who knew how to manage their asthma at the beginning of the camp with the proportion who knew how to manage it at the end.

Table with total percentages and marginal totals


We can create an informative table with total percentages and marginal totals using the functions from the janitor package:

total_tb2 <- asthma %>%
  tabyl(know_begin, know_end) %>%
  adorn_totals(c("row", "col")) %>%
  adorn_percentages("all") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns() %>%
  adorn_title()

knitr::kable(total_tb2)

	know_end
know_begin	no	yes	Total
no	31.4% (27)	33.7% (29)	65.1% (56)
yes	7.0% (6)	27.9% (24)	34.9% (30)
Total	38.4% (33)	61.6% (53)	100.0% (86)

The same contingency table can be produced with the datasummary_crosstab() function from the modelsummary package:

modelsummary::datasummary_crosstab(know_begin ~ know_end, 
                     statistic = 1 ~ 1 + N + Percent(), 
                     data = asthma)

know_begin		no	yes	All
no	N	27	29	56
	%	31.4	33.7	65.1
yes	N	6	24	30
	%	7.0	27.9	34.9
All	N	33	53	86
	%	38.4	61.6	100.0

The proportion of children who knew how to manage asthma at the beginning is (6+24)/86 = 30/86 = 0.349 or 34.9%. The proportion of children who knew how to manage asthma at the end is (29+24)/86 = 53/86 = 0.616 or 61.6%.
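The arithmetic above can be checked directly from the 2x2 table. The sketch below types the counts in by hand (the object tb is a hypothetical stand-in for tb3, so the block runs without the data file) and also shows that the difference between the two marginal proportions depends only on the two discordant cells.

```r
# 2x2 table of paired answers, typed in from the printed output of tb3.
# matrix() fills column by column: first know_end = "no", then "yes".
tb <- matrix(c(27, 6, 29, 24), nrow = 2,
             dimnames = list(know_begin = c("no", "yes"),
                             know_end   = c("no", "yes")))

n <- sum(tb)                                   # 86 pairs

prop_begin <- unname(rowSums(tb)["yes"]) / n   # 30/86
prop_end   <- unname(colSums(tb)["yes"]) / n   # 53/86

round(c(begin = prop_begin, end = prop_end), 3)

# The concordant "yes/yes" cell cancels out of the difference,
# leaving (29 - 6) / 86.
prop_end - prop_begin
(tb["no", "yes"] - tb["yes", "no"]) / n
```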

Assumption

The test relies on a chi-square approximation, which requires the sum of the discordant cells to be larger than 25. In our example this sum is 29 + 6 = 35, so the condition is fulfilled.
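A quick way to check this condition is to add up the two off-diagonal cells of the table. As before, tb is a hypothetical hand-typed copy of tb3 so that the sketch runs on its own.

```r
# 2x2 table of paired answers, typed in from the printed output of tb3.
tb <- matrix(c(27, 6, 29, 24), nrow = 2,
             dimnames = list(know_begin = c("no", "yes"),
                             know_end   = c("no", "yes")))

# Discordant cells: children whose answer changed between the two time points.
discordant <- tb["no", "yes"] + tb["yes", "no"]
discordant

# TRUE if the rule of thumb stated above is met.
discordant > 25
```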

32.5 Run McNemar’s test

Finally, we run the McNemar’s test:

McNemar’s test

Using base R:

mcnemar.test(tb3)


    McNemar's Chi-squared test with continuity correction

data:  tb3
McNemar's chi-squared = 13.829, df = 1, p-value = 0.0002003

Using the rstatix package:

mcnemar_test(tb3)

# A tibble: 1 × 6
      n statistic    df      p p.signif method      
* <int>     <dbl> <dbl>  <dbl> <chr>    <chr>       
1    86      13.8     1 0.0002 ***      McNemar test

The proportion of children who knew how to manage asthma at the end of the camp (61.6%) is significantly larger than the proportion who knew how to manage it at the beginning (34.9%) (p-value < 0.001).
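The test statistic reported above can be reproduced by hand from the two discordant cells. With the continuity correction it is $(|b - c| - 1)^2 / (b + c)$, compared against a chi-square distribution with 1 degree of freedom. In the sketch below, b_count and c_count are hypothetical names for the "no to yes" and "yes to no" cells, and tb is a hand-typed copy of tb3.

```r
# 2x2 table of paired answers, typed in from the printed output of tb3.
tb <- matrix(c(27, 6, 29, 24), nrow = 2,
             dimnames = list(know_begin = c("no", "yes"),
                             know_end   = c("no", "yes")))

b_count <- tb["no", "yes"]   # changed from "no" to "yes"
c_count <- tb["yes", "no"]   # changed from "yes" to "no"

# McNemar's statistic with continuity correction.
chi_sq <- (abs(b_count - c_count) - 1)^2 / (b_count + c_count)

# Upper-tail probability of a chi-square distribution with 1 df.
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

c(statistic = chi_sq, p.value = p_value)

# For comparison: the built-in test on the same table.
mcnemar.test(tb)
```

Only the discordant cells enter the formula; the 27 and 24 children whose answer did not change carry no information about the direction of change.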

32.6 Exact binomial test

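When the sum of the discordant cells is less than 25, an exact binomial test for the 2x2 table can be used instead. We apply it to our table for illustration, using the mcnemar.exact() function from the exact2x2 package: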

mcnemar.exact(tb3)


    Exact McNemar test (with central confidence intervals)

data:  tb3
b = 29, c = 6, p-value = 0.0001168
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  1.971783 14.238838
sample estimates:
odds ratio 
  4.833333
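The exact test conditions on the number of discordant pairs and asks whether the 29 "no to yes" changes out of 35 are compatible with a probability of 0.5. A minimal sketch of this idea with base R's binom.test() follows; b_count and c_count are hypothetical names for the two discordant counts shown in the output above, and the p-value it returns can be compared with the one reported by mcnemar.exact().

```r
# Discordant counts, as labelled in the mcnemar.exact() output.
b_count <- 29   # "no" at the beginning, "yes" at the end
c_count <- 6    # "yes" at the beginning, "no" at the end

# Two-sided exact binomial test of b successes in b + c trials with p = 0.5.
bt <- binom.test(b_count, b_count + c_count, p = 0.5)
bt$p.value

# Sample estimate of the odds ratio: ratio of the discordant counts.
odds_ratio <- b_count / c_count
odds_ratio
```

An odds ratio above 1 indicates that changes from "no" to "yes" were more frequent than changes from "yes" to "no".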