12  Inference for proportions

12.1 Inference for a single proportion

12.1.1 Confidence interval for a single proportion

12.1.1.1 Calculating ‘by hand’ using R

One approach is to calculate the normal approximation of the binomial distribution (Wald interval), as described in the OpenIntro book. This is basically the same as doing the ‘manual’ calculation, but in R.

As an example, we use a survey of voters in which 23% indicate support for the Conservatives. We estimate a 95% confidence interval for this proportion:

# Define input (change these to your own values)
p_hat = 0.23
n = 1000
confidence = 0.95


# Perform calculations (no need to change anything here)
se = sqrt(p_hat * (1 - p_hat)/ n)     # Calculate SE
z_star = qnorm((1 + confidence) / 2)  # Calculate z*-value 
lower = p_hat - z_star * se           # Lower bound CI
upper = p_hat + z_star * se           # Upper bound CI
c(lower, upper)
[1] 0.203917 0.256083
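
If you plan to reuse this calculation, you could optionally wrap it in a small helper function. This is only an illustrative sketch; the function name wald_ci is our own choice and not part of the course material.

# Illustrative helper function for the Wald interval (the name wald_ci is our own)
wald_ci = function(p_hat, n, confidence = 0.95) {
  se = sqrt(p_hat * (1 - p_hat) / n)       # Standard error
  z_star = qnorm((1 + confidence) / 2)     # z*-value
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

wald_ci(p_hat = 0.23, n = 1000, confidence = 0.95)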

12.1.1.2 Using summary data

When you have only summary data, i.e. information about the sample size and sample proportion, we recommend using prop.test for calculating a confidence interval for a single proportion.3 The function prop.test is part of the package stats, which is one of the few packages that R automatically loads when it starts.

# Define input
p_hat = 0.23
n = 1000

# Run the actual test
prop.test(x = p_hat * n, 
          n = n, 
          conf.level = 0.95)

    1-sample proportions test with continuity correction

data:  p_hat * n out of n, null probability 0.5
X-squared = 290.52, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2045014 0.2576046
sample estimates:
   p 
0.23 

Note: The results from the calculation by hand and from the function prop.test are slightly different. This is because prop.test uses a somewhat more complicated formula for the confidence interval (the Wilson score interval) and also applies a ‘continuity correction’ (which accounts for the fact that we are using a continuous normal distribution to approximate a discrete phenomenon, i.e. the number of successes).

prop.test

This is the function that conducts a test for a proportion and its confidence interval.

x = p_hat * n

The first value specified should be the number of successes. In our example this is the number of people supporting the Conservatives. We can calculate this as the sample proportion (\(\hat{p}\)) times the sample size.

n = n

The second value specified should be the number of trials (read: the number of observations in our dataset). In our case: the number of respondents in the dataset.

conf.level = 0.95

This determines the confidence level. The default is 0.95 which corresponds to a 95% confidence interval.
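
The Note above mentions that prop.test also applies a continuity correction. If you are curious how much of the difference with the ‘by hand’ calculation is due to that correction alone, you can optionally rerun the test with correct = FALSE, which gives the (uncorrected) Wilson score interval; it is still not identical to the Wald interval.

# Optional check: the same interval without the continuity correction
prop.test(x = 0.23 * 1000, 
          n = 1000, 
          conf.level = 0.95,
          correct = FALSE)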

12.1.1.3 Variables in a data frame

If you have a data frame with a variable that represents successes (or not) for each case, we can use the following procedure. In our example we have a variable that records the vote intention of a respondent, which is either Conservative or Other party:

# For this example, we create a dataset of 1000 respondents with 230 expressing support for the Conservatives
example_data = data.frame(party_choice = factor(c(rep("Conservatives", 230), 
                                                  rep("Other party", 770))))
table(example_data$party_choice)

Conservatives   Other party 
          230           770 

If we have data like this, we can directly calculate the confidence interval by running prop.test for the table:

prop.test(table(example_data$party_choice),
          conf.level = 0.95)
table(example_data$party_choice)

The first argument is a table of our variable of interest, which we can create with this code. If you use your own data, replace example_data with the name of your data frame and party_choice with the name of the variable of interest.1

Note: The table should include only two categories. R will calculate the confidence interval for the first category in the table (in our case: Conservatives).2

conf.level = 0.95

This determines the confidence level. The default is 0.95 which corresponds to a 95% confidence interval.


    1-sample proportions test with continuity correction

data:  table(example_data$party_choice), null probability 0.5
X-squared = 290.52, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2045014 0.2576046
sample estimates:
   p 
0.23 

12.1.2 Hypothesis tests for a single proportion

12.1.2.1 Summary data

When you only have information about the sample size and the proportion, we recommend using prop.test for conducting a hypothesis test.3

# Define input
p_hat = 0.23
n = 1000

# Run the actual test
prop.test(x = p_hat * n, 
          n = n, 
          p = 0.25,
          alternative = "two.sided")

    1-sample proportions test with continuity correction

data:  p_hat * n out of n, null probability 0.25
X-squared = 2.028, df = 1, p-value = 0.1544
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2045014 0.2576046
sample estimates:
   p 
0.23 
prop.test

This is the function that conducts a test for a proportion.

x = p_hat * n

The first value specified should be the number of successes. We can calculate this as the sample proportion (\(\hat{p}\)) times the sample size.

n = n

The second value specified should be the number of trials (read: the number of observations in our dataset). In our case: the number of respondents in the dataset.

p = 0.25

This specifies the hypothesized null value. In this example this is 0.25 or 25%, so we are testing the null hypothesis \(H_0: p = 0.25\).

alternative = "two.sided"

This determines whether we want to use a two-sided or a one-sided test. Options are “two.sided” (the default), “less” (when \(H_1: p < p_0\), where \(p_0\) is the hypothesized null value) or “greater” (when \(H_1: p > p_0\)).
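
As an illustration (not part of the example above), a one-sided test of \(H_1: p < 0.25\) would look like this:

# One-sided test: H0: p = 0.25 against H1: p < 0.25 (illustrative)
prop.test(x = 0.23 * 1000, 
          n = 1000, 
          p = 0.25,
          alternative = "less")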

12.1.2.2 Variables in a data frame

If you have a data frame with a variable that represents successes (or not) for each case, we can input this data directly into prop.test. In our example we have a variable that records the vote intention of a respondent, which is either Conservative or Other party (for the preparation of the data see Section 12.1.1.3).

prop.test(table(example_data$party_choice),
          p = 0.25,
          alternative = "two.sided")
table(example_data$party_choice)

The first argument is a table of our variable of interest, which we can create with this code. If you use your own data, replace example_data with the name of your data frame and party_choice with the name of the variable of interest.

Note: The table should include only two categories. R will perform the test that the proportion for the first category is equal to the hypothesized value (p, see below).

p = 0.25

This specifies the hypothesized null value. In this example this is 0.25 or 25%, so we are testing the null hypothesis \(H_0: p = 0.25\).

alternative = "two.sided"

This determines whether we want to use a two-sided or a one-sided test. Options are “two.sided” (the default), “less” (when \(H_1: p < p_0\), where \(p_0\) is the hypothesized null value) or “greater” (when \(H_1: p > p_0\)).


    1-sample proportions test with continuity correction

data:  table(example_data$party_choice), null probability 0.25
X-squared = 2.028, df = 1, p-value = 0.1544
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2045014 0.2576046
sample estimates:
   p 
0.23 
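
The Note above states that R tests the proportion of the first category in the table. If your null hypothesis concerns the second category instead (here: Other party, with the complementary null value 0.75), you could first reorder the categories with relevel, as in footnote 2. This is an optional illustration, not part of the example above.

# Optional: test the proportion of 'Other party' against the complementary null value 0.75
prop.test(table(relevel(example_data$party_choice, ref = "Other party")),
          p = 0.75,
          alternative = "two.sided")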

12.2 Reporting the results from a hypothesis test for a single proportion

# Obtain z-value based on manual calculation

# Define input
p_hat = 0.12
n = 1500
p = 0.10

z <- (p_hat-p)/(sqrt((p*(1-p)/n)))
z
[1] 2.581989
2*pnorm(q=z, lower.tail=FALSE)
[1] 0.009823275
# Run the actual test
prop.test(x = p_hat * n, 
          n = n, 
          p = 0.10,
          alternative = "two.sided",
          correct = FALSE)

    1-sample proportions test without continuity correction

data:  p_hat * n out of n, null probability 0.1
X-squared = 6.6667, df = 1, p-value = 0.009823
alternative hypothesis: true p is not equal to 0.1
95 percent confidence interval:
 0.1045180 0.1374233
sample estimates:
   p 
0.12 
Output explanation
  • the probability of finding the observed test statistic (\(z\) or \(\chi^2\)), given the null hypothesis (the p-value), is given as follows:
    • in R, see p-value = 0.009823. Note: If the value is very close to zero, R may use the scientific notation for small numbers (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022). In these cases, write p < 0.001 in your report.

12.2.0.1 Reporting

If you calculate the results by hand and obtain a z-score, then the correct report includes:

  • A conclusion about the null hypothesis; followed by
  • Information about the sample percentage and the population percentage.
  • \(z\) = the value of z.
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.

If you calculate the results using prop.test and obtain a \(\chi^2\)-score, then the correct report includes:

  • A conclusion about the null hypothesis; followed by
  • Information about the sample percentage and the population percentage.
  • \(\chi^2\) = the value of chi-square, followed by the degrees of freedom in brackets.
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.
Report

✓ The percentage of respondents in our sample (N = 1500) who support the Socialist Party (12%) is statistically significantly different from the percentage in the entire population (10%), z = 2.58, p = 0.0098.

✓ The percentage of respondents in our sample (N = 1500) who support the Socialist Party (12%) is statistically significantly different from the percentage in the entire population (10%), \(\chi^2\)(1) = 6.6667, p = 0.0098.

12.3 Comparing two proportions

Last week, we calculated a confidence interval for a single proportion. This week we cover two (independent) proportions. The calculations below apply when the two proportions are drawn from different samples or subsamples, for example the support for a party in two separate opinion polls, or the left-right position of respondents who live in the capital city and those who do not (these groups are independent because a respondent can only belong to one of them).

12.3.1 Confidence interval for the difference between two proportions

We use the CPR data set from the openintro package (see p. 218). The data set contains data on patients who were randomly divided into a treatment group where they received a blood thinner or the control group where they did not receive a blood thinner. The outcome variable of interest was whether the patients survived for at least 24 hours.

library(tidyverse)
library(openintro)
library(flextable)

data(cpr)

# Refactor variables to ensure the order is the same as in the textbook table
cpr <- cpr |>
  mutate(group = factor(group, levels = c("control", "treatment"), ordered = TRUE),
         outcome = factor(outcome, levels = c("survived", "died"), ordered = TRUE))

table_example <- proc_freq(x = cpr, 
                           row = "group", 
                           col = "outcome", 
                           include.row_percent = FALSE, 
                           include.column_percent = FALSE, 
                           include.table_percent = FALSE) 
table_example

                 outcome
group       survived   died   Total
control           11     39      50
treatment         14     26      40
Total             25     65      90

Note that we usually put the independent variable in the columns. However, the book does not follow this convention.
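
If you prefer to follow that convention, you can simply swap the row and col arguments of proc_freq; this only changes the layout of the table, not the counts.

# Same table, but with the independent variable (group) in the columns
proc_freq(x = cpr, 
          row = "outcome", 
          col = "group", 
          include.row_percent = FALSE, 
          include.column_percent = FALSE, 
          include.table_percent = FALSE)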

12.3.1.1 Calculating ‘by hand’ using R

To calculate a 90% confidence interval for the difference between the survival rates (\(p_{1}\) and \(p_{2}\)) ‘by hand’, we can write:

# Define input (change these according to your values)
p_hat_1 = 14 / 40  # Treatment group
p_hat_2 = 11 / 50  # Control group
n_1 = 40
n_2 = 50
confidence = 0.9

# Perform calculations (no need to change anything here)
p_hat = p_hat_1 - p_hat_2
se = sqrt((p_hat_1*(1-p_hat_1)/n_1) + (p_hat_2*(1-p_hat_2)/n_2))
z_star = qnorm((1 + confidence) / 2)  # Calculate z*-value 
lower = p_hat - z_star * se   # Lower bound CI
upper = p_hat + z_star * se   # Upper bound CI
c(lower, upper)
[1] -0.02707706  0.28707706

Note: we follow the order of the book and take \(p_1\) as the proportion for the treatment group and \(p_2\) as the proportion for the control group.

12.3.1.2 Summary data

When you have only summary data, i.e. information about the sample size and sample proportion, we recommend using prop.test for calculating a confidence interval for the difference between proportions in two samples. The function prop.test is part of the package stats, which is one of the few packages that R automatically loads when it starts.

prop.test(x = c(14, 11),
          n = c(40, 50),
          conf.level = 0.90,
          correct=FALSE)
prop.test

This is the function that conducts a test for proportions.

x = c(14, 11)

Here we specify the number of ‘successes’ for each group (in this case the number of survivors in each group).

n = c(40, 50)

The second value specified should be the number of trials for each of the two samples (read: the number of observations in our dataset). In our case: the number of participants in both groups.

conf.level = 0.90

This determines the confidence level. The default is 0.95, which corresponds to a 95% confidence interval. Here we set it to 0.90 for a 90% confidence interval.

correct = FALSE

This sets the continuity correction to FALSE. This should provide the same results as the manual calculations above.


    2-sample test for equality of proportions without continuity correction

data:  c(14, 11) out of c(40, 50)
X-squared = 1.872, df = 1, p-value = 0.1712
alternative hypothesis: two.sided
90 percent confidence interval:
 -0.02707706  0.28707706
sample estimates:
prop 1 prop 2 
  0.35   0.22 
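
If you want to use the interval in further calculations, you can optionally store the result of prop.test and extract the conf.int element of the returned object (the object name ci_result below is our own choice):

# Store the test result and extract the 90% confidence interval directly
ci_result = prop.test(x = c(14, 11),
                      n = c(40, 50),
                      conf.level = 0.90,
                      correct = FALSE)
ci_result$conf.int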

12.3.2 Hypothesis test for the difference between two proportions

We use the mammogram data set from the openintro package. The data set contains data from an experiment where 89,835 women were randomized to either get a mammogram or a non-mammogram breast screening. The response measured was whether they had died from breast cancer within 25 years.

library(openintro)
library(flextable) 
data(mammogram)

# Refactor variables to control the order of categories in the cross table
mammogram <- mammogram |>
  mutate(breast_cancer_death = factor(breast_cancer_death, levels = c("yes", "no"), ordered=TRUE)) |>
  mutate(treatment = factor(treatment, levels = c("mammogram", "control"), ordered = TRUE))

table_example <- proc_freq(x = mammogram, 
                           row = "treatment", 
                           col = "breast_cancer_death", 
                           include.row_percent = FALSE, 
                           include.column_percent = FALSE, 
                           include.table_percent = FALSE) 
table_example

                 breast_cancer_death
treatment        yes        no     Total
mammogram        500    44,425    44,925
control          505    44,405    44,910
Total          1,005    88,830    89,835

This table is the same as Table 6.2 on p. 219.

12.3.2.1 Calculating ‘by hand’ using R

To test ‘by hand’ whether there was a difference in breast cancer deaths between the two groups, we can write:

# Define input (change these according to your values)
p_hat_1 = 500 / (500 + 44425)
p_hat_2 = 505 / (505 + 44405)
n_1 = 500 + 44425 #this is the total number of subjects in the first group
n_2 = 505 + 44405 #this is the total number of subjects in the second group
null_value = 0

# Perform calculations (no need to change anything here)
p_hat_pooled = (p_hat_1 * n_1 + p_hat_2 * n_2) / (n_1 + n_2)
point_est = p_hat_1 - p_hat_2
se = sqrt((p_hat_pooled*(1-p_hat_pooled)/n_1)+(p_hat_pooled*(1-p_hat_pooled)/n_2))
z = (point_est - null_value)/se
pnorm(z)
[1] 0.434892

The lower tail area is 0.4349 (the small difference compared to the book is due to rounding). The p-value, represented by both tails together, is 0.434892*2 = 0.869784. Because \(p > 0.05\), we do not reject the null hypothesis.
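
If you prefer to let R do the doubling (and avoid sign mistakes when z happens to be negative), an equivalent optional calculation is:

# Two-sided p-value from the z statistic calculated above
2 * pnorm(-abs(z))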

We can visualize this as follows:

library(visualize)
visualize.norm(stat = c(-z, z), section = "tails")

12.3.2.2 Summary data

Alternatively, we can use the R function prop.test() to test for the difference between the two groups. We already used this function in week 4 for the calculation of a single proportion. The function prop.test is part of the package stats, which is one of the few packages that R automatically loads when it starts. For our purpose, we use it as follows:

result <- prop.test(x = c(500, 505), 
                    n = c(44925, 44910),
                    alternative = "two.sided",
                    correct = FALSE)
result

    2-sample test for equality of proportions without continuity correction

data:  c(500, 505) out of c(44925, 44910)
X-squared = 0.026874, df = 1, p-value = 0.8698
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.001490590  0.001260488
sample estimates:
    prop 1     prop 2 
0.01112966 0.01124471 
prop.test(

This is the function that conducts the test for the difference between two proportions.

x = c(500, 505),

Here we specify the number of ‘successes’ in each of the two groups: in this example, the 500 women in the mammogram group who died and the 505 women in the control group who died. Because these are two values, we combine them using ‘c()’.

n = c(44925, 44910),

The second argument should be the total number of trials in each group. In our case: the total number of women in each of the two groups. Because these are two values, we combine them using ‘c()’.

alternative = "two.sided"

This determines whether we want to use a two-sided or a one-sided test. Options are “two.sided” (the default), “less” (when \(H_1: p_1 < p_2\)) or “greater” (when \(H_1: p_1 > p_2\)).

correct = FALSE)

With this, we indicate that we do not want to calculate the confidence interval using a ‘continuity correction’. By default, this value is set to ‘TRUE’.
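
Because we stored the output in result, we can optionally access individual elements of the returned object, for example the p-value and the two sample proportions:

# Individual elements of the stored test result
result$p.value    # p-value of the test
result$estimate   # the two sample proportions (prop 1 and prop 2)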

12.3.2.3 Data in a data frame

If you have a data frame with a variable that, for each case, represents one of two possible outcomes (in statistical terms “success” and “failure”), we can use prop.test.

prop.test(table(mammogram$treatment, mammogram$breast_cancer_death), 
          alternative = "two.sided",
          correct = FALSE)

    2-sample test for equality of proportions without continuity correction

data:  table(mammogram$treatment, mammogram$breast_cancer_death)
X-squared = 0.026874, df = 1, p-value = 0.8698
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.001490590  0.001260488
sample estimates:
    prop 1     prop 2 
0.01112966 0.01124471 
prop.test(table(mammogram$treatment, mammogram$breast_cancer_death),

This is the function that conducts the test for the difference between two proportions. We supply a cross table of the independent variable (mammogram$treatment) and the dependent variable (mammogram$breast_cancer_death). Note that the order of the variables matters for the confidence interval: list the independent variable first and then the dependent variable.

alternative = "two.sided"

This determines whether we want to use a two-sided or a one-sided test. Options are “two.sided” (the default), “less” (when \(H_1: p_1 < p_2\)) or “greater” (when \(H_1: p_1 > p_2\)).

correct = FALSE

With this, we indicate that we do not want to calculate the confidence interval using a ‘continuity correction’. By default, this value is set to ‘TRUE’.

The results are identical to the example above that used the summary data.
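
To check which group each of prop 1 and prop 2 refers to, you can optionally compute the row proportions of the cross table yourself; the ‘yes’ share in the first row equals prop 1.

# Row proportions: within each treatment group, the share of 'yes' and 'no'
prop.table(table(mammogram$treatment, mammogram$breast_cancer_death), margin = 1)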

12.4 Reporting the results from a hypothesis test for the difference in two proportions

# Define input (change these according to your values)
p_hat_1 = 70 / 330
p_hat_2 = 90 / 470
n_1 = 330 #this is the total number of subjects in the first group
n_2 = 470 #this is the total number of subjects in the second group
null_value = 0

# Perform calculations (no need to change anything here)
p_hat_pooled = (p_hat_1 * n_1 + p_hat_2 * n_2) / (n_1 + n_2)
point_est = p_hat_1 - p_hat_2
se = sqrt((p_hat_pooled*(1-p_hat_pooled)/n_1)+(p_hat_pooled*(1-p_hat_pooled)/n_2))
z = (point_est - null_value)/se
z
[1] 0.7181896
2*pnorm(q=z, lower.tail=FALSE)
[1] 0.4726404
# Results using prop.test
prop.test(x = c(70, 90), 
                    n = c(330, 470),
                    alternative = "two.sided",
                    correct = FALSE)

    2-sample test for equality of proportions without continuity correction

data:  c(70, 90) out of c(330, 470)
X-squared = 0.5158, df = 1, p-value = 0.4726
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.03603276  0.07729646
sample estimates:
   prop 1    prop 2 
0.2121212 0.1914894 
Output explanation
  • the probability of finding the observed test statistic (\(z\) or \(\chi^2\)), given the null hypothesis (the p-value), is given as follows:
    • in R, see p-value = 0.4726. Note: If the value is very close to zero, R may use the scientific notation for small numbers (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022). In these cases, write p < 0.001 in your report.

12.4.0.1 Reporting

If you calculate the results by hand and obtain a z-score, then the correct report includes:

  • A conclusion about the null hypothesis; followed by
  • Information about the sample percentage and the population percentage.
  • \(z\) = the value of z.
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.

If you calculate the results using prop.test and obtain a \(\chi^2\)-score, then the correct report includes:

  • A conclusion about the null hypothesis; followed by
  • Information about the sample percentage and the population percentage.
  • \(\chi^2\) = the value of chi-square, followed by the degrees of freedom in brackets.
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.
Report

✓ There is no statistically significant difference between the percentage of citizens who were somewhat dissatisfied with democracy before the election (21.21%, n = 330) and after the election (19.15%, n = 470), z = 0.72, p = 0.47.

✓ There is no statistically significant difference between the percentage of citizens who were somewhat dissatisfied with democracy before the election (21.21%, n = 330) and after the election (19.15%, n = 470), \(\chi^2\) (1) = 0.52, p = 0.47.


  1. If you really like pipes, you can also use: example_data |> pull(party_choice) |> table().↩︎

  2. If you would like to calculate the confidence interval for the second category, you can use, for example, relevel to re-order the categories of the first variable:

    prop.test(table(relevel(example_data$party_choice, ref = "Other party")),
              conf.level = 0.95)
    
        1-sample proportions test with continuity correction
    
    data:  table(relevel(example_data$party_choice, ref = "Other party")), null probability 0.5
    X-squared = 290.52, df = 1, p-value < 2.2e-16
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.7423954 0.7954986
    sample estimates:
       p 
    0.77 
    ↩︎
  3. This uses a slightly more complicated formula for calculating the confidence interval (Wilson score interval), also incorporating a ‘continuity correction’ (which accounts for the fact that we are using a continuous normal distribution to approximate a discrete phenomenon, i.e. the number of successes).↩︎