14  T-tests for means

We use data from the 2019 Canadian Election Study as an example in this chapter:

library(rio)  # import() is assumed here to come from package rio
canada <- import("2019 Canadian Election Study.rds")

14.1 Standard error of the mean

The standard error of the mean can be estimated as \(\frac{s}{\sqrt{n}}\), where \(s\) is the sample standard deviation and \(n\) is the number of observations. In R we can use the function MeanSE from package DescTools:

library(DescTools)
example_vector <- rnorm(20)  # Example data of 20 normally distributed random numbers.

MeanSE(x = example_vector, 
       na.rm = TRUE)
[1] 0.1487493
MeanSE(…)

This function calculates the standard error of the mean for a vector of numbers.

x = example_vector

We want to calculate the SE of the Mean for the values in example_vector. For your own data, you would change this to the appropriate vector name.

na.rm = TRUE

This option ensures that any missing values are ignored.
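To see what the formula \(\frac{s}{\sqrt{n}}\) does, the following minimal sketch computes the standard error by hand in base R. The vector x is hypothetical example data, not taken from the Canadian Election Study. The result should agree with what MeanSE reports for the same values when na.rm = TRUE.

```r
# Hypothetical example data with one missing value
x <- c(4, 8, 15, 16, 23, 42, NA)

x_complete <- x[!is.na(x)]              # drop missing values, like na.rm = TRUE
n <- length(x_complete)                 # number of non-missing observations
se_manual <- sd(x_complete) / sqrt(n)   # s divided by the square root of n
se_manual
```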

You can also use MeanSE for a variable in a data frame:

# Using $ to select a variable from a data frame
MeanSE(canada$cps19_age, na.rm = TRUE)  
[1] 0.08541655
# Using function summarise
canada |>
  summarise(SE_age = MeanSE(cps19_age, na.rm = TRUE))
      SE_age
1 0.08541655

The describe function in package psych reports the standard error of the mean for every variable in a data frame. It is particularly useful when we want to calculate the standard error for several variables at once. In this example we select three interval-ratio variables from the 2019 Canadian Election Study (which we imported above):

library(psych)

# Select three variables from 'canada' and assign to dataset 'canada_selection'
canada_selection <- canada |> 
  select(cps19_age, cps19_lr_parties_1, cps19_lr_parties_2)

# Use describe to calculate summary statistics for 'canada_selection'
describe(canada_selection)
                   vars     n  mean    sd median trimmed   mad min max range
cps19_age             1 37822 48.69 16.61     49   48.66 20.76  18  99    81
cps19_lr_parties_1    2 27743  4.27  2.79      4    4.19  2.97   0  10    10
cps19_lr_parties_2    3 28210  6.90  2.79      8    7.28  2.97   0  10    10
                    skew kurtosis   se
cps19_age           0.04    -0.91 0.09
cps19_lr_parties_1  0.19    -0.77 0.02
cps19_lr_parties_2 -0.97     0.14 0.02

The column se displays the standard error of the mean (by default rounded to two decimals).

14.2 T-tests

14.2.1 One sample t-test

To calculate a one sample t-test we use the function t.test.

In the example below we test whether the mean age of respondents differs from a hypothesized mean of 48.5 (see footnote 1):

t.test(formula = cps19_age ~ 1,
       data = canada,
       alternative = "two.sided",
       mu = 48.5,
       conf.level = 0.95)

    One Sample t-test

data:  cps19_age
t = 2.2395, df = 37821, p-value = 0.02513
alternative hypothesis: true mean is not equal to 48.5
95 percent confidence interval:
 48.52387 48.85871
sample estimates:
mean of x 
 48.69129 
formula = cps19_age ~ 1

As we have only one variable in a one sample t-test, we specify the formula in the form <variable name> ~ 1.

data = canada

We specify the data frame that we want to use.

alternative = "two.sided"

Determines whether we want to use a two-sided test or a one-sided test. Options are “two.sided” (default), “less” (when \(H_1: \mu < \mu_0\)) or “greater” (when \(H_1: \mu > \mu_0\)), where \(\mu_0\) is the value specified in mu.

mu = 48.5

The mu parameter should be set to the value of the mean under the null hypothesis.

conf.level = 0.95

This specifies the confidence level of the confidence interval reported. The default is 0.95 (a 95% confidence interval).
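As a rough illustration of what t.test computes for one sample, the sketch below reproduces the t-value and the two-sided p-value by hand. The data are simulated and the names x and mu0 are hypothetical; only base R functions are used.

```r
set.seed(1)
x <- rnorm(30, mean = 50, sd = 10)   # hypothetical sample of 30 observations
mu0 <- 48.5                          # mean under the null hypothesis

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = length(x) - 1)

res <- t.test(x, mu = mu0, alternative = "two.sided")
all.equal(unname(res$statistic), t_manual)   # compare the t-values
all.equal(res$p.value, p_manual)             # compare the p-values
```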

14.2.2 Reporting a one-sample t-test

We illustrate the one-sample t-test using data on the fruit consumption of students. We test whether the fruit consumption is significantly different from a population value of 100.

Example output R (do not include the output directly in an academic paper):

MeanSE(fruit$fruitconsumption)
[1] 1.714986
t.test(x = fruit$fruitconsumption,
           alternative = "two.sided",
           mu = 100,
           conf.level = 0.95)

    One Sample t-test

data:  fruit$fruitconsumption
t = -5.831, df = 33, p-value = 1.587e-06
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 86.51084 93.48916
sample estimates:
mean of x 
       90 
Output explanation
  • In R, the output shows:
    • the mean of the observations in the sample (see ‘mean of x’).
    • the calculated t-value (see ‘t =’).
    • the degrees of freedom (see ‘df =’).
    • the p-value (i.e. the probability of finding a t-value at least as extreme as the one obtained, if the null hypothesis is true).
    • the expected value under the null hypothesis (here 100, see “true mean is not equal to 100”).
    • the standard error of the mean is not part of this output; it is obtained separately via MeanSE().
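The t-value in the output can be checked by hand from the sample mean, the hypothesized mean and the standard error reported by MeanSE (here \(M\) denotes the sample mean and \(\mu_0\) the value under the null hypothesis):

\[
t = \frac{M - \mu_0}{SE} = \frac{90 - 100}{1.714986} \approx -5.831
\]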

14.2.2.1 Reporting

The correct report includes:

  • A conclusion about the null hypothesis; followed by
  • the mean (in the text or as M = …) and the standard error (SE = …) of the sample.
  • the t-value and its degrees of freedom, written as t(degrees of freedom) = t-value.
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.
  • If you calculate the t-value by hand you would write p < / = / > your chosen \(\alpha\)-level.
Report

✓ The mean fruit consumption of students was 90 (SE = 1.71). This is significantly different from the mean fruit consumption of the whole population (t(33) = -5.83, p < 0.001).
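The rule for reporting p-values (exact value, or p < 0.001 for very small values) could be sketched as a small helper function. The function name report_p is hypothetical and not part of any package; rounding to three decimals is an assumption that matches the reports in this chapter.

```r
# Hypothetical helper: format a p-value for a written report
report_p <- function(p) {
  if (p < 0.001) {
    "p < 0.001"                                        # very small values
  } else {
    paste0("p = ", formatC(p, format = "f", digits = 3))  # three decimals
  }
}

report_p(1.587e-06)   # p-value from the test against mu = 100
report_p(0.2519)      # p-value from the test against mu = 92
```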

Below you find an example of a non-significant result. In this case we test whether the fruit consumption is significantly different from a population value of 92.

Example output R (do not include the output directly in an academic paper):

MeanSE(fruit$fruitconsumption, na.rm = TRUE)
[1] 1.714986
t.test(x = fruit$fruitconsumption,
           alternative = "two.sided",
           mu = 92,
           conf.level = 0.95)

    One Sample t-test

data:  fruit$fruitconsumption
t = -1.1662, df = 33, p-value = 0.2519
alternative hypothesis: true mean is not equal to 92
95 percent confidence interval:
 86.51084 93.48916
sample estimates:
mean of x 
       90 
Report

✓ The mean fruit consumption of students was 90 (SE = 1.71). This is not significantly different from the mean fruit consumption of the whole population (t(33) = -1.17, p = 0.252).

14.2.3 Paired samples t-test

In a paired samples t-test we compare the mean value of two interval-ratio variables.

In the example below we test the null hypothesis that the mean difference between the left-right placement of the Liberal party (variable cps19_lr_parties_1) and the Conservative party (variable cps19_lr_parties_2) is equal to 0 in the population.

We can specify the test as follows (see footnote 2):

t.test(formula = Pair(cps19_lr_parties_1, cps19_lr_parties_2) ~ 1,
       data = canada,
       alternative = "two.sided",
       mu = 0,
       conf.level = 0.95)

    Paired t-test

data:  Pair(cps19_lr_parties_1, cps19_lr_parties_2)
t = -95.406, df = 26718, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.691519 -2.583155
sample estimates:
mean difference 
      -2.637337 
formula = Pair(cps19_lr_parties_1, cps19_lr_parties_2) ~ 1

We have two paired interval-ratio variables and therefore use a formula in the form Pair(<variable name 1>, <variable name 2>) ~ 1

data = canada

We specify the data frame that we want to use.

alternative = "two.sided"

Determines whether we want to use a two-sided test or a one-sided test. Options are “two.sided” (default), “less” (when \(H_1: \mu < \mu_0\)) or “greater” (when \(H_1: \mu > \mu_0\)), where \(\mu\) is the mean difference and \(\mu_0\) is the value specified in mu.

mu = 0

The mu parameter should be set to the value of the mean under the null hypothesis. In the case of a paired sample t-test we usually hypothesize that the difference between the two means is 0 in the population, thus mu = 0.

conf.level = 0.95

This specifies the confidence level of the confidence interval of the difference. The default is 0.95 (a 95% confidence interval).
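A paired samples t-test is equivalent to a one sample t-test on the differences between the two variables. The sketch below illustrates this with small hypothetical vectors (party_a and party_b are made-up placements, not CES data). Pairs with a missing value are dropped in both approaches.

```r
# Hypothetical paired placements; each position is one respondent
party_a <- c(3, 4, 2, 5, 4, 3, NA, 6)
party_b <- c(7, 6, 8, 7, 9, 6, 8, NA)

# Paired test using the traditional interface
res_paired <- t.test(x = party_a, y = party_b, paired = TRUE, mu = 0)

# One sample test on the differences (NA differences are removed)
differences <- party_a - party_b
res_diff <- t.test(x = differences, mu = 0)

all.equal(unname(res_paired$statistic), unname(res_diff$statistic))
all.equal(res_paired$p.value, res_diff$p.value)
```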

14.2.4 Reporting a paired samples t-test

In a paired samples t-test we compare the mean value of two interval-ratio variables. In the example below we test the null hypothesis that the mean difference between the left-right placement of the Liberal party (variable cps19_lr_parties_1) and the Conservative party (variable cps19_lr_parties_2) is equal to 0 in the population.

library(dplyr)
# Calculate the standard error of the mean for both variables, excluding missing observations
MeanSE(canada$cps19_lr_parties_1, na.rm = TRUE)
[1] 0.01674762
MeanSE(canada$cps19_lr_parties_2, na.rm = TRUE)
[1] 0.01660731
# x and y are passed directly as vectors, so no data argument is needed
t.test(x = canada$cps19_lr_parties_1, 
           y = canada$cps19_lr_parties_2,
           alternative = "two.sided",
           mu = 0,
           paired = TRUE,
           conf.level = 0.95)

    Paired t-test

data:  canada$cps19_lr_parties_1 and canada$cps19_lr_parties_2
t = -95.406, df = 26718, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.691519 -2.583155
sample estimates:
mean difference 
      -2.637337 
Output explanation
  • In R, the output shows:
    • the mean difference of the observations in the sample (see ‘mean difference’).
    • the calculated t-value (see ‘t =’).
    • the degrees of freedom (see ‘df =’).
    • the p-value (i.e. the probability of finding a t-value at least as extreme as the one obtained, if the null hypothesis is true), see p-value < 2.2e-16. This is a very small number, which is why R displays it like this.
    • the expected value under the null hypothesis (here 0, see “true mean difference is not equal to 0”).
    • the standard errors of both variables are not part of this output; they are obtained separately via MeanSE().

14.2.4.1 Reporting

The correct report includes:

  • A conclusion about the null hypothesis;
  • the mean (in the text or as M = …) and the standard error (SE = …) of each variable.
  • The difference in the means (here -2.637337),
  • the t-value and its degrees of freedom, written as t(degrees of freedom) = t-value,
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.

If you calculate the results by hand you write p < “the chosen α-value”, for example, p < 0.05.

Report

✓ The left-right placement of the Liberal party (M = 4.25; SE = 0.017) was lower than the left-right placement of the Conservative party (M = 6.88; SE = 0.017). This difference, -2.64, was statistically significant, t(26718) = -95.406, p < 0.001.

14.2.5 Independent-samples t-test

The independent samples t-test is used to compare whether the means of two groups are statistically significantly different. We thus have a numeric variable, for which we can calculate a mean, and a grouping variable, which is a categorical variable that determines the group membership. In our example we test whether there is a statistically significant difference in the mean placement of the Liberal party by those who are born in Canada and those who are not.

First we inspect the grouping variable:

table(canada$cps19_bornin_canada)

                          Yes                            No 
                        31556                          6046 
Don't know/ Prefer not to say 
                          220 

It turns out this variable has three categories. We would like to ignore the Don’t know category, so we treat this as missing:

canada <- canada |>
  mutate(cps19_bornin_canada = na_if(cps19_bornin_canada, "Don't know/ Prefer not to say")) 
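For readers who prefer base R, the same recoding idea can be sketched on a small hypothetical factor (born is made-up data, and its "Don't know" label is shortened for illustration). droplevels removes the now-empty category so that it no longer appears in tables.

```r
# Hypothetical grouping variable with a "Don't know" category
born <- factor(c("Yes", "No", "Don't know", "Yes", "No"))

born[born == "Don't know"] <- NA   # treat the category as missing
born <- droplevels(born)           # remove the unused factor level

table(born, useNA = "ifany")       # two groups plus the missing values
```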

Now we can run the t-test:

t.test(formula = cps19_lr_parties_1 ~ cps19_bornin_canada,
       data = canada,
       alternative = "two.sided", 
       mu = 0,
       conf.level = .95)
formula = cps19_lr_parties_1 ~ cps19_bornin_canada

We have one interval-ratio variable and one categorical variable (factor) that captures to which group an observation belongs, thus we use a formula of the form <interval-ratio variable> ~ <categorical variable>.

data = canada

We specify the data frame that we want to use.

alternative = "two.sided"

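Determines whether we want to use a two-sided test or a one-sided test. Options are “two.sided” (default), “less” (when \(H_1: \mu < \mu_0\)) or “greater” (when \(H_1: \mu > \mu_0\)), where \(\mu\) is the difference between the two group means and \(\mu_0\) is the value specified in mu.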

mu = 0

The mu parameter should be set to the value of the mean under the null hypothesis. In the case of an independent samples t-test we usually hypothesize that the difference between the two group means is 0 in the population, thus mu = 0.

conf.level = 0.95

This specifies the confidence level of the confidence interval of the difference. The default is 0.95 (a 95% confidence interval).


    Welch Two Sample t-test

data:  cps19_lr_parties_1 by cps19_bornin_canada
t = -8.9025, df = 6809.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Yes and group No is not equal to 0
95 percent confidence interval:
 -0.4886007 -0.3122543
sample estimates:
mean in group Yes  mean in group No 
         4.196678          4.597106 

R displays the result for the Welch Two Sample t-test, which is a version of the independent samples t-test that applies when equal variances are not assumed.
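To illustrate what the Welch version does, the sketch below computes the t-value and the (generally non-integer) degrees of freedom by hand and compares them with t.test. The two groups are simulated and the names g1 and g2 are hypothetical.

```r
set.seed(2)
g1 <- rnorm(40, mean = 4.2, sd = 2.8)   # hypothetical group 1
g2 <- rnorm(25, mean = 4.6, sd = 2.5)   # hypothetical group 2

# Squared standard error of each group mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t-value: difference in means divided by the combined standard error
t_welch <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

res <- t.test(g1, g2)   # var.equal = FALSE is the default
all.equal(unname(res$statistic), t_welch)
all.equal(unname(res$parameter), df_welch)
```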

14.2.6 Reporting an independent samples t-test

The independent samples t-test is used to compare whether the means of two groups are statistically significantly different. We use data on the number of gym attendances last month of male and female respondents.

Example output R (do not include the output directly in an academic paper):

library(dplyr)

# Calculate the standard error of the mean per group, dropping missing values. 
# These values (mean and standard error) are needed for the report.

gym |>                                # Summary by group using dplyr
  group_by(gender) |>  
  summarize(mean = mean(gym_attendance, na.rm = TRUE),
            se = MeanSE(gym_attendance, na.rm = TRUE))
# A tibble: 2 × 3
  gender  mean     se
  <chr>  <dbl>  <dbl>
1 Female  3.00 0.0542
2 Male    3.97 0.0632
t.test(formula = gym_attendance ~ gender,
       data = gym,
       alternative = "two.sided", 
       mu = 0,
       conf.level = .95)

    Welch Two Sample t-test

data:  gym_attendance by gender
t = -11.674, df = 1944.4, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -1.1355660 -0.8089068
sample estimates:
mean in group Female   mean in group Male 
            3.001031             3.973267 
Output explanation
  • In R, the output shows:
    • the means of both groups (see ‘sample estimates’).
    • the calculated t-value (see ‘t =’).
    • the degrees of freedom (see ‘df =’).
    • the p-value (i.e. the probability of finding a t-value at least as extreme as the one obtained, if the null hypothesis is true).
    • the expected value under the null hypothesis (here 0, see “true difference in means between group Female and group Male is not equal to 0”).
    • the standard errors of both groups are not part of this output; they are obtained separately via MeanSE(), alongside the group means from mean().
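The difference in means and, approximately, the size of the t-value can be checked by hand from the group means and standard errors shown in the output. Taking the difference as male minus female gives a positive value; R computes female minus male, which is why its t-value is negative.

\[
M_{male} - M_{female} = 3.973267 - 3.001031 = 0.972236 \approx 0.972
\]

\[
SE_{diff} = \sqrt{0.0542^2 + 0.0632^2} \approx 0.0833, \qquad t \approx \frac{0.972}{0.0833} \approx 11.67
\]

The small discrepancy with the reported value of 11.674 arises because the standard errors used here are rounded.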

14.2.6.1 Reporting

The correct report includes:

  • A conclusion about the null hypothesis;
  • the mean (in the text or as M = …) and the standard error (SE = …) of each group,
  • The difference in means (here 0.972),
  • the t-value and its degrees of freedom, written as t(degrees of freedom) = t-value,
  • p = p-value. When working with statistical software, you should report the exact p-value that is displayed.
    • in R, small values may be displayed using the scientific notation (e.g. 2.2e-16 is the scientific notation of 0.00000000000000022.) This means that the value is very close to zero. R uses this notation automatically for very small numbers. In these cases, write p < 0.001 in your report.
Report

✓ The number of gym attendances of men last month (M = 3.97; SE = 0.063) was higher than the number of gym attendances of women last month (M = 3.00; SE = 0.054). This difference, 0.972, was statistically significant, t(1944.363) = 11.674, p < 0.001.

14.3 Effect sizes for t-tests

We can calculate Cohen’s \(d\) or Hedges’ \(g^*_s\) as an effect size measure for a t-test, using the functions cohens_d and hedges_g from package effectsize.

14.3.1 Cohen’s \(d\) for one sample

We are using function cohens_d from package effectsize. The parameters are pretty much the same as for the function t.test:

library(effectsize)
cohens_d(cps19_age ~ 1, 
         data = canada, 
         mu = 48.5)
Cohen's d |       95% CI
------------------------
0.01      | [0.00, 0.02]

- Deviation from a difference of 48.5.
cps19_age ~ 1

This is the formula used for a single sample mean. In this case specify the formula in the form <variable name> ~ 1.

data = canada

We specify the data frame that we are using. Replace with your own data frame name.

mu = 48.5

We specify the value of the population mean under the null hypothesis. It is very important to specify this. If omitted, mu is set to 0 and this will often not be appropriate, especially for a one sample test.
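For one sample, Cohen’s \(d\) is the difference between the sample mean and the hypothesized mean, divided by the sample standard deviation. The sketch below computes this by hand in base R on hypothetical data (x and mu0 are made-up values) and shows that it equals the t-value divided by \(\sqrt{n}\).

```r
# Hypothetical sample and hypothesized population mean
x <- c(82, 95, 90, 101, 88, 79, 93, 97, 85, 90)
mu0 <- 100

# Cohen's d for one sample: mean difference in standard deviation units
d_manual <- (mean(x) - mu0) / sd(x)
d_manual

# The same value follows from the t-value: d = t / sqrt(n)
t_value <- unname(t.test(x, mu = mu0)$statistic)
all.equal(d_manual, t_value / sqrt(length(x)))
```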

14.3.2 Cohen’s d for paired samples

We are using function cohens_d from package effectsize. The parameters are pretty much the same as for the function t.test:

library(effectsize)
cohens_d(Pair(cps19_lr_parties_1, cps19_lr_parties_2) ~ 1, 
         data = canada, 
         mu = 0)
Cohen's d |         95% CI
--------------------------
-0.58     | [-0.60, -0.57]
Pair(cps19_lr_parties_1, cps19_lr_parties_2) ~ 1

We have two paired interval-ratio variables and therefore use a formula in the form Pair(<variable name 1>, <variable name 2>) ~ 1.

data = canada

We specify the data frame that we want to use.

mu = 0

The mu parameter should be set to the value of the mean under the null hypothesis. In the case of a paired sample t-test we usually hypothesize that the difference between the two variables is 0 in the population, thus mu = 0.

14.3.3 Hedges’ \(g^*_s\) for independent samples

For two independent samples, we recommend calculating Hedges’ \(g^*_s\) as the effect size measure (Delacre et al. 2021). It is interpreted in the same way as Cohen’s \(d\), but it is bias-corrected and adapted to situations in which equal variances cannot be assumed. For larger samples its values are very similar to Cohen’s \(d\).

Note that for our example we use the modified version of variable cps19_bornin_canada (see Independent-samples t-test):

library(effectsize)
hedges_g(cps19_lr_parties_1 ~ cps19_bornin_canada, 
         data = canada, 
         mu = 0,
         pooled_sd = FALSE)
Hedges' g |         95% CI
--------------------------
-0.14     | [-0.17, -0.11]

- Estimated using un-pooled SD.
cps19_lr_parties_1 ~ cps19_bornin_canada

We have one interval-ratio variable and one categorical variable (factor) that captures to which group an observation belongs, thus we use a formula of the form <interval-ratio variable> ~ <categorical variable>.

data = canada

We specify the data frame that we want to use.

mu = 0

The mu parameter should be set to the value of the mean under the null hypothesis. In the case of an independent samples t-test we usually hypothesize that the difference between the two group means is 0 in the population, thus mu = 0.

pooled_sd = FALSE

This specifies that we do not use the pooled standard deviation. This is recommended when using Welch’s t-test (which does not assume equal variances), the test that t.test performs by default for independent samples.
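The idea behind an un-pooled standardizer can be sketched as follows: the difference in means is divided by the square root of the average of the two group variances. This is only an illustration on simulated data (g1 and g2 are hypothetical); it leaves out the small-sample bias correction and is not meant to reproduce the exact computation inside hedges_g.

```r
set.seed(3)
g1 <- rnorm(50, mean = 4.2, sd = 2.8)   # hypothetical group 1
g2 <- rnorm(30, mean = 4.6, sd = 2.6)   # hypothetical group 2

# Un-pooled standardizer: each group variance gets equal weight,
# regardless of group size
sd_unpooled <- sqrt((var(g1) + var(g2)) / 2)

# Standardized mean difference without bias correction
d_unpooled <- (mean(g1) - mean(g2)) / sd_unpooled
d_unpooled
```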

14.3.4 Reporting measures of association for t-tests

14.3.4.1 Cohen’s d

Cohen’s d, or standardized mean difference, is one of the most common ways to measure effect size for t-tests. There are different ways to calculate it, depending on the type of t-test, but the reporting is the same. Here, we show Cohen’s d based on the one-sample t-test above.

Example output R (do not include the output directly in an academic paper):

MeanSE(fruit$fruitconsumption, na.rm = TRUE)
[1] 1.714986
t.test(x = fruit$fruitconsumption,
           alternative = "two.sided",
           mu = 100,
           conf.level = 0.95)

    One Sample t-test

data:  fruit$fruitconsumption
t = -5.831, df = 33, p-value = 1.587e-06
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 86.51084 93.48916
sample estimates:
mean of x 
       90 
library(effectsize)
cohens_d(fruitconsumption ~ 1, 
         data = fruit, 
         mu = 100)
Cohen's d |         95% CI
--------------------------
-1.00     | [-1.41, -0.58]

- Deviation from a difference of 100.
Interpretation

Cohen’s d can range from -∞ to +∞, so values can be negative. A negative d simply means that the sample mean (or, for a paired or independent samples t-test, the mean of the first variable or group) is lower than the hypothesized population mean (or the mean of the second variable or group).

A Cohen’s d of 1 means that the difference between the sample mean and hypothesized population mean is 1 standard deviation. A Cohen’s d of 0.5 means that the difference between the sample mean and hypothesized mean is 0.5 standard deviations (half a standard deviation).
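For the fruit consumption example, the value of d can be verified by hand. With \(n = 34\) (since df = 33), the sample standard deviation \(s\) follows from the standard error, and d is the mean difference expressed in units of \(s\):

\[
s = SE \times \sqrt{n} = 1.714986 \times \sqrt{34} \approx 10.00, \qquad d = \frac{M - \mu_0}{s} = \frac{90 - 100}{10.00} = -1.00
\]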

The following rules of thumb are often applied for interpreting the absolute value of Cohen’s d:

  • A value of at least 0.2 represents a small effect size.

  • A value of at least 0.5 represents a medium effect size.

  • A value of at least 0.8 represents a large effect size.

As always with rules of thumb, be careful and consider the type of data that you are working with.
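The rules of thumb listed above can be written as a small helper function. The name interpret_effect and the label for values below 0.2 are hypothetical choices for this sketch; the function uses the absolute value because the sign only indicates the direction of the difference.

```r
# Hypothetical helper applying the rules of thumb to an effect size
interpret_effect <- function(d) {
  size <- abs(d)   # the sign only indicates the direction
  if (size >= 0.8) {
    "large"
  } else if (size >= 0.5) {
    "medium"
  } else if (size >= 0.2) {
    "small"
  } else {
    "below the threshold for a small effect"
  }
}

interpret_effect(-1.00)   # fruit consumption example
interpret_effect(-0.58)   # paired samples example
interpret_effect(0.01)    # one sample age example
```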

14.3.4.2 Reporting

If you have calculated Cohen’s d for a t-test, you can add it to the report:

Report

✓ The mean fruit consumption of students was 90 (SE = 1.71). This is significantly different from the mean fruit consumption of the whole population (t(33) = -5.83, p < 0.001). This represents a large difference, d = -1.00.

14.3.4.3 Hedges’ \(g^*_s\) for independent samples

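Cohen’s d and Hedges’ \(g^*_s\) are largely comparable, but Hedges’ correction is considered more robust for small samples. The example below reuses the one-sample fruit consumption test to illustrate the output and the reporting.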

Example output R (do not include the output directly in an academic paper):

MeanSE(fruit$fruitconsumption, na.rm = TRUE)
[1] 1.714986
t.test(x = fruit$fruitconsumption,
           alternative = "two.sided",
           mu = 100,
           conf.level = 0.95)

    One Sample t-test

data:  fruit$fruitconsumption
t = -5.831, df = 33, p-value = 1.587e-06
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 86.51084 93.48916
sample estimates:
mean of x 
       90 
library(effectsize)
hedges_g(fruitconsumption ~ 1, 
         data = fruit, 
         mu = 100)
Hedges' g |         95% CI
--------------------------
-0.98     | [-1.38, -0.57]

- Deviation from a difference of 100.
Interpretation

The interpretation of Hedges’ \(g_s^*\) is similar to Cohen’s d. A Hedges’ \(g_s^*\) of 1 means that the difference between the two sample means is 1 standard deviation. A Hedges’ \(g_s^*\) of 0.5 means that the difference between the two sample means is 0.5 standard deviations (half a standard deviation).

The following rules of thumb are often applied for interpreting the absolute value of Cohen’s d and Hedges’ \(g_s^*\):

  • A value of at least 0.2 represents a small effect size.

  • A value of at least 0.5 represents a medium effect size.

  • A value of at least 0.8 represents a large effect size.

As always with rules of thumb, be careful and consider the type of data that you are working with.

14.3.4.4 Reporting

The Hedges’ \(g_s^*\) effect size is often included after the independent samples t-test, for example:

Report

✓ The mean fruit consumption of students was 90 (SE = 1.71). This is significantly different from the mean fruit consumption of the whole population (t(33) = -5.83, p < 0.001). This represents a large difference, \(g_s^*\) = -0.98.


  1. In this overview we use the so-called ‘formula interface’ to t.test. You can also use the so-called traditional interface. For the one-sample t-test example, the equivalent code is:

    t.test(x = canada$cps19_age,
           alternative = "two.sided",
           mu = 48.5,
           conf.level = 0.95)
    
        One Sample t-test
    
    data:  canada$cps19_age
    t = 2.2395, df = 37821, p-value = 0.02513
    alternative hypothesis: true mean is not equal to 48.5
    95 percent confidence interval:
     48.52387 48.85871
    sample estimates:
    mean of x 
     48.69129 
  2. In this overview we use the so-called ‘formula interface’ to t.test. You can also use the so-called traditional interface. For the paired samples t-test example, the equivalent code is:

    t.test(x = canada$cps19_lr_parties_1, 
           y = canada$cps19_lr_parties_2,
           alternative = "two.sided",
           mu = 0,
           paired = TRUE,
           conf.level = 0.95)
    
        Paired t-test
    
    data:  canada$cps19_lr_parties_1 and canada$cps19_lr_parties_2
    t = -95.406, df = 26718, p-value < 2.2e-16
    alternative hypothesis: true mean difference is not equal to 0
    95 percent confidence interval:
     -2.691519 -2.583155
    sample estimates:
    mean difference 
          -2.637337 

    In this case, do not forget to include paired = TRUE.