3  Statistical Significance

Regression coefficients tell us something about the expected mean level of our dependent variable: what we expect it to be when the independent variable(s) equal 0 (the Intercept term) and how we expect it to change when the independent variable changes by 1 (independent variable coefficient). However, we should also discuss the uncertainty surrounding these estimates: what other coefficient values are plausible given our data? That is the topic for this chapter.

Here are the packages and data that we will use, along with some preliminary data cleaning.

#Packages
library(broom)        #Additional tools for model summaries
library(rio)          #loading data
library(tidyverse)    #data manipulation and plotting

#Import Data using rio::import()
demdata <- import("demdata.rds") |> 
  as_tibble()

#Some data cleaning (see last chapter)
demdata <- demdata |>  
  mutate(TYPEDEMO1984_factor = factorize(TYPEDEMO1984), 
         Typeregime2006_factor = factorize(Typeregime2006))

Note: You do not always need as_tibble() as shown here. We use it because tibbles make larger datasets easier to display. See Statistics I, 2.1.

3.1 t- and p-values via summary()

Most of the relevant information for discussing statistical significance and uncertainty is produced by the summary() command.

#Store the model to an object of your naming
model_binary <- lm(v2x_polyarchy ~ TYPEDEMO1984_factor, data=demdata)

#Use summary() to inspect the object
summary(model_binary)

Call:
lm(formula = v2x_polyarchy ~ TYPEDEMO1984_factor, data = demdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.51025 -0.15007 -0.00857  0.17309  0.48543 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     0.41757    0.02333   17.90  < 2e-16 ***
TYPEDEMO1984_factorDemocracies  0.27268    0.03695    7.38 1.25e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2163 on 141 degrees of freedom
  (36 observations deleted due to missingness)
Multiple R-squared:  0.2786,    Adjusted R-squared:  0.2735 
F-statistic: 54.47 on 1 and 141 DF,  p-value: 1.247e-11
Output Explanation

Information concerning uncertainty in our estimates, and statistical significance, is provided in the Coefficients area via these columns:

  • Std. Error: The standard error of the coefficient
  • t value: The t-statistic or t-value for the coefficient (\(t = \frac{\textrm{Coefficient}}{\textrm{Std.Error}}\))
  • Pr(>|t|): The p-value for the t-statistic - the probability of observing a t-value at least that large in absolute value (that is, at least as far from 0 in either direction), assuming that the null hypothesis of no effect is true and all model assumptions are correct
  • Asterisks and Signif. codes: You may see symbols next to the value under Pr(>|t|). These tell you whether the coefficient is statistically significant and at what level. The “Signif. codes” row provides you with the information needed to interpret these symbols. A single asterisk (*), for instance, means that the p-value is smaller than 0.05 but larger than 0.01 while two asterisks (**) tell you that the p-value is smaller than 0.01 but larger than 0.001.
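
As a check on how these columns relate to one another, the t-value and p-value for the TYPEDEMO1984_factor coefficient can be approximately reproduced by hand from the rounded numbers printed above. This is only a sketch: the object names (estimate, std_error, df_resid, and so on) are our own, and because the inputs are rounded the results should match the summary() output only to a few digits.

```r
# Values copied (rounded) from the summary() output above
estimate  <- 0.27268   # coefficient for TYPEDEMO1984_factorDemocracies
std_error <- 0.03695   # its standard error
df_resid  <- 141       # residual degrees of freedom

# t-value: the coefficient divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value: the probability of a t-value at least this far
# from 0 in either direction if the null hypothesis were true
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(t_value, 2)
signif(p_value, 3)
```

The factor of 2 reflects that Pr(>|t|) is a two-sided probability: it counts both large positive and large negative t-values.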

We typically assess the statistical significance of a coefficient by looking at whether there are any asterisks next to the value in the Pr(>|t|) column (a lone period only indicates a p-value between 0.05 and 0.1, which does not meet the conventional 0.05 threshold). See Section 8.4 for information on how to include this information in your reports.
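
The mapping from p-values to symbols given in the "Signif. codes" row can be sketched with cut(). Apart from the first entry, the p-values below are made up for illustration, and p_values and stars are names of our own choosing. The code mimics the thresholds printed by summary(); it does not reproduce how summary() works internally, and values falling exactly on a threshold may be treated differently.

```r
# Illustrative p-values: the first is from the model above, the rest are invented
p_values <- c(1.25e-11, 0.004, 0.03, 0.07, 0.5)

# Thresholds and symbols as listed in the "Signif. codes" row
stars <- cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("***", "**", "*", ".", " "),
  include.lowest = TRUE
)

data.frame(p_value = p_values, symbol = stars)
```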

3.2 Confidence Intervals via tidy()

One thing not shown in the output produced by summary() is the 95% confidence interval for the coefficient estimates. We can obtain these values alongside the coefficients from our model by using the tidy() function from the broom package. This package needs to be loaded prior to use (this is done at the start of this chapter).

tidy(model_binary, conf.int = TRUE)
# A tibble: 2 × 7
  term                  estimate std.error statistic  p.value conf.low conf.high
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)              0.418    0.0233     17.9  4.02e-38    0.371     0.464
2 TYPEDEMO1984_factorD…    0.273    0.0369      7.38 1.25e-11    0.200     0.346

The parts of this command are:

  • tidy( : The name of the command.
  • model_binary, : The name of the model we want to work with.
  • conf.int = TRUE) : This controls whether the command reports the confidence interval or not. By default, the command does not show the confidence interval, so we must explicitly request it via this option. We can also write conf.int = T to the same end (“T” acting as shorthand for “TRUE”), although TRUE is the safer choice because T is an ordinary variable that can be overwritten.

Output Explanation

The tidy() function will produce a dataframe with the following columns:

  • term: The names of the “terms” in the model (e.g., the Intercept and independent variables).
  • estimate: This provides the coefficients for each term in the model.
  • std.error: This provides the standard error for the coefficients.
  • statistic: This provides the t-value.
  • p.value: This provides the p-value.
  • conf.low & conf.high: These provide the lower and upper bounds of the confidence interval, respectively.

We can change the level of the confidence interval displayed by tidy(). For instance, we can obtain the 99% confidence interval by adding conf.level = 0.99 to our command.

tidy(model_binary, conf.int = T, conf.level = 0.99)
# A tibble: 2 × 7
  term                  estimate std.error statistic  p.value conf.low conf.high
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)              0.418    0.0233     17.9  4.02e-38    0.357     0.478
2 TYPEDEMO1984_factorD…    0.273    0.0369      7.38 1.25e-11    0.176     0.369
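
To see where these bounds come from, the intervals for the TYPEDEMO1984_factor coefficient can be approximately rebuilt by hand as the estimate plus or minus a critical t-value times the standard error. This is a sketch under stated assumptions: the inputs are the rounded values from the summary() output, and the object names (crit_95, ci_95, and so on) are our own, so the results should agree with tidy() only up to rounding.

```r
# Values copied (rounded) from the summary() output earlier in the chapter
estimate  <- 0.27268   # coefficient for TYPEDEMO1984_factorDemocracies
std_error <- 0.03695   # its standard error
df_resid  <- 141       # residual degrees of freedom

# Critical t-values: leave 2.5% (95% CI) or 0.5% (99% CI) in each tail
crit_95 <- qt(0.975, df = df_resid)
crit_99 <- qt(0.995, df = df_resid)

# Interval = estimate -/+ critical value * standard error
ci_95 <- estimate + c(-1, 1) * crit_95 * std_error
ci_99 <- estimate + c(-1, 1) * crit_99 * std_error

round(ci_95, 3)
round(ci_99, 3)
```

The 99% interval is wider than the 95% interval because a higher confidence level requires a larger critical value.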

Both summary() and tidy() show us the coefficients from our model. One advantage of tidy() is that its output is a “tidy” dataframe which can be manipulated in the same ways that we manipulate dataframes more generally (e.g., renaming columns, recoding variables, and so on). We will use this aspect of tidy() in a subsequent chapter to produce graphical displays of our regression results.
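
As a small illustration of this point, the sketch below manipulates tidy() output with ordinary dplyr verbs. To keep it self-contained it fits a stand-in model on R's built-in mtcars data instead of demdata; the object names (model_demo, results) and the new column names are hypothetical choices of our own.

```r
library(broom)   # tidy()
library(dplyr)   # rename(), mutate(), filter()

# Stand-in model on built-in data (an assumption; not the chapter's model)
model_demo <- lm(mpg ~ wt, data = mtcars)

results <- tidy(model_demo, conf.int = TRUE) |>
  rename(coefficient = estimate, t_value = statistic) |>  # rename columns
  mutate(significant = p.value < 0.05) |>                 # add a flag column
  filter(term != "(Intercept)")                           # drop the intercept row

results
```

Because results is a regular dataframe, it can be passed on to other functions in the same way as any other dataset.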