To this point, we have made claims about the expected value and variance of the OLS estimator, \(\boldsymbol{\hat{\beta}}\)
To make probabilistic claims about \(\boldsymbol{\beta}\) based on our sample, we need to more fully characterize the distribution of \(\boldsymbol{\hat{\beta}}\)
Sampling distributions
Just like the mean estimator for a single variable has a sampling distribution, so does our OLS estimator, \(\boldsymbol{\hat{\beta}}\).
Since \(\boldsymbol{\hat{\beta}} - \boldsymbol{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{u}\): the right-hand side is a linear function, \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\), of a normally distributed random variable, \(\boldsymbol{u}\), so the left-hand side (which just subtracts the constant \(\boldsymbol{\beta}\) from \(\boldsymbol{\hat{\beta}}\)) is also normally distributed.
Sampling distribution for \(\boldsymbol{\hat{\beta}}\)
The estimated variance of our estimator, \(\widehat{\text{Var}}(\boldsymbol{\hat{\beta}}) = \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}\), is a function of our independent variables and the estimated error variance
What does it look like?
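A quick way to see the shape is to simulate it. The sketch below is not from the original slides: it assumes a true slope and normal errors, re-estimates a simple regression on many simulated samples, and plots the resulting slope estimates.
# simulation sketch: sampling distribution of a slope estimate under normal errors
set.seed(123)
n <- 100
x <- rnorm(n)
b_true <- 0.5                                  # assumed true slope
slope_hats <- replicate(2000, {
  u <- rnorm(n)                                # normally distributed errors
  y <- 1 + b_true * x + u
  coef(lm(y ~ x))[2]                           # store the estimated slope
})
hist(slope_hats, breaks = 40,
     main = "Simulated sampling distribution of the slope estimate")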
Standard errors
Standard errors for the \(\hat{\beta}_k\) are the square roots of the main diagonal of the covariance matrix, \(\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}\)
This is the standard deviation of the sampling distribution of each \(\hat{\beta}_k\)
Call:
lm(formula = Volume ~ Height, data = trees)
Residuals:
Min 1Q Median 3Q Max
-21.274 -9.894 -2.894 12.068 29.852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -87.1236 29.2731 -2.976 0.005835 **
Height 1.5433 0.3839 4.021 0.000378 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 13.4 on 29 degrees of freedom
Multiple R-squared: 0.3579, Adjusted R-squared: 0.3358
F-statistic: 16.16 on 1 and 29 DF, p-value: 0.0003784
# using lm
m_trees <- lm(Volume ~ Height + Girth, data = trees)
m_trees_sum <- summary(m_trees)
# print coefs and SEs from summary object
round(m_trees_sum$coefficients[, c(1, 2)], 4)
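The null-hypothesis-test chunk below refers to objects beta, se, and X computed by hand rather than with lm(). A sketch of how those objects might have been constructed; the matrix algebra is standard, but the object names and setup here are assumptions.
# by hand: design matrix, coefficients, and standard errors for the same model
X <- model.matrix(~ Height + Girth, data = trees)   # includes the intercept column
y <- trees$Volume
beta <- drop(solve(t(X) %*% X) %*% t(X) %*% y)      # OLS coefficients
resid <- y - X %*% beta
sigma2_hat <- sum(resid^2) / (nrow(X) - ncol(X))    # estimated error variance
se <- sqrt(diag(sigma2_hat * solve(t(X) %*% X)))    # standard errors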
Given that we know the CDF of the \(t_{N-K-1}\) distribution, we can evaluate how extreme our estimated \(\hat{\beta}_k\) is under the null hypothesis that \(\beta_k = 0\).
Null hypothesis tests
Once we know \(\hat{\beta}_k\) and its standard error, we can divide the former by the latter to get a t-statistic, and from that a p-value:
t-statistic: “How many standard errors away from zero is \(\hat{\beta}_k\)?”
p-value: “How frequently would our estimate be this many standard errors from zero, under repeated sampling, if \(\beta_k = 0\)?”
By default, political scientists tend to use two-tailed tests
“How frequently would our estimate be this extreme, in either direction?”
One-tailed: …this extreme in a particular direction (see the sketch below)
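A minimal sketch of the difference, using assumed values for the t statistic and degrees of freedom (not taken from any model in these slides):
# sketch with assumed values: two-tailed vs one-tailed p-values
tstat <- 2.1                                   # hypothetical t statistic
df <- 100                                      # hypothetical residual degrees of freedom
p_two_tailed <- 2 * (1 - pt(abs(tstat), df))   # extreme in either direction
p_one_tailed <- 1 - pt(tstat, df)              # extreme in the positive direction only
c(two_tailed = p_two_tailed, one_tailed = p_one_tailed)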
Null hypothesis tests
# continuing the above code
# the t-statistic standardizes the coefficient
tstats <- beta / se
# the p value characterizes how extreme the t statistic is under the null
pvals <- (1 - pt(abs(tstats), df = nrow(X) - ncol(X))) * 2
reg_table <- data.frame(estimate = beta, stderr = se,
                        tstat = tstats, pval = pvals)
round(reg_table, 4)
vote age economic.cond.national
Conservative :462 Min. :24.00 Min. :1.000
Labour :720 1st Qu.:41.00 1st Qu.:3.000
Liberal Democrat:343 Median :53.00 Median :3.000
Mean :54.18 Mean :3.246
3rd Qu.:67.00 3rd Qu.:4.000
Max. :93.00 Max. :5.000
economic.cond.household Blair Hague Kennedy
Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:3.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :3.00 Median :4.000 Median :2.000 Median :3.000
Mean :3.14 Mean :3.334 Mean :2.747 Mean :3.135
3rd Qu.:4.00 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :5.00 Max. :5.000 Max. :5.000 Max. :5.000
Europe political.knowledge gender
Min. : 1.000 Min. :0.000 female:812
1st Qu.: 4.000 1st Qu.:0.000 male :713
Median : 6.000 Median :2.000
Mean : 6.729 Mean :1.542
3rd Qu.:10.000 3rd Qu.:2.000
Max. :11.000 Max. :3.000
Example
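The model object used in the following chunks (m_beps) is not shown being fit; it was presumably created along these lines, assuming the BEPS data come from the carData package:
# fit the model whose summary appears below (BEPS assumed to come from carData)
library(carData)
m_beps <- lm(Europe ~ age + gender + economic.cond.national +
               economic.cond.household + political.knowledge, data = BEPS)
summary(m_beps)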
Call:
lm(formula = Europe ~ age + gender + economic.cond.national +
economic.cond.household + political.knowledge, data = BEPS)
Residuals:
Min 1Q Median 3Q Max
-8.8776 -2.4950 -0.1701 2.8604 6.3603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.759370 0.482226 20.238 < 2e-16 ***
age 0.013100 0.005189 2.525 0.0117 *
gendermale -0.266411 0.165308 -1.612 0.1073
economic.cond.national -0.729588 0.098555 -7.403 2.20e-13 ***
economic.cond.household -0.174010 0.093388 -1.863 0.0626 .
political.knowledge -0.454822 0.076176 -5.971 2.94e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.174 on 1519 degrees of freedom
Multiple R-squared: 0.07665, Adjusted R-squared: 0.07361
F-statistic: 25.22 on 5 and 1519 DF, p-value: < 2.2e-16
Testing against another value
Sometimes we may wish to test a null hypothesis that \(\beta_k = b\), where \(b\) is some specific value not equal to 0. This is easy!
Just subtract the value of interest from the estimate when forming the t-statistic, \(t = (\hat{\beta}_k - b) / \text{SE}(\hat{\beta}_k)\), and you are back to a standard null hypothesis test
# test whether polknow coef is diff from -0.5
B <- coef(m_beps)["political.knowledge"]
se <- sqrt(diag(vcov(m_beps)))["political.knowledge"]
df <- m_beps$df.residual
test_value <- -0.5
# the t-statistic standardizes the coefficient
tstat <- (B - test_value) / se
# the p value characterizes how extreme the t statistic is under the null
pval <- (1 - pt(abs(tstat), df = df)) * 2
table <- data.frame(estimate = B, stderr = se,
                    tstat = tstat, pval = pval)
round(table, 4)
Thinking back to last semester: what’s the frequentist interpretation of a confidence interval?
Confidence interval is a statement about the estimator
Under repeated sampling, this interval will contain the population parameter X% of the time
(When assumptions underpinning the estimator are met!)
Lower and upper bounds
For an X% confidence interval:
The upper bound is the value that estimates would exceed only \(\frac{100-X}{2}\)% of the time
The lower bound is the value that estimates would fall below only \(\frac{100 - X}{2}\)% of the time
These are just \(\hat{\beta}_k\) plus the respective quantiles of the relevant t distribution, multiplied by the SE
Calculating 95% CIs
# from our previous example
B <- coef(m_beps)
df <- m_beps$df.residual
se <- sqrt(diag(vcov(m_beps)))
# calculate 95% CI for age coef
B["age"] + qt(c(0.025, 0.975), df) * se["age"]
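As a check, R's built-in confint() should give the same interval, since it uses the same t quantiles and residual degrees of freedom:
# built-in equivalent of the hand calculation above
confint(m_beps, parm = "age", level = 0.95)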
Is the coefficient for national economic conditions significantly different from the coefficient for household economic conditions?
Call:
lm(formula = Europe ~ age + gender + economic.cond.national +
economic.cond.household + political.knowledge, data = BEPS)
Residuals:
Min 1Q Median 3Q Max
-8.8776 -2.4950 -0.1701 2.8604 6.3603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.759370 0.482226 20.238 < 2e-16 ***
age 0.013100 0.005189 2.525 0.0117 *
gendermale -0.266411 0.165308 -1.612 0.1073
economic.cond.national -0.729588 0.098555 -7.403 2.20e-13 ***
economic.cond.household -0.174010 0.093388 -1.863 0.0626 .
political.knowledge -0.454822 0.076176 -5.971 2.94e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.174 on 1519 degrees of freedom
Multiple R-squared: 0.07665, Adjusted R-squared: 0.07361
F-statistic: 25.22 on 5 and 1519 DF, p-value: < 2.2e-16
Example
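The code below builds the test statistic from the usual variance formula for a difference of two (possibly correlated) estimates:
\[
t = \frac{\hat{\beta}_N - \hat{\beta}_H}{\sqrt{\widehat{\text{Var}}(\hat{\beta}_N) + \widehat{\text{Var}}(\hat{\beta}_H) - 2\,\widehat{\text{Cov}}(\hat{\beta}_N, \hat{\beta}_H)}}
\]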
# get relevant variances and covariances from var-cov matrix of coefficients
var_N <- vcov(m_beps)["economic.cond.national", "economic.cond.national"]
var_H <- vcov(m_beps)["economic.cond.household", "economic.cond.household"]
cov_NH <- vcov(m_beps)["economic.cond.national", "economic.cond.household"]
# var and standard error of linear combination
var_NH <- var_N + var_H - 2 * cov_NH
se_NH <- sqrt(var_NH)
# calculate test stat
(coef(m_beps)["economic.cond.national"] - coef(m_beps)["economic.cond.household"]) / se_NH
economic.cond.national
-3.528471
Short-cut for this simple case
Replace the household variable with the sum of the two variables: since \(\beta_N N + \beta_H H = (\beta_N - \beta_H) N + \beta_H (N + H)\), the coefficient on economic.cond.national in the re-parameterized model is \(\beta_N - \beta_H\), with the correct standard error
m_beps_NH <- lm(Europe ~ age + gender + economic.cond.national +
                  I(economic.cond.national + economic.cond.household) +
                  political.knowledge, data = BEPS)
summary(m_beps_NH)
Call:
lm(formula = Europe ~ age + gender + economic.cond.national +
I(economic.cond.national + economic.cond.household) + political.knowledge,
data = BEPS)
Residuals:
Min 1Q Median 3Q Max
-8.8776 -2.4950 -0.1701 2.8604 6.3603
Coefficients:
Estimate Std. Error
(Intercept) 9.759370 0.482226
age 0.013100 0.005189
gendermale -0.266411 0.165308
economic.cond.national -0.555579 0.157456
I(economic.cond.national + economic.cond.household) -0.174010 0.093388
political.knowledge -0.454822 0.076176
t value Pr(>|t|)
(Intercept) 20.238 < 2e-16 ***
age 2.525 0.01169 *
gendermale -1.612 0.10726
economic.cond.national -3.528 0.00043 ***
I(economic.cond.national + economic.cond.household) -1.863 0.06261 .
political.knowledge -5.971 2.94e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.174 on 1519 degrees of freedom
Multiple R-squared: 0.07665, Adjusted R-squared: 0.07361
F-statistic: 25.22 on 5 and 1519 DF, p-value: < 2.2e-16
Joint hypotheses
A joint null claims that some set of parameters are all zero (or some test value)
for example:
\[
\beta_1 = \beta_2 = \cdots = \beta_K = 0
\]
Rejecting the null tells you only about the significance of the set, not the individual parameters
Joint hypotheses
Why can’t we just take sets of parameters out and check whether/how fit changes?
Removing parameters always decreases model fit in-sample
The question is whether model fit decreases more than would be expected simply from the parameters removed (the residual degrees of freedom gained)
F statistic
An F stat compares the residual SS of the restricted and unrestricted models (written out after this list)
\(F \sim F(q, N-K-1)\), where \(q\) is the number of restrictions
the F distribution is formed from ratios of \(\chi^2\) distributions, each divided by its degrees of freedom
Because \(\chi^2\) distributions are always positive, F distributions will be too
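Written out (this is the form used in the hand calculation below), with \(SSR_r\) and \(SSR_{ur}\) the residual sums of squares of the restricted and unrestricted models:
\[
F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (N - K - 1)}
\]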
Example
Compare the intercept-only model to the model with all predictors (both models must be estimated with the same data! Beware listwise deletion!)
# estimate restricted and unrestricted models
m1_r <- lm(Europe ~ 1, data = m_beps$model)
m1_ur <- m_beps
# summary
summary(m1_ur)
Call:
lm(formula = Europe ~ age + gender + economic.cond.national +
economic.cond.household + political.knowledge, data = BEPS)
Residuals:
Min 1Q Median 3Q Max
-8.8776 -2.4950 -0.1701 2.8604 6.3603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.759370 0.482226 20.238 < 2e-16 ***
age 0.013100 0.005189 2.525 0.0117 *
gendermale -0.266411 0.165308 -1.612 0.1073
economic.cond.national -0.729588 0.098555 -7.403 2.20e-13 ***
economic.cond.household -0.174010 0.093388 -1.863 0.0626 .
political.knowledge -0.454822 0.076176 -5.971 2.94e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.174 on 1519 degrees of freedom
Multiple R-squared: 0.07665, Adjusted R-squared: 0.07361
F-statistic: 25.22 on 5 and 1519 DF, p-value: < 2.2e-16
F test by hand
# difference in sum of squared residuals
SS_diff <- sum(m1_r$residuals^2) - sum(m1_ur$residuals^2)
# difference in degrees of freedom
df_diff <- m1_r$df.residual - m1_ur$df.residual
# SS resid for unrestricted model
SS_ur <- sum(m1_ur$residuals^2)
# degrees of freedom (unrestricted)
df_ur <- m1_ur$df.residual
# calculate F stat
Fstat <- (SS_diff / df_diff) / (SS_ur / df_ur)
# p-value
cbind(F = Fstat, p = 1 - pf(Fstat, df_diff, df_ur))
F p
[1,] 25.21934 0
F test using anova
stats::anova() will do this for you
anova(m1_r, m1_ur)
Analysis of Variance Table
Model 1: Europe ~ 1
Model 2: Europe ~ age + gender + economic.cond.national + economic.cond.household +
political.knowledge
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1524 16572
2 1519 15301 5 1270.2 25.219 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This lets us say that all predictors are jointly significant; we still need to interpret the unrestricted model to make claims about any predictor on its own
lm default F test
Call:
lm(formula = Europe ~ age + gender + economic.cond.national +
economic.cond.household + political.knowledge, data = BEPS)
Residuals:
Min 1Q Median 3Q Max
-8.8776 -2.4950 -0.1701 2.8604 6.3603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.759370 0.482226 20.238 < 2e-16 ***
age 0.013100 0.005189 2.525 0.0117 *
gendermale -0.266411 0.165308 -1.612 0.1073
economic.cond.national -0.729588 0.098555 -7.403 2.20e-13 ***
economic.cond.household -0.174010 0.093388 -1.863 0.0626 .
political.knowledge -0.454822 0.076176 -5.971 2.94e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.174 on 1519 degrees of freedom
Multiple R-squared: 0.07665, Adjusted R-squared: 0.07361
F-statistic: 25.22 on 5 and 1519 DF, p-value: < 2.2e-16
For subsets of predictors
anova( update(m_beps, . ~ . - age - gender, data = m_beps$model), m_beps)
Analysis of Variance Table
Model 1: Europe ~ economic.cond.national + economic.cond.household + political.knowledge
Model 2: Europe ~ age + gender + economic.cond.national + economic.cond.household +
political.knowledge
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1521 15393
2 1519 15301 2 91.376 4.5356 0.01087 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F test is significant, but coefficient for gender is not!
Again, this is a joint test!
car package’s linearHypothesis
The car package has a convenient function for testing linear combinations of OLS coefficients
syntax puts an lm model object in the first slot and a string stating the linear hypothesis in the second slot (the hypothesis is tested against 0 if no explicit equality is provided)
library(car)
linearHypothesis(m_beps, "age = 0")
Linear hypothesis test:
age = 0
Model 1: restricted model
Model 2: Europe ~ age + gender + economic.cond.national + economic.cond.household +
political.knowledge
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1520 15366
2 1519 15301 1 64.2 6.3732 0.01169 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
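The next output comes from a second call whose code is not shown; it was presumably something along these lines (the hypothesis string is reconstructed from the output header):
# test whether the national coefficient is twice the household coefficient
linearHypothesis(m_beps, "economic.cond.national - 2*economic.cond.household = 0")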
Linear hypothesis test:
economic.cond.national - 2 economic.cond.household = 0
Model 1: restricted model
Model 2: Europe ~ age + gender + economic.cond.national + economic.cond.household +
political.knowledge
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1520 15327
2 1519 15301 1 25.589 2.5403 0.1112
Reminder! Need same observations
These tests require using the same observations for both restricted and unrestricted models
Invalid otherwise
Matters for data with missing values (listwise deletion is the default with most software)
When missing data are possible and you add variables (as you do in these kinds of tests), there is a decent chance you will lose observations and end up comparing models fit to different samples (see the sketch below)
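One sketch of a guard against this (not from the original slides): refit the restricted model on the fitted model's own data frame, then confirm both models use the same number of observations.
# refit the restricted model on the exact rows used by the unrestricted model
m_r_same <- update(m_beps, . ~ . - age - gender, data = m_beps$model)
# sanity check: the two models should be estimated on the same observations
stopifnot(nobs(m_r_same) == nobs(m_beps))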
Thoughts on null hypothesis testing (OPINION)
Null hypothesis testing is the norm, but it shouldn’t be
Do we ever actually believe \(\beta_k = 0\)?
If not, what are we actually testing? (sample size?)
NHT also shifts focus away from the estimates themselves and the information we have about them
e.g., should our interpretation drastically change when \(p\) changes from 0.051 to 0.049?
A better approach (OPINION)
It is better, IMHO, to focus on:
Interpretation of the substantive meaning of the coefficient estimates
Our uncertainty in those estimates, typically characterized by a confidence interval (or two)
Even insignificant coefficients provide information! Just be precise about how much