Advanced Strategies for Inference

POLSCI 630: Probability and Basic Regression

February 25, 2025

Inference w/ non-linearity

\[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + u\]

Calculating measures of uncertainty is tougher with non-linear relationships b/c we can't rely on R's canned output

  • e.g., what is the standard error for the marginal effect of \(x_1\) when \(x_2 = 1\)?

There are 4 basic strategies we will discuss

  1. Analytic solutions
  2. Simulation-based approximations
  3. Bootstrapping-based approximations
  4. Delta method approximations

Analytic: variance of linear combination


\[ \text{Var} (aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\text{Cov}(X,Y) \]

Standard errors for interactions

\[y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3x_{1i}x_{2i} + u_i\]

Marginal effect of \(x_1\): \(\frac{\partial y_i}{\partial x_{1i}} = \beta_1 + \beta_3x_{2i}\)

The standard error for the conditional marginal effect of \(x_{ki}\), given its interaction with \(x_{li}\), is:

\[\sigma_{\frac{\partial y_i}{\partial x_{ki}}} = \sqrt{\text{Var}(\hat{\beta}_k) + x_{li}^2 \text{Var}(\hat{\beta}_{k,l}) + 2x_{li} \text{Cov}(\hat{\beta}_k, \hat{\beta}_{k,l})}\]

  • where \(\hat{\beta}_{k,l}\) represents the interaction term coefficient

Standard errors for quadratic models

Since a quadratic term is a special case of an interaction, this formula holds for those cases as well

  • One difference is the “2” multiplier that tags along because of the first derivative of the quadratic term:

\[\sigma_{\frac{\partial y_i}{\partial x_{i}}} = \sqrt{\text{Var}(\hat{\beta}_x) + (2x_{i})^2 \text{Var}(\hat{\beta}_{x^2}) + 4x_{i} \text{Cov}(\hat{\beta}_x, \hat{\beta}_{x^2})}\]

Example

\[y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3x_{2i}^2 + u_i\]

Marginal effect of \(x_{2i}\):

\[ \frac{\partial y_i}{\partial x_{2i}} = \beta_2 + 2\beta_3x_{2i} \]

Variance of this random variable:

\[ \begin{align} &\text{Var} (1\hat{\beta}_2 + 2x_{2i}\hat{\beta}_3) = \\ &1^2\text{Var}(\hat{\beta}_2) + (2x_{2i})^2\text{Var}(\hat{\beta}_3) + (2)(1)(2x_{2i})\text{Cov}(\hat{\beta}_2,\hat{\beta}_3) \end{align} \]

Interaction example


Call:
lm(formula = econ_mean ~ age_scale + I(age_scale^2) + male + 
    educ_scale + income_scale * know_mean, data = lucid)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.46091 -0.13112 -0.00257  0.11944  0.69693 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             0.3157633  0.0135866  23.241  < 2e-16 ***
age_scale              -0.1561355  0.0495193  -3.153  0.00163 ** 
I(age_scale^2)          0.1232286  0.0566972   2.173  0.02980 *  
male                    0.0348407  0.0056419   6.175 7.16e-10 ***
educ_scale              0.0257784  0.0111371   2.315  0.02068 *  
income_scale            0.1561915  0.0227687   6.860 7.79e-12 ***
know_mean               0.0007747  0.0165371   0.047  0.96264    
income_scale:know_mean -0.0557548  0.0330965  -1.685  0.09213 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1881 on 4682 degrees of freedom
  (58 observations deleted due to missingness)
Multiple R-squared:  0.05968,   Adjusted R-squared:  0.05827 
F-statistic: 42.45 on 7 and 4682 DF,  p-value: < 2.2e-16

Interaction example

Marginal effect of income_scale:

\[ \frac{\partial y_i}{\partial \text{income_scale_i}} = \beta_5 + \beta_7 \text{know_mean_i} \]

# calculate conditional standard error for marginal effect of income
se_income <- sqrt(   
  
  # (1)^2 * (Var(B_income))
  1^2 * vcov(m5)["income_scale", "income_scale"] +
    
  # (know_mean)^2 * Var(B_interaction)
  seq(0, 1, 0.05)^2 * vcov(m5)["income_scale:know_mean", "income_scale:know_mean"] + 
    
  # (2) * (1) * (know_mean) * cov(B_income, B_interaction)
  2 * 1 * seq(0, 1, 0.05) * vcov(m5)["income_scale", "income_scale:know_mean"]

)

Plot of conditional marginal effect

Quadratic example


Call:
lm(formula = econ_mean ~ age_scale + I(age_scale^2) + male + 
    educ_scale + income_scale, data = lucid)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.43693 -0.13101 -0.00162  0.11975  0.69088 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.322067   0.011080  29.068  < 2e-16 ***
age_scale      -0.158514   0.049534  -3.200  0.00138 ** 
I(age_scale^2)  0.119425   0.056702   2.106  0.03524 *  
male            0.033077   0.005582   5.926 3.33e-09 ***
educ_scale      0.021022   0.010864   1.935  0.05304 .  
income_scale    0.120404   0.010602  11.356  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1882 on 4684 degrees of freedom
  (58 observations deleted due to missingness)
Multiple R-squared:  0.05823,   Adjusted R-squared:  0.05723 
F-statistic: 57.93 on 5 and 4684 DF,  p-value: < 2.2e-16
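
Applying the quadratic SE formula from earlier by hand (a minimal sketch, paralleling the interaction code above; assumes this model is stored as m4, as on the simulation slides):

# calculate conditional standard error for marginal effect of age
age_vals <- seq(0, 1, 0.05)

se_age <- sqrt(

  # (1)^2 * Var(B_age)
  vcov(m4)["age_scale", "age_scale"] +

  # (2*age)^2 * Var(B_age2)
  (2 * age_vals)^2 * vcov(m4)["I(age_scale^2)", "I(age_scale^2)"] +

  # 2 * (2*age) * Cov(B_age, B_age2)
  2 * (2 * age_vals) * vcov(m4)["age_scale", "I(age_scale^2)"]

)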

Plot of conditional marginal effect

marginaleffects package

library(marginaleffects)
options(marginaleffects_print_omit = c("s.value"))

# estimate marginal effects w/ all vars held at means
# if called w/o 'newdata' option, calculates ME for every obs in data
slopes(m5,
       variables = c("male", "educ_scale", "income_scale"),
       newdata = "mean")

         Term Contrast Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %
 educ_scale      dY/dX   0.0258    0.01114  2.31   0.0206 0.00395 0.0476
 income_scale    dY/dX   0.1236    0.01066 11.59   <0.001 0.10266 0.1444
 male            1 - 0   0.0348    0.00564  6.18   <0.001 0.02378 0.0459

Type:  response 

At range of values

# estimate marginal effect of income across values of knowledge
m_income <- slopes(m5,
                   variables = c("income_scale"),
                   newdata = datagrid(know_mean = seq(0, 1, 0.1)))
m_income

 know_mean Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
       0.0    0.156     0.0228  6.86   <0.001 0.1116  0.201
       0.1    0.151     0.0199  7.57   <0.001 0.1116  0.190
       0.2    0.145     0.0172  8.43   <0.001 0.1113  0.179
       0.3    0.139     0.0147  9.47   <0.001 0.1106  0.168
       0.4    0.134     0.0127 10.57   <0.001 0.1091  0.159
       0.5    0.128     0.0112 11.43   <0.001 0.1063  0.150
       0.6    0.123     0.0106 11.54   <0.001 0.1019  0.144
       0.7    0.117     0.0111 10.59   <0.001 0.0955  0.139
       0.8    0.112     0.0124  9.01   <0.001 0.0873  0.136
       0.9    0.106     0.0144  7.38   <0.001 0.0779  0.134
       1.0    0.100     0.0168  5.99   <0.001 0.0676  0.133

Term: income_scale
Type:  response 
Comparison: dY/dX

Plot

Average marginal effects

avg_slopes calculates the marginal effect at every observation and then takes the mean

avg_slopes(m5,
           variables = c("age_scale", "male", "educ_scale", "income_scale")
)

         Term Contrast Estimate Std. Error     z Pr(>|z|)    2.5 %  97.5 %
 age_scale       dY/dX  -0.0479    0.01277 -3.75   <0.001 -0.07294 -0.0229
 educ_scale      dY/dX   0.0258    0.01114  2.31   0.0207  0.00394  0.0476
 income_scale    dY/dX   0.1236    0.01066 11.59   <0.001  0.10266  0.1444
 male            1 - 0   0.0348    0.00564  6.18   <0.001  0.02378  0.0459

Type:  response 

Can also do slope tests

Imagine we want to test two coefficients: null hypothesis is equality

summary(m5)

Call:
lm(formula = econ_mean ~ age_scale + I(age_scale^2) + male + 
    educ_scale + income_scale * know_mean, data = lucid)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.46091 -0.13112 -0.00257  0.11944  0.69693 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             0.3157633  0.0135866  23.241  < 2e-16 ***
age_scale              -0.1561355  0.0495193  -3.153  0.00163 ** 
I(age_scale^2)          0.1232286  0.0566972   2.173  0.02980 *  
male                    0.0348407  0.0056419   6.175 7.16e-10 ***
educ_scale              0.0257784  0.0111371   2.315  0.02068 *  
income_scale            0.1561915  0.0227687   6.860 7.79e-12 ***
know_mean               0.0007747  0.0165371   0.047  0.96264    
income_scale:know_mean -0.0557548  0.0330965  -1.685  0.09213 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1881 on 4682 degrees of freedom
  (58 observations deleted due to missingness)
Multiple R-squared:  0.05968,   Adjusted R-squared:  0.05827 
F-statistic: 42.45 on 7 and 4682 DF,  p-value: < 2.2e-16

hypotheses(m5, hypothesis = "male - educ_scale = 0")

        Hypothesis Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %
 male-educ_scale=0  0.00906     0.0128 0.711    0.477 -0.0159 0.0341

Simulation for inference

In this context, “simulation” describes the process of repeated random sampling from the estimated sampling distribution of the model parameters

  • We can calculate quantities of interest for each sample realization and then summarize these
  • The resulting distributions give us information about uncertainty w/o needing analytic solutions for SEs
  • This is easily done using the arm package’s sim function, but it is easy by hand as well
  • marginaleffects has this functionality too (though it is currently labeled “EXPERIMENTAL”; see the sketch below) - it also does NOT (based on help files) propagate uncertainty for \(\hat{\sigma}\) (see below)
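
For instance, a sketch of the marginaleffects version: inferences() with method = "simulation" draws coefficient vectors from a multivariate normal defined by the estimated coefficients and covariance matrix

# simulation-based intervals via marginaleffects (labeled EXPERIMENTAL)
m5_inc <- slopes(m5,
                 variables = "income_scale",
                 newdata = "mean")
inferences(m5_inc, method = "simulation", R = 1000)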

Using sim from arm

# draw 1,000 realizations of model parameters from sampling distribution
m4_sims <- arm::sim(m4, n.sims=1000)
# the betas are stored in @coef
m4_sims@coef[1:3,]
     (Intercept)  age_scale I(age_scale^2)       male   educ_scale income_scale
[1,]   0.3201757 -0.1165536     0.06712922 0.03230549  0.023721840    0.1156481
[2,]   0.3084358 -0.1253236     0.07547290 0.03793351  0.019553911    0.1313397
[3,]   0.3538939 -0.2406292     0.17863659 0.04540851 -0.001600411    0.1216964
# the sigmas are stored in @sigma
m4_sims@sigma[1:3]
[1] 0.1872169 0.1880067 0.1916445

What does sim do?

For each sample, sim first draws a random value of \(\hat{\sigma}\) from its sampling distribution, and then uses this value to draw a random sample from the sampling distribution for \(\hat{\boldsymbol{\beta}}\)

sigma.hat <- summary(m4)$sigma
beta.hat <- summary(m4)$coef[, 1, drop=FALSE]
cov.beta <- summary(m4)$cov.unscaled
n <- summary(m4)$df[1] + summary(m4)$df[2]
k <- summary(m4)$df[1]
n.sims <- 1000

sigma <- array(NA, n.sims)
beta <- array(NA, c(n.sims, k))
for (i in 1:n.sims){
  sigma[i] <- sigma.hat*sqrt((n-k)/rchisq(1,n-k))
  beta[i,] <- MASS::mvrnorm(1, beta.hat, cov.beta*sigma[i]^2)
}

sim draws from MV normal

This is the asymptotic distribution of \(\hat{\boldsymbol{\beta}}\) under standard assumptions, and the t distribution converges pretty quickly to normal with increasing sample size

  • For small samples, say, less than 100, you might not want to rely on asymptotic normality

  • A good alternative is bootstrapping (described below), which does not make distributional assumptions

Example, quadratic

# calculate conditional marginal effect of age for each sim
age_cme_sims <- apply(m4_sims@coef, 1, function(x) x["age_scale"] + 2*seq(0,1,0.05)*x["I(age_scale^2)"])
dim(age_cme_sims)
[1]   21 1000
# summary
age_cme <- round(cbind(B = coef(m4)["age_scale"] + 2*seq(0,1,0.05)*coef(m4)["I(age_scale^2)"],
                       se = apply(age_cme_sims, 1, sd), 
                       lowCI = apply(age_cme_sims, 1, function(x) quantile(x, 0.025)),
                       highCI = apply(age_cme_sims, 1, function(x) quantile(x, 0.975))),
                 3)
age_cme
           B    se  lowCI highCI
 [1,] -0.159 0.050 -0.259 -0.067
 [2,] -0.147 0.044 -0.236 -0.066
 [3,] -0.135 0.039 -0.213 -0.063
 [4,] -0.123 0.033 -0.191 -0.059
 [5,] -0.111 0.028 -0.168 -0.056
 [6,] -0.099 0.023 -0.146 -0.052
 [7,] -0.087 0.019 -0.124 -0.050
 [8,] -0.075 0.015 -0.105 -0.045
 [9,] -0.063 0.013 -0.088 -0.040
[10,] -0.051 0.013 -0.076 -0.028
[11,] -0.039 0.015 -0.068 -0.011
[12,] -0.027 0.019 -0.063  0.010
[13,] -0.015 0.023 -0.060  0.030
[14,] -0.003 0.028 -0.057  0.053
[15,]  0.009 0.034 -0.055  0.075
[16,]  0.021 0.039 -0.054  0.098
[17,]  0.033 0.044 -0.050  0.122
[18,]  0.045 0.050 -0.049  0.145
[19,]  0.056 0.055 -0.047  0.168
[20,]  0.068 0.061 -0.046  0.192
[21,]  0.080 0.066 -0.043  0.216

Plot

Resampling methods

Resampling methods use the actual data you have to approximate the sampling distribution of quantities of interest

  • They are “resampling” in the sense that they estimate the model many times, on varying resamples of the observations, to produce a distribution of estimates

  • Jackknife

  • Bootstrap (many variants; large and complicated literature)

Much of my discussion here draws from Hansen (2022), Ch. 10, which has an extensive treatment of these methods

Advantages and disadvantages

Advantages

  • Does not require ability to calculate variance of estimator of quantity of interest
  • Often performs better when standard assumptions don’t hold (e.g., normality of errors, “large enough” sample)

Disadvantages

  • Computationally more expensive (though varies a lot across methods)
  • The statistical theory is rather difficult (for me at least!)
  • There is a large literature with many variants and approaches

All that said, if you are looking for a “go-to” approach when finite-sample properties cannot be assumed, bootstrapping is a reasonable bet

Jackknife

A leave-one-out estimator computes estimates using all observations except one (there are thus \(N\) possibilities)

  • The jackknife variance estimator estimates the variance of a parameter of interest using a function of the variance of the set of leave-one-out estimators

\[ \text{Var} (\hat{\theta})_{JK} = \frac{N - 1}{N} \sum_{i=1}^N (\hat{\theta}_{(-i)} - \bar{\theta})^2 \]

  • Square root is jackknife standard error

Jackknife

Very flexible method, overlap with other approaches

  • Requires no distributional assumptions
  • Asymptotically approximates the Delta-method estimator (see below)
  • Approximates the HC3 heteroskedasticity-robust estimator (see week on heteroskedasticity), but without assumptions regarding asymptotic covariance matrix
  • But does require \(N\) separate estimations (not a big deal for the models we are dealing with)

Jackknife example

# sample size and num parms for model
N <- nrow(m5$model)
K <- length(coef(m5))

# init array to store betas
betas <- array(NA, c(K, N))

# leave one out
for (i in 1:nrow(m5$model)){
  
  betas[, i] <- coef(update(m5, . ~ ., data = m5$model[-i, ]))
  
}

# calculate SE estimates
cbind(JK = round( apply(betas, 1, 
                        function(x) sqrt( ( (N-1)/N ) * sum( (x - mean(x))^2 ) ) ),
                  3 ), 
      
      OLS = round(sqrt(diag(vcov(m5))), 3),
      
      HC3 = round(sqrt(diag(sandwich::vcovHC(m5, type = "HC3"))), 3)
)
                          JK   OLS   HC3
(Intercept)            0.013 0.014 0.013
age_scale              0.048 0.050 0.048
I(age_scale^2)         0.056 0.057 0.056
male                   0.006 0.006 0.006
educ_scale             0.011 0.011 0.011
income_scale           0.021 0.023 0.021
know_mean              0.017 0.017 0.017
income_scale:know_mean 0.033 0.033 0.033

Jackknife variance matrix

\[ \text{Var} (\hat{\boldsymbol{\beta}})_{JK} = \frac{N - 1}{N} \sum_{i=1}^N (\hat{\boldsymbol{\beta}}_{(-i)} - \bar{\boldsymbol{\beta}}) (\hat{\boldsymbol{\beta}}_{(-i)} - \bar{\boldsymbol{\beta}})' \]

# sample size and num parms for model
N <- nrow(m5$model)
K <- length(coef(m5))

# init array to store betas
betas <- array(NA, c(K, N))

# leave one out
for (i in 1:nrow(m5$model)){
  
  betas[, i] <- coef(update(m5, . ~ ., data = m5$model[-i, ]))
  
}

# jackknife variance matrix
betas_mean <- rowMeans(betas)
sum_mats <- Reduce("+",  lapply(1:N, function(i) 
  (betas[, i] - betas_mean) %*% t(betas[, i] - betas_mean)))
V_jack <- (N - 1)/N * sum_mats

# calculate SE estimates
cbind(JK = round(sqrt(diag(V_jack)), 3),
      OLS = round(sqrt(diag(vcov(m5))), 3),
      HC3 = round(sqrt(diag(sandwich::vcovHC(m5, type = "HC3"))), 3)
)
                          JK   OLS   HC3
(Intercept)            0.013 0.014 0.013
age_scale              0.048 0.050 0.048
I(age_scale^2)         0.056 0.057 0.056
male                   0.006 0.006 0.006
educ_scale             0.011 0.011 0.011
income_scale           0.021 0.023 0.021
know_mean              0.017 0.017 0.017
income_scale:know_mean 0.033 0.033 0.033

Bootstrapping

Bootstrapping repeatedly samples (w/ replacement, each of size \(N\)) from the data and runs the model for each

  • The distribution of estimated parameters (one for each resample) is the estimated sampling distribution
  • Bootstrapping makes no assumptions about the form of the sampling distribution
  • More computationally demanding (some approaches more demanding than others)

population <- 1:10

boot_1 <- sample(population, size = 10, replace = TRUE)
boot_2 <- sample(population, size = 10, replace = TRUE)
boot_3 <- sample(population, size = 10, replace = TRUE)

boot_1
 [1] 1 5 8 2 4 6 6 3 3 9
boot_2
 [1]  5 10  3  8 10  1  7  7 10  2
boot_3
 [1]  2  1  2  3 10 10  3  4  7  6

Example with loop

Draw boots random samples from the analytical sample, each of size \(N\), run the model, and store the coefficients

# number of bootstrap samples
boots <- 1000

# initialize array to store coefs
boots_beta <- array(NA, c(boots, # num of bootstrap samples
                          summary(m4)$df[1]) # number of regression coefficients (K+1)
)

# iterate over samples
for (i in 1:boots){
  
  new <- sample(1:nrow(m4$model), nrow(m4$model), replace = T) # sample rows of data w/ replacement
  
  boots_beta[i, ] <- coef(lm(formula(m4), data = m4$model[new, ])) # run model with bootstrap sample
 
  # repeat boots number of times 
}

Standard errors

The distribution of bootstrap estimates is an estimate of the sampling distribution of the parameter, which we can then use for inference

  • The bootstrap standard error is simply the square root of the sample variance of bootstrap estimates
  • It is consistent for simple linear models

There are cases where it is inconsistent, especially w/ non-linear functions of estimated coefficients

  • An alternative, “trimmed” estimator is preferable in these cases
  • Cut the most extreme p% of the bootstrap estimates (e.g., 1%)
  • Consistent under more general conditions
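
Both versions are easy to compute from the boots_beta matrix generated in the loop example above (a sketch; the 1% trim shown is illustrative):

# bootstrap SEs: sd of the bootstrap estimates, by coefficient
apply(boots_beta, 2, sd)

# trimmed version: cut the most extreme 1% before taking the sd
apply(boots_beta, 2, function(x) {
  q <- quantile(x, c(0.005, 0.995))
  sd(x[x >= q[1] & x <= q[2]])
})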

Confidence intervals

There is a large literature on methods for calculating bootstrap confidence intervals, and various different options

  • normal approximation
  • percentile intervals
  • bias-corrected
  • percentile-t

Normal approximation

Use bootstrap standard error and plug into formula for confidence intervals using normal distribution

  • Often the default
  • Not the best option: worse performance than others
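
A minimal sketch using boots_beta from the loop example (assumes income_scale is the sixth coefficient in m4):

# normal-approximation 95% CI for income_scale
b_hat <- coef(m4)["income_scale"]
se_boot <- sd(boots_beta[, 6])
b_hat + c(-1, 1) * qnorm(0.975) * se_boot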

Percentile intervals

Choose a confidence level (e.g., \(\alpha = 5\%\)), then select the quantiles of the bootstrap estimates associated with \(\alpha/2\) and \((100 - \alpha/2)\) (e.g., 2.5% and 97.5%)

  • Easy and popular; does not require calculation of standard errors
  • Consistent under weaker assumptions than normal approximation and bootstrap standard errors, and does not require trimming

Requires that the asymptotic distribution is symmetric about 0

  • Finite-sample performance may be unsatisfactory if estimator is asymmetric for your sample size
  • This includes cases where estimators are biased (even if consistent), which is often true with non-linear models
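
With the same boots_beta draws as before, the percentile interval is a one-liner:

# 95% percentile interval for income_scale (column 6 of boots_beta)
quantile(boots_beta[, 6], c(0.025, 0.975))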

Bias-corrected (BC) intervals

Adjusts percentile intervals for finite-sample bias using the degree of asymmetry in the empirical distribution of bootstrap estimates around the parameter estimate

  • Provides “exact” coverage under the assumption that the bias depends only on a parameter drawn from a symmetric, invertible distribution
  • Is also asymptotically valid for the interval of interest

An extension, the accelerated bias-corrected estimator (BC\(_{\alpha}\)), also estimates skewness in the sampling distribution (in addition to bias) and adjusts for it

  • Will typically have better performance than BC
  • Faster rate of asymptotic convergence than percentile and BC methods
  • Can take a long time to estimate in R

Percentile-t intervals

Calculate a t-statistic for every bootstrap sample: \(T^* = \frac{\hat{\theta}^* - \hat{\theta}}{se(\hat{\theta}^*)}\)

  • \(\hat{\theta}^*\) is the bootstrap parameter estimate for a particular bootstrap sample
  • \(se(\hat{\theta}^*)\) is the estimated standard error for a particular bootstrap sample
  • \(\hat{\theta}\) is the estimate from your standard procedure (e.g., OLS coefficient)

Percentile-t intervals

Calculate the percentile-t interval from the quantiles of \(T^*\), \(q^*\):

\[ \left( \hat{\theta} - se(\hat{\theta}) q_{(1 - \alpha/2)}^*, \space \hat{\theta} - se(\hat{\theta}) q_{(\alpha/2)}^* \right) \]

  • Uses a standard error, so you need a method for calculating it, and accuracy depends on the reliability of that estimate

    • Can use the bootstrap to estimate the SE within each bootstrap sample (i.e., a nested bootstrap)
  • When the SE estimate is reliable, this may be better than other methods

  • Faster rate of asymptotic convergence than percentile and BC methods

Using boot package via marginaleffects

# estimate slopes
m5_me <- slopes(m5,
                variables = c("male", "educ_scale", "income_scale"),
                newdata = "mean")

# use inferences() to get bootstrap intervals
inferences(m5_me, method = "boot", R = 1000, conf_type = "perc")
inferences(m5_me, method = "boot", R = 1000, conf_type = "bca")

boot package

library(boot)

# boot call
m5_boot <- boot(m5$model, 
                function(data, indices) {
                  d <- data[indices, ]
                  coef(update(m5, . ~ ., data = d))[5]
                },
                R = 10) # DON'T USE 10!!! THIS IS NOT A RECOMMENDATION

# use boot.ci to get CIs
boot.ci(m5_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10 bootstrap replicates

CALL : 
boot.ci(boot.out = m5_boot)

Intervals : 
Level      Normal              Basic         
95%   ( 0.0129,  0.0399 )   ( 0.0141,  0.0366 )  

Level     Percentile            BCa          
95%   ( 0.0149,  0.0375 )   ( 0.0149,  0.0375 )  
Calculations and Intervals on Original Scale
Warning : Basic Intervals used Extreme Quantiles
Some basic intervals may be unstable
Warning : Percentile Intervals used Extreme Quantiles
Some percentile intervals may be unstable
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable

boot package to get percentile-t

# boot call: for studentized intervals, boot.ci expects the
# *variance* of the statistic as the second returned element
m5_boot <- boot(m5$model, 
                function(data, indices) {
                  d <- data[indices, ]
                  fit <- update(m5, . ~ ., data = d)
                  B <- coef(fit)[5]
                  v <- diag(vcov(fit))[5] # variance, not SE
                  return(c(B, v))
                },
                R = 1000)

# use boot.ci to get CIs
boot.ci(m5_boot, type = "stud")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = m5_boot, type = "stud")

Intervals : 
Level    Studentized     
95%   ( 0.0033,  0.0467 )  
Calculations and Intervals on Original Scale

Bootstrap hypothesis testing

We can use the bootstrapped t-statistics to calculate p-values - recall:

\[T^* = \frac{\hat{\theta}^* - \hat{\theta}}{se(\hat{\theta}^*)}\]

  • The (two-tailed) p-value is then the proportion of bootstrap samples with \(|T^*| > |T|\), where \(T\) is the t-statistic from your standard procedure (e.g., OLS output for a coefficient)
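
A sketch, assuming m5_boot was created as in the percentile-t example above (so its second column stores the bootstrap variance of the estimate):

# studentized bootstrap t-statistics
T_star <- (m5_boot$t[, 1] - m5_boot$t0[1]) / sqrt(m5_boot$t[, 2])

# t-statistic from the standard OLS procedure (H0: coefficient = 0)
T_ols <- m5_boot$t0[1] / sqrt(m5_boot$t0[2])

# two-tailed p-value: share of bootstrap |T*| exceeding |T|
mean(abs(T_star) > abs(T_ols))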

Wild bootstrap

Our bootstrap methods to this point all sample both \(X\) and \(Y\), but our standard linear regression model treats \(X\) as fixed and imposes \(\text{E}(u | \mathbf{X}) = 0\)

  • Integrating these assumptions into the bootstrap will increase precision (and also take into account heteroskedasticity)

The wild bootstrap fixes \(X\) and takes draws of the error to generate bootstrap observations:

\[ y_i^* = \boldsymbol{x}_i'\hat{\boldsymbol{\beta}} + \hat{u}_i\xi_i^*, \\ \quad \Pr(\xi_i^* = 1) = \Pr(\xi_i^* = -1) = 0.50 \]

  • For hypothesis testing, it is better to generate bootstrap resamples using the null model (set coefficients to null hypothesis values to generate residuals and then generate \(y^*\))
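
A by-hand sketch for m5, using Rademacher draws as in the display above (for hypothesis testing you would generate the residuals from the null model instead):

# wild bootstrap: fix X, resample signs of the residuals
X <- model.matrix(m5)
y_hat <- fitted(m5)
u_hat <- resid(m5)

boots <- 1000
wild_beta <- array(NA, c(boots, length(coef(m5))))

for (i in 1:boots) {
  xi <- sample(c(-1, 1), length(u_hat), replace = TRUE) # Pr = 0.5 each
  y_star <- y_hat + u_hat * xi # X fixed; only errors are resampled
  wild_beta[i, ] <- lm.fit(X, y_star)$coefficients
}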

Structured data (see Cameron et al. 2008)

If you have data with “structure” that is relevant to the analysis, you need to reproduce that structure in your resampling

  • With “clustered” data, we want to reproduce the clustering so we don’t overstate precision

    • E.g., if you have students nested within schools, resample schools
    • With wild bootstrap, each cluster gets a single draw of \(\xi_i^*\)

With clustered data, you are effectively treating the sample size as the number of clusters

  • Small number of clusters may imply inaccurate approximations
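
A sketch of cluster resampling (hypothetical: assumes a data frame dat with a school ID column school_id; both names are placeholders):

# cluster bootstrap: resample whole schools, not students
ids <- unique(dat$school_id)
draw <- sample(ids, length(ids), replace = TRUE)

# rebuild the bootstrap sample, keeping all rows in each drawn school
boot_dat <- do.call(rbind, lapply(draw, function(id) dat[dat$school_id == id, ]))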

How many?

For simulation-based and bootstrapped calculations, how many should you use?

  • As always with these things, no firm answer beyond “as many as possible given constraints”

A reasonable approach when large number is very time-consuming:

  • Use small number (e.g., 10-100) when checking to make sure your code works
  • Use modest number for working “drafts” of your analyses (e.g., 500-1,000)
  • When you are ready to generate final “archive-ready” results, use a large number (e.g., 10,000) and let it run while you sleep

Set seed!


Remember to set a seed value with set.seed(), especially for final runs that will be archived

Delta method

A differentiable function of an asymptotically normal estimator is also asymptotically normal

\[\sqrt{N} \left( f(\hat{\boldsymbol{\beta}}) - f(\boldsymbol{\beta}) \right) \underset{d}{\rightarrow} \ Normal \left( 0, \left( f'(\boldsymbol{\beta}) \right)^T \text{Var}(\hat{\boldsymbol{\beta}}) (f'(\boldsymbol{\beta})) \right)\]

  • \(f(\hat{\boldsymbol{\beta}})\): some function of our estimator
  • \(f(\boldsymbol{\beta})\): the same function evaluated at the true population value
  • \(f'(\boldsymbol{\beta})\): the gradient (vector of 1st partial derivatives) of the function evaluated at the true value
  • Very flexible, but relies on asymptotic normality of estimator (error of approximation converges to 0 as \(N \rightarrow \infty\))

Delta method

Approximate a continuous function by its 1st derivatives evaluated at a particular point:

\[ f(\hat{\boldsymbol{\beta}}) \approx f(\boldsymbol{\beta}) + f'(\boldsymbol{\beta})^T(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \]

\[ \left(f(\hat{\boldsymbol{\beta}}) - f(\boldsymbol{\beta}) \right)^2 \approx \left (f'(\boldsymbol{\beta})^T(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \right)^2 \]

\[ \left(f(\hat{\boldsymbol{\beta}}) - f(\boldsymbol{\beta}) \right)^2 \approx f'(\boldsymbol{\beta})^T(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^T f'(\boldsymbol{\beta}) \]

\[ \text{E} \left((f(\hat{\boldsymbol{\beta}}) - f(\boldsymbol{\beta}))^2 \right) \approx \left(f'(\boldsymbol{\beta}) \right)^T \text{E} \left((\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^T \right) f'(\boldsymbol{\beta}) \]

\[ \text{Var} \left(f(\hat{\boldsymbol{\beta}}) \right) \approx \left(f'(\boldsymbol{\beta}) \right)^T \text{Var}(\hat{\boldsymbol{\beta}}) (f'(\boldsymbol{\beta})) \]

  • Substitute estimates: \(\hat{\text{Var}}(\hat{\boldsymbol{\beta}})\), and \(\hat{\boldsymbol{\beta}}\)

Example

SE for conditional ME of age: \(f(\hat{\boldsymbol{\beta}}) = \hat{\beta}_1 + 2\hat{\beta}_2age_i\)

\[ (f'(\boldsymbol{\beta}))^T = \left( \frac{\partial f}{\partial \boldsymbol{\beta}}\Bigr|_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}} \right)^T = \begin{bmatrix} 1 & 2age_i \end{bmatrix} \\ \]

\[ (f'(\boldsymbol{\beta}))^T \hat{\text{Var}}(\hat{\boldsymbol{\beta}}) (f'(\boldsymbol{\beta})) = \begin{bmatrix} 1 & 2age_i \end{bmatrix} \hat{\text{Var}}(\hat{\boldsymbol{\beta}}_{2:3}) \begin{bmatrix} 1 \\ 2age_i \end{bmatrix} \\ \]

\[ = \hat{\text{Var}}(\hat{\beta}_1) + (2age_i)^2\hat{\text{Var}}(\hat{\beta}_2) \space + \space 4age_i\hat{\text{Cov}}(\hat{\beta}_1, \hat{\beta}_2) \]
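
The same computation in R (a sketch; m4 is the quadratic model from earlier, with age_scale fixed at 0.5 for illustration):

# delta-method SE for the conditional ME of age at age_scale = 0.5
g <- c(1, 2 * 0.5) # gradient wrt (B_age, B_age2)
V <- vcov(m4)[c("age_scale", "I(age_scale^2)"),
              c("age_scale", "I(age_scale^2)")]
sqrt(t(g) %*% V %*% g)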

Non-linear example

m1 <- lm(log(salary) ~ yrs.since.phd + sex + discipline, 
         data = carData::Salaries)
summary(m1)

Call:
lm(formula = log(salary) ~ yrs.since.phd + sex + discipline, 
    data = carData::Salaries)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.84527 -0.15697 -0.00855  0.15550  0.62670 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.125e+01  4.189e-02 268.671  < 2e-16 ***
yrs.since.phd 9.584e-03  9.075e-04  10.560  < 2e-16 ***
sexMale       6.639e-02  3.830e-02   1.734   0.0838 .  
disciplineB   1.447e-01  2.319e-02   6.239 1.14e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2244 on 393 degrees of freedom
Multiple R-squared:  0.2615,    Adjusted R-squared:  0.2559 
F-statistic: 46.39 on 3 and 393 DF,  p-value: < 2.2e-16

Non-linear example

Avg effect of change from 10 to 20 years since PhD on unlogged salaries: \(f = \frac{1}{N} \sum_N e^{\boldsymbol{x}_{i,yrs=20}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - e^{\boldsymbol{x}_{i,yrs=10}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}]\)

\(\frac{\partial f}{\partial \boldsymbol{\beta}} =\)

\[ \begin{bmatrix} \frac{1}{N} \sum_N e^{\boldsymbol{x}_{i,yrs=20}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - e^{\boldsymbol{x}_{i,yrs=10}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] \\ \frac{1}{N} \sum_N 20e^{\boldsymbol{x}_{i,yrs=20}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - 10e^{\boldsymbol{x}_{i,yrs=10}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] \\ \frac{1}{N} \sum_N male_ie^{\boldsymbol{x}_{i,yrs=20}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - male_ie^{\boldsymbol{x}_{i,yrs=10}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] \\ \frac{1}{N} \sum_N discipline_ie^{\boldsymbol{x}_{i,yrs=20}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - discipline_ie^{\boldsymbol{x}_{i,yrs=10}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] \end{bmatrix} \]

Calculation

\[f = e^{\bf{X_{20}}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}] - e^{\bf{X_{10}}\hat{\boldsymbol{\beta}}}\text{E}[e^{u_i}]\]

# new data
X <- cbind(1, m1$model[,2:4])
X$sex <- ifelse(X$sex == "Male", 1, 0)
X$discipline <- ifelse(X$discipline == "B", 1, 0) # model dummy is disciplineB
nd_1 <- nd_2 <- X
nd_1$yrs.since.phd <- 10
nd_2$yrs.since.phd <- 20
nd_1 <- as.matrix(nd_1)
nd_2 <- as.matrix(nd_2)

# define Jacobian
J <- c()
J[1] <- mean( exp(nd_2 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) -  exp(nd_1 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) )
J[2] <- mean( 20*exp(nd_2 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) -  10*exp(nd_1 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) )
J[3] <- mean( nd_2[,3] * exp(nd_2 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) -  nd_1[,3]*exp(nd_1 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) )
J[4] <- mean( nd_2[,4] * exp(nd_2 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) -  nd_1[,4] * exp(nd_1 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) )

# calculate first difference
fd <- mean( exp(nd_2 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) -  exp(nd_1 %*% coef(m1)) * exp(summary(m1)$sigma^2 / 2) )

# calculate standard error of first difference (delta method)
se <- sqrt(t(J) %*% vcov(m1) %*% J)

Calculation

# print

cbind(fd, se)
           fd         
[1,] 9973.553 883.2016

Using car package

Holding all other variables at central tendencies:

library(car)

# Duan-type smearing estimate of E[exp(u)]
Eu_1 <- sum(exp(m1$residuals)) / nrow(m1$model)

car <- deltaMethod(m1,
                   "exp(b0 + b1*20 + b2*0 + b3*0)*Eu_1 - exp(b0 + b1*10 + b2*0 + b3*0)*Eu_1",
                   parameterNames = c("b0","b1","b2","b3"))

cbind(car$Estimate, car$SE)
         [,1]     [,2]
[1,] 8765.543 861.2572

marginaleffects package

The delta method is the default approach to calculating standard errors in marginaleffects

  • I do not see a simple approach for logged DVs, but I think you can write your own functions (or use car)

  • But for most cases, it will work well

  • The exception is when you have exact standard errors (assuming normal regression errors): in that case you would use the t distribution for tests and CIs, whereas the delta method uses the normal