Asymptotics

POLSCI 630: Probability and Basic Regression

February 11, 2025

Asymptotic properties

Asymptotic (or large sample) properties are those that obtain as \(N \rightarrow \infty\)

  • Finite sample properties apply regardless of \(N\)
  • Asymptotic properties only hold in the limit - we approximate an (unknown) sampling distribution with the limiting distribution
  • By their nature, asymptotic properties cannot tell us in general how large \(N\) must be for the approximation to be “good enough”
  • We may be able to say something about performance in particular cases through simulation

Convergence in probability and consistency

Consistency: \(\text{lim}_{N\rightarrow\infty} \space \text{Pr}(|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}|>\epsilon)=0, \space \forall \space \epsilon > 0\)

  • If true, \(\boldsymbol{\beta}\) is the probability limit (\(\text{plim}\)) of \(\hat{\boldsymbol{\beta}}\)
  • \(\hat{\boldsymbol{\beta}} \underset{p}\rightarrow \boldsymbol{\beta}\): \(\hat{\boldsymbol{\beta}}\) “converges in probability” to \(\boldsymbol{\beta}\)
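A worked one-parameter case (assuming i.i.d. draws with finite variance \(\sigma^2\)): by Chebyshev’s inequality, the sample mean is consistent for the population mean \(\mu\),

\[ \text{Pr}(|\bar{X}_N-\mu|>\epsilon) \le \frac{\text{Var}(\bar{X}_N)}{\epsilon^2} = \frac{\sigma^2}{N\epsilon^2} \rightarrow 0 \text{, as } N \rightarrow \infty, \space \forall \space \epsilon > 0 \]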

Asymptotic distribution

\[ \hat{X} \underset{d}{\rightarrow} X, \space \text{if } F_{\hat{X}}(u) \rightarrow F_{X}(u) \text{, as } N \rightarrow \infty, \space \forall \space u \]

  • If true, \(\hat{X}\) “converges in distribution” to \(X\)
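A quick simulation sketch: the standardized mean of Exponential(1) draws converges in distribution to a standard normal, so its empirical CDF approaches \(\Phi(u)\) as \(N\) grows.

set.seed(123)

# standardized sample mean of N Exponential(1) draws (mean = sd = 1)
zstat <- function(N) replicate(5000, sqrt(N) * (mean(rexp(N)) - 1))

# empirical CDF at a few points u vs. the standard normal CDF
u <- c(-1, 0, 1)
rbind(N5     = ecdf(zstat(5))(u),
      N500   = ecdf(zstat(500))(u),
      normal = pnorm(u))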

Example of consistency

# sample sizes from 2 to 1,000
sizes <- seq(2,1000,2)
# for each N, the sample mean of N standard normal draws (a consistent estimator of 0)
betas <- sapply(sizes, function(x) mean(rnorm(x)))
# estimates concentrate around the true value (0) as N grows
plot(sizes, betas, type="l", ylim=c(-0.5,0.5))
abline(h=0, lty=3)

Consistency proof

\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\boldsymbol{u}\)

\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = \left( N \cdot \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} \left( N \cdot \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right)\)

\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = \frac{1}{N}\left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} N\left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right) = \left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} \left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right)\)

\(\text{plim}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \text{plim} \left[ \left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} \left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right) \right]\)

The weak law of large numbers says that the sample mean of any transformation of a random vector with finite mean converges in probability to the population expected value of that transformation (Hansen 2022). Combining this with the continuous mapping theorem for the inverse, and assuming \(\text{E}(\boldsymbol{x}_i u_i) = \boldsymbol{0}\):

\(\text{plim}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \left[ \text{plim}\left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right) \right]^{-1} \text{plim}\left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right) = \left[ \text{E}\left( \boldsymbol{x}_i\boldsymbol{x}_i' \right) \right]^{-1} \text{E}\left( \boldsymbol{x}_i u_i \right) = \left[ \text{E}\left( \boldsymbol{x}_i\boldsymbol{x}_i' \right) \right]^{-1} \boldsymbol{0} = \boldsymbol{0}\)
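A small simulation sketch of the two pieces (x and u drawn independently here, so \(\text{E}(\boldsymbol{x}_i u_i)=\boldsymbol{0}\) holds by construction): \(\frac{1}{N}\mathbf{X}'\mathbf{X}\) settles down to a fixed matrix while \(\frac{1}{N}\mathbf{X}'\boldsymbol{u}\) shrinks toward zero as \(N\) grows.

check_plim <- function(N) {
  x <- rnorm(N)
  X <- cbind(1, x)
  u <- rnorm(N)                      # independent of x, so E(x_i * u_i) = 0
  c(xx = (t(X) %*% X / N)[2, 2],     # converges to E(x^2) = 1
    xu = (t(X) %*% u / N)[2, 1])     # converges to E(x * u) = 0
}

set.seed(630)
sapply(c(10, 1000, 100000), check_plim)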

Consistency of OLS

OLS is consistent under weaker assumptions about the error term (\(\text{E}(\mathbf{X}'\boldsymbol{u}) = 0\)) than those required for unbiasedness (\(\text{E}(\boldsymbol{u} | \mathbf{X}) = 0\))

  • \(\text{E}(u_i)=0\)
  • \(\text{Cov}(x_{ki},u_i)=0, \space \forall \space k\)

This means that each predictor must be uncorrelated with the error term

  • But functions of the \(\boldsymbol{x}_i\) can still be correlated with \(u_i\)

    • OLS estimates are biased in such cases, but still consistent

Example w/ small sample

set.seed(1234)

# draw x
x <- rnorm(10)

# X matrix
X <- cbind(rep(1,10), x, x^2)

# "biased" X matrix
X_b <- cbind(rep(1,10), x)

# betas
B <- c(1,1,1)

# XB
yhat <- X %*% B

# unbiased beta hat
Bhat <- sapply(1:100000,
               function(x) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(10, 0, 5)) )[2]
)

# biased beta hat
Bhat_b <- sapply(1:100000,
               function(x) ( solve(t(X_b)%*%X_b) %*% t(X_b)%*%(yhat + rnorm(10, 0, 5)) )[2]
)

# summaries
cbind(unbiased=mean(Bhat), biased=mean(Bhat_b))
     unbiased     biased
[1,] 1.004281 -0.1635481

Example w/ big sample

set.seed(1234)

# draw x
x <- rnorm(1000)

# X matrix
X <- cbind(rep(1,1000), x, x^2)

# "biased" X matrix
X_b <- cbind(rep(1,1000), x)

# betas
B <- c(1,1,1)

# XB
yhat <- X %*% B

# unbiased beta hat
Bhat <- sapply(1:10000,
               function(x) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)

# biased beta hat
Bhat_b <- sapply(1:10000,
               function(x) ( solve(t(X_b)%*%X_b) %*% t(X_b)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)

# summaries
cbind(unbiased=mean(Bhat), biased=mean(Bhat_b))
     unbiased    biased
[1,] 1.001011 0.9443231

A subtle point

The “bias”, in these cases, is a result of treating \(\mathbf{X}\) as fixed over repeated sampling

  • In any particular finite sample, the realized correlation between a predictor and the error term will generally not be exactly zero
  • But if we treat \(\mathbf{X}\) as random, and take expectations over repeated samples, OLS remains unbiased

But allowing random \(\mathbf{X}\) is likely no free lunch

  • We pay for it in increased variance of the OLS estimator
  • When we calculate the variance, there is now randomness in both \(\boldsymbol{u}\) and \(\mathbf{X}\), which tends to produce larger SEs (see the sketch below)
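A rough simulation sketch of that intuition (the exact numbers depend on the particular fixed design that happens to be drawn): hold one small design fixed across repeated samples versus redrawing \(\mathbf{X}\) each time, and compare the spread of the OLS slope estimates. In this setup the random-\(\mathbf{X}\) spread will typically come out somewhat larger.

set.seed(630)
N <- 10; sims <- 10000

# true model: y = 1 + x + u; fit the correctly specified regression each time
slope <- function(x) {
  y <- 1 + x + rnorm(N)
  coef(lm(y ~ x))[2]
}

x_fixed <- rnorm(N)                                 # one design, held fixed
sd_fixed  <- sd(replicate(sims, slope(x_fixed)))    # X fixed over repeated samples
sd_random <- sd(replicate(sims, slope(rnorm(N))))   # X redrawn each sample
c(fixed = sd_fixed, random = sd_random)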

Practical implications

  • The number of situations where OLS is biased but consistent is likely small, for practical purposes
  • Generally speaking, if you knew you were in one of these situations, you would probably want to model the true population regression function anyway!
  • And if a predictor is correlated with the error term in the population, OLS is not even consistent anymore

Don’t worry much about this subtle point

Asymptotic normality of \(\boldsymbol{\hat{\beta}}\)

If we drop assumption 6 (the \(u_i\) are drawn from a normal distribution), we can still say that \(\boldsymbol{\hat{\beta}}\) is asymptotically normally distributed

  • We can’t claim the exact distribution for \(\boldsymbol{\hat{\beta}}\)
  • But with “large” sample sizes we can invoke the central limit theorem to say:

\[\boldsymbol{\hat{\beta}} \approx \text{MVNormal}(\boldsymbol{\beta}, \space \hat{\sigma}^2(\textbf{X}'\textbf{X})^{-1})\]

Example, small sample

set.seed(4321)

sims <- 100000

# assumed X
X <- cbind(rep(1, 10), 
           MASS::mvrnorm(10, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2))
)

# Beta
B <- c(1,1,1)

# yhat
yhat <- X %*% B

## N=10

# normal errors
norme <- sapply(1:sims,
                 function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(10, 0, 5)) )[2]
)

# beta errors
betae <- sapply(1:sims,
                 function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + 50*rbeta(10, 1, 5)) )[2]
)

Example, small sample

(Figure: simulated sampling distributions of the slope estimate under normal vs. beta errors, N = 10)

Example, big sample

set.seed(4321)

sims <- 100000

# assumed X
X <- cbind(rep(1, 1000), 
           MASS::mvrnorm(1000, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2))
)

# Beta
B <- c(1,1,1)

# yhat
yhat <- X %*% B

## N=1000

# normal errors
norme <- sapply(1:sims,
                 function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)

# beta errors
betae <- sapply(1:sims,
                 function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + 50*rbeta(1000, 1, 5)) )[2]
)

Example, big sample

(Figure: simulated sampling distributions of the slope estimate under normal vs. beta errors, N = 1000)

Practical implications

  • The use of t as an exact sampling distribution, given \(\hat{\sigma}^2\), is not justified when \(u_i\) is not normal

  • But since both t and the sampling distribution for \(\hat{\beta}_k\) converge to Normal as \(N \rightarrow \infty\), we might as well just use t in large samples

  • The t distribution is exact under normal errors, and no worse than the normal approximation in the more general case (see the quick check below)

Wooldridge, p. 175
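A quick check of that convergence: the t critical values approach the standard normal ones as the degrees of freedom grow.

# 97.5th percentiles: t converges to the standard normal as df grows
rbind(t      = qt(0.975, df = c(5, 30, 100, 1000)),
      normal = rep(qnorm(0.975), 4))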