POLSCI 630: Probability and Basic Regression
February 11, 2025
Asymptotic (or large sample) properties are those that obtain as \(N \rightarrow \infty\)
Consistency: \(\text{lim}_{N\rightarrow\infty} \space \text{Pr}(|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}|>\epsilon)=0, \space \forall \space \epsilon > 0\)
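As a quick illustration, here is a minimal simulation sketch for the sample mean: the estimated \(\text{Pr}(|\bar{x}-\mu|>\epsilon)\) shrinks toward zero as \(N\) grows (object names here are just illustrative).

set.seed(630)
eps <- 0.1
Ns <- c(10, 100, 1000, 10000)
# estimated Pr(|xbar - mu| > eps) at each sample size, with mu = 1
prob_miss <- sapply(Ns, function(N) {
  xbars <- replicate(2000, mean(rnorm(N, mean = 1)))
  mean(abs(xbars - 1) > eps)
})
prob_miss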
Asymptotic distribution
\[ \hat{X} \underset{d}{\rightarrow} X, \space \text{if } F_{\hat{X}}(u) \rightarrow F_{X}(u) \text{, as } N \rightarrow \infty, \space \forall \space u \]
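And a companion sketch of convergence in distribution: the empirical CDF of the standardized sample mean of Exp(1) draws (population mean 1, sd 1), evaluated at \(u = 1\), approaches the standard normal CDF value as \(N\) grows.

set.seed(630)
# standardized sample means of Exp(1) draws (population mean 1, sd 1)
std_means <- function(N) replicate(5000, sqrt(N) * (mean(rexp(N)) - 1))
# empirical CDF at u = 1 for increasing N, versus the N(0,1) CDF
sapply(c(5, 50, 500), function(N) mean(std_means(N) <= 1))
pnorm(1)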
\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\boldsymbol{u}\)
\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = \left( \frac{N}{N}\mathbf{X}'\mathbf{X} \right)^{-1} \left( \frac{N}{N}\mathbf{X}'\boldsymbol{u} \right)\)
\(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = \frac{1}{N}\left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} N\left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right)\)
\(\text{plim}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \text{plim} \left[ \left( \frac{1}{N}\mathbf{X}'\mathbf{X} \right)^{-1} \left( \frac{1}{N}\mathbf{X}'\boldsymbol{u} \right) \right]\)
The weak law of large numbers says that the sample mean of any transformation of a random vector with finite mean converges in probability to the population expected value of that transformation (Hansen 2022). Applying it to the sample moments \(\frac{1}{N}\mathbf{X}'\mathbf{X}\) and \(\frac{1}{N}\mathbf{X}'\boldsymbol{u}\) (with the continuous mapping theorem handling the inverse and the product), and assuming \(\text{E}(\boldsymbol{x}_i u_i) = \boldsymbol{0}\):
\(\text{plim}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \left[ \text{E} \left( \boldsymbol{x}_i \boldsymbol{x}_i' \right) \right]^{-1} \text{E} \left( \boldsymbol{x}_i u_i \right) = \left[ \text{E} \left( \boldsymbol{x}_i \boldsymbol{x}_i' \right) \right]^{-1} \boldsymbol{0} = \boldsymbol{0}\)
OLS is thus consistent under a weaker assumption about the error term (\(\text{E}(\boldsymbol{x}_i u_i) = \boldsymbol{0}\)) than the one required for unbiasedness (\(\text{E}(\boldsymbol{u} | \mathbf{X}) = \boldsymbol{0}\))
This means only that each predictor must be uncorrelated with the error term
There can still be functions of the \(\boldsymbol{x}_i\) that are correlated with \(u_i\), as the simulation below illustrates
set.seed(1234)
# draw x
x <- rnorm(10)
# X matrix
X <- cbind(rep(1,10), x, x^2)
# "biased" X matrix
X_b <- cbind(rep(1,10), x)
# betas
B <- c(1,1,1)
# XB
yhat <- X %*% B
# unbiased beta hat
Bhat <- sapply(1:100000,
function(x) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(10, 0, 5)) )[2]
)
# biased beta hat
Bhat_b <- sapply(1:100000,
function(x) ( solve(t(X_b)%*%X_b) %*% t(X_b)%*%(yhat + rnorm(10, 0, 5)) )[2]
)
# summaries
cbind(unbiased=mean(Bhat), biased=mean(Bhat_b))
unbiased biased
[1,] 1.004281 -0.1635481
set.seed(1234)
# draw x
x <- rnorm(1000)
# X matrix
X <- cbind(rep(1,1000), x, x^2)
# "biased" X matrix
X_b <- cbind(rep(1,1000), x)
# betas
B <- c(1,1,1)
# XB
yhat <- X %*% B
# unbiased beta hat
Bhat <- sapply(1:10000,
function(x) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)
# biased beta hat
Bhat_b <- sapply(1:10000,
function(x) ( solve(t(X_b)%*%X_b) %*% t(X_b)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)
# summaries
cbind(unbiased=mean(Bhat), biased=mean(Bhat_b))
unbiased biased
[1,] 1.001011 0.9443231
The “bias”, in these cases, is a result of treating \(\mathbf{X}\) as fixed over repeated sampling (see the sketch below)
But allowing \(\mathbf{X}\) to be random is not a free lunch either
Don’t worry too much about this subtle point
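One way to see the point is a sketch that reuses the data-generating process above but redraws \(x\) in every replication; because \(x\) and \(x^2\) are uncorrelated in the population of draws, the omitted-\(x^2\) estimator should now be centered near 1 even with \(N = 10\).

set.seed(1234)
# same DGP as above, but x is redrawn in every replication (random X)
Bhat_rx <- sapply(1:100000, function(i) {
  x <- rnorm(10)
  y <- 1 + x + x^2 + rnorm(10, 0, 5)
  X_b <- cbind(1, x)
  ( solve(t(X_b) %*% X_b) %*% t(X_b) %*% y )[2]
})
# should be close to 1, unlike the fixed-X result at N = 10
mean(Bhat_rx)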
If we drop assumption 6 (that the \(u_i\) are drawn from a normal distribution), we can still say that \(\boldsymbol{\hat{\beta}}\) is asymptotically normally distributed
\[\boldsymbol{\hat{\beta}} \approx \text{MVNormal}(\boldsymbol{\beta}, \space \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1})\]
set.seed(4321)
sims <- 100000
# assumed X
X <- cbind(rep(1, 10),
MASS::mvrnorm(10, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2))
)
# Beta
B <- c(1,1,1)
# yhat
yhat <- X %*% B
## N=10
# normal errors
norme <- sapply(1:sims,
function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(10, 0, 5)) )[2]
)
# beta errors
betae <- sapply(1:sims,
function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + 50*rbeta(10, 1, 5)) )[2]
)
set.seed(4321)
sims <- 100000
# assumed X
X <- cbind(rep(1, 1000),
MASS::mvrnorm(1000, c(0,0), matrix(c(1,0.5,0.5,1), 2, 2))
)
# Beta
B <- c(1,1,1)
# yhat
yhat <- X %*% B
## N=1000
# normal errors
norme <- sapply(1:sims,
function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + rnorm(1000, 0, 5)) )[2]
)
# beta errors
betae <- sapply(1:sims,
function(y) ( solve(t(X)%*%X) %*% t(X)%*%(yhat + 50*rbeta(1000, 1, 5)) )[2]
)
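The simulated vectors can be checked against a normal reference, e.g., by overlaying the simulated density of the slope estimate on a normal density with the same mean and standard deviation (a rough sketch; run it after either simulation block). After the \(N = 10\) block it should show some departure from normality under the beta-distributed errors; after the \(N = 1000\) block the two curves should be nearly indistinguishable.

# sketch: compare the simulated sampling distribution to a normal reference
plot(density(betae), main = "Simulated sampling distribution vs. normal reference",
     xlab = expression(hat(beta)[1]))
curve(dnorm(x, mean = mean(betae), sd = sd(betae)), add = TRUE, lty = 2)
legend("topright", legend = c("simulated (beta errors)", "normal reference"),
       lty = c(1, 2), bty = "n")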
The use of t as an exact sampling distribution, given \(\hat{\sigma}^2\), is not justified when \(u_i\) is not normal
But since both t and the sampling distribution for \(\hat{\beta}_k\) converge to Normal as \(N \rightarrow \infty\), we might as well just use t in large samples
It is exact when the errors are normal, and no worse in the more general case
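As a quick illustration of that convergence, compare two-sided 5% critical values from \(t\) distributions with increasing degrees of freedom to the standard normal critical value.

# t critical values approach the normal critical value as df grows
df <- c(5, 10, 30, 100, 1000)
rbind(t = qt(0.975, df), normal = qnorm(0.975))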