\[ln(e^x) = x\] \[e^{ln(x)} = x\]
Note: the `log()` function computes natural logs by default. The `exp()` function computes the natural exponential function.
x <- 10
log(exp(x))
## [1] 10
exp(log(x))
## [1] 10
\[ exp(a + b) = e^{a + b} = e^ae^b \] \[ exp(a - b) = e^{a - b} = \frac{e^a}{e^b} \] \[ ln(a) + ln(b) = ln(ab)\] \[ ln(a) - ln(b) = ln(\frac{a}{b})\] \[\frac{d}{dx} ln(x) = \frac{1}{x}\]
a <- 10; b <- 7
exp(a + b) == exp(a) * exp(b)
## [1] TRUE
round(exp(a - b)) == round(exp(a) / exp(b))
## [1] TRUE
log(a) + log(b) == log(a*b)
## [1] TRUE
round(log(a) - log(b)) == round(log(a/b))
## [1] TRUE
D(expression(log(x)), "x")
## 1/x
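Exact `==` comparisons of floating-point results can fail for other values of `a` and `b`, which is presumably why some of the checks above use `round()`. A minimal sketch of a tolerance-based alternative with `all.equal()`, together with a finite-difference check of the derivative rule (the evaluation point `x0` and step size `h` are arbitrary choices made for this illustration):

```r
a <- 10; b <- 7

# compare with a numerical tolerance instead of exact equality
sum_rule  <- isTRUE(all.equal(log(a) + log(b), log(a * b)))
diff_rule <- isTRUE(all.equal(log(a) - log(b), log(a / b)))
exp_rule  <- isTRUE(all.equal(exp(a - b), exp(a) / exp(b)))

# a central finite difference approximates d/dx ln(x) = 1/x at x0
x0 <- 3; h <- 1e-6
num_deriv  <- (log(x0 + h) - log(x0 - h)) / (2 * h)
deriv_rule <- isTRUE(all.equal(num_deriv, 1 / x0, tolerance = 1e-6))

c(sum_rule, diff_rule, exp_rule, deriv_rule)
```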
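A 1-unit increase in ln(x) multiplies x by e (about 2.718) whatever the level of x, so the absolute change in x is not constant. Equal additive increases in ln(x) correspond to equal multiplicative increases in x: each increase of ln(10), about 2.303, is a tenfold (order of magnitude) increase in x.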
log(10*x) - log(x)
## [1] 2.302585
log(100*x) - log(10*x)
## [1] 2.302585
log(1000*x) - log(100*x)
## [1] 2.302585
# log(a) - log(b) == log(a/b)
log((10*x)/x)
## [1] 2.302585
log(10)
## [1] 2.302585
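The same point can be seen in the other direction. The sketch below raises ln(x) by exactly one unit for a few arbitrary starting values (`x_vals` is chosen only for illustration) and transforms back to the original scale:

```r
x_vals <- c(1, 10, 100, 1000)

# raise ln(x) by exactly 1 unit and transform back to the original scale
x_new <- exp(log(x_vals) + 1)

ratio      <- x_new / x_vals  # constant: every element equals e
abs_change <- x_new - x_vals  # not constant: grows in proportion to x

data.frame(x = x_vals, x_new, ratio, abs_change)
```

The ratio column is the same in every row, while the absolute change grows with x.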
In this lab, we will return to the CEO salaries data set.
- `salary`: 1990 compensation ($1000s)
- `profits`: 1990 profits ($ millions)
ceosalary <- read.csv("data/ceosalary.csv")
Create a variable `lsalary` that takes the log of `salary` in dollars. Create a variable `lprofits` that takes the log of `profits` in dollars.
What type of data will be lost in this transformation?
ceosalary$lsalary <- log(ceosalary$salary * 1000)
ceosalary$lprofits <- log(ceosalary$profits * 1000000)
## Warning in log(ceosalary$profits * 1e+06): NaNs produced
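The warning comes from values that have no real logarithm. A small sketch with made-up numbers (the vector `profits_toy` is hypothetical, not taken from `ceosalary.csv`) shows how `log()` treats negative and zero inputs:

```r
# hypothetical profits in $ millions, not values from ceosalary.csv
profits_toy <- c(120, 35, -8, 0, 560)

# suppressWarnings() hides the "NaNs produced" warning for this demonstration
lprofits_toy <- suppressWarnings(log(profits_toy * 1e6))

n_nan    <- sum(is.nan(lprofits_toy))       # negative profits become NaN
n_neginf <- sum(is.infinite(lprofits_toy))  # zero profits become -Inf
n_lost   <- sum(profits_toy <= 0)           # values with no finite log

c(n_nan = n_nan, n_neginf = n_neginf, n_lost = n_lost)
```

Applied to the lab data, `sum(ceosalary$profits <= 0, na.rm = TRUE)` would count the affected firms.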
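\[ y_i = \beta_0 + \beta_1 x_{1i} + u_i\]
Unit Change: A 1-unit change in \(x_1\) is associated with a \(\beta_1\)-unit change in \(E(y)\). Remember the units: x is measured in $ millions and y in $1000s, so \(\beta_1\) is the change in salary (in $1000s) per additional $1 million of profits.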
m1 <- lm(salary ~ profits, ceosalary)
summary(m1)
##
## Call:
## lm(formula = salary ~ profits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -872.4 -319.7 -119.8 242.0 4484.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 746.9238 45.7979 16.31 < 2e-16 ***
## profits 0.5723 0.1009 5.67 5.81e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 541.6 on 175 degrees of freedom
## Multiple R-squared: 0.1552, Adjusted R-squared: 0.1504
## F-statistic: 32.14 on 1 and 175 DF, p-value: 5.805e-08
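Given the units, the slope of 0.5723 means that an additional $1 million of profits is associated with about $572 more salary (0.5723 x $1000). The sketch below uses simulated data (the variables `profits_m` and `salary_k` are invented for illustration and are not the CEO data) to show how a slope rescales when both variables are converted to dollars:

```r
set.seed(1)
# simulated data for illustration only (not the CEO data)
profits_m <- runif(50, 0, 1000)                           # $ millions
salary_k  <- 700 + 0.5 * profits_m + rnorm(50, sd = 100)  # $1000s

b_orig <- unname(coef(lm(salary_k ~ profits_m))[2])
# same regression with both variables converted to dollars
b_dollars <- unname(coef(lm(I(salary_k * 1000) ~ I(profits_m * 1e6)))[2])

# the slope is rescaled by (unit of y) / (unit of x) = 1000 / 1e6
all.equal(b_dollars, b_orig * 1000 / 1e6, tolerance = 1e-6)
```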
\[ ln(y_i) = \beta_0 + \beta_1 x_{1i} + u_i\]
Factor Change: A 1-unit change in \(x_1\) multiplies \(E(y)\) by \(e^{\beta_1}\). Percent Change: A \(\Delta x_1\)-unit change in \(x_1\) is associated with a \(100(e^{\beta_1 \Delta x_1} - 1)\) percent change in \(E(y)\); for a 1-unit change this is \(100(e^{\beta_1} - 1)\).
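A short derivation of the percent-change formula, holding \(u\) fixed and writing \(y^{(0)}\) and \(y^{(1)}\) for the outcome before and after the change in \(x_1\) (this superscript notation is introduced here only for the derivation):
\[ ln(y^{(1)}) - ln(y^{(0)}) = \beta_1 \Delta x_1 \quad \Rightarrow \quad \frac{y^{(1)}}{y^{(0)}} = e^{\beta_1 \Delta x_1} \] \[ 100 \left( \frac{y^{(1)}}{y^{(0)}} - 1 \right) = 100(e^{\beta_1 \Delta x_1} - 1) \approx 100 \beta_1 \Delta x_1 \quad \text{when } \beta_1 \Delta x_1 \text{ is small} \]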
m2 <- lm(lsalary ~ profits, ceosalary)
summary(m2)
##
## Call:
## lm(formula = lsalary ~ profits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.08834 -0.36626 0.02351 0.39714 2.04523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.337e+01 4.718e-02 283.343 < 2e-16 ***
## profits 5.944e-04 1.040e-04 5.717 4.6e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5579 on 175 degrees of freedom
## Multiple R-squared: 0.1574, Adjusted R-squared: 0.1526
## F-statistic: 32.68 on 1 and 175 DF, p-value: 4.597e-08
# factor change
exp(coef(m2)[2])
## profits
## 1.000595
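# percent change: 100 * (exp(b) - 1); the subtraction goes outside exp()
# (output below is shown rounded; it follows from the factor change 1.000595)
100 * (exp(coef(m2)[2]) - 1)
## profits
## 0.0595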
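For a log-level model the percent change is 100 times (the factor change minus 1), which is close to 100 times the coefficient only when the change in x is small. A minimal sketch using the slope as printed in `summary(m2)`; the helper functions `pct_change()` and `pct_approx()` are hypothetical names defined here for illustration:

```r
b1 <- 5.944e-04  # slope on profits as printed in summary(m2)

pct_change <- function(beta, dx = 1) 100 * (exp(beta * dx) - 1)  # exact
pct_approx <- function(beta, dx = 1) 100 * beta * dx             # approximation

# $1 million increase in profits: exact and approximate nearly coincide
c(exact = pct_change(b1), approx = pct_approx(b1))

# $1 billion increase (dx = 1000): the approximation understates the change
c(exact = pct_change(b1, 1000), approx = pct_approx(b1, 1000))
```

With these numbers a $1 million increase corresponds to roughly a 0.06 percent increase in salary, while a $1 billion increase corresponds to roughly 81 percent (the linear approximation would give about 59 percent).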
\[ y_i = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]
Unit Change: A 1 percent increase in \(x_1\) is associated with approximately a \(0.01 \beta_1\) unit change in \(E(y)\).
m3 <- lm(salary ~ lprofits, ceosalary)
summary(m3)
##
## Call:
## lm(formula = salary ~ lprofits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1053.6 -304.3 -85.4 229.1 4380.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2725.89 583.52 -4.671 6.15e-06 ***
## lprofits 196.02 31.76 6.172 4.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 534.8 on 166 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.1866, Adjusted R-squared: 0.1817
## F-statistic: 38.09 on 1 and 166 DF, p-value: 4.995e-09
# unit change
0.01 * coef(m3)[2]
## lprofits
## 1.960153
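The 0.01 multiplier is an approximation to ln(1.01); the exact change in E(y) for a p percent increase in x is the coefficient times ln(1 + p/100). A sketch using the slope as printed in `summary(m3)`, with hypothetical helper functions `unit_change()` and `unit_approx()` defined only for this illustration:

```r
b1 <- 196.02  # slope on lprofits as printed in summary(m3)

unit_change <- function(beta, pct) beta * log(1 + pct / 100)  # exact
unit_approx <- function(beta, pct) beta * pct / 100           # approximation

# 1 percent increase in profits: the two agree closely
c(exact = unit_change(b1, 1), approx = unit_approx(b1, 1))

# 50 percent increase in profits: the approximation overstates the change
c(exact = unit_change(b1, 50), approx = unit_approx(b1, 50))
```

Because salary is measured in $1000s, the results are also in $1000s.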
\[ ln(y_i) = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]
Percent Change: A 1 percent change in \(x_1\) is associated with approximately a \(\beta_1\) percent change in \(E(y)\), so \(\beta_1\) is an elasticity.
m4 <- lm(lsalary ~ lprofits, ceosalary)
summary(m4)
##
## Call:
## lm(formula = lsalary ~ lprofits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.30188 -0.31267 0.00524 0.36064 1.93497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.40506 0.59034 15.932 < 2e-16 ***
## lprofits 0.22281 0.03213 6.934 8.66e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.541 on 166 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2246, Adjusted R-squared: 0.2199
## F-statistic: 48.08 on 1 and 166 DF, p-value: 8.657e-11
# percent change
coef(m4)[2]
## lprofits
## 0.2228063
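Reading the coefficient directly as a percent change works well for small changes in x. For larger changes, the exact percent change in E(y) for a p percent increase in x is 100 times ((1 + p/100) raised to the coefficient, minus 1). A sketch using the estimate as printed in `summary(m4)`; `elas_exact()` and `elas_approx()` are hypothetical helpers defined for this illustration:

```r
b1 <- 0.22281  # elasticity estimate as printed in summary(m4)

elas_exact  <- function(beta, pct) 100 * ((1 + pct / 100)^beta - 1)
elas_approx <- function(beta, pct) beta * pct

# 1 percent increase in profits: the two agree closely
c(exact = elas_exact(b1, 1), approx = elas_approx(b1, 1))

# doubling of profits (100 percent): the approximation overstates the change
c(exact = elas_exact(b1, 100), approx = elas_approx(b1, 100))
```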
\[ ln(salary_i) = \beta_0 + \beta_1 ln(profits_i) + u_i\] \[ H_0: \beta_1 = 0 \]
# list-wise delete
data <- na.omit(ceosalary[, c("lsalary", "lprofits")])
# construct X
X <- cbind(rep(1, nrow(data)), data[, "lprofits"])
# construct y
y <- data[, "lsalary"]
# df
df <- nrow(X) - ncol(X)
# OLS
B <- solve(t(X)%*%X) %*% t(X)%*%y
# residuals
res <- y - X%*%B
# estimate sig^2
sig2 <- sum(res^2) / df
# estimate var-cov matrix for B
V <- sig2 * solve(t(X)%*%X)
# ses
se <- sqrt(diag(V))
# t stats
t <- B / se
# two-sided p-values (abs() so that negative t statistics are handled correctly)
p <- 2*pt(abs(t), df, lower.tail = FALSE)
# confidence intervals
lci <- B - qt(0.975, df)*se
uci <- B + qt(0.975, df)*se
# table
knitr::kable(round(data.frame(beta=B, se=se, tstat=t, p=p, lci=lci, uci=uci, row.names = c("Intercept","Logged Profits")), 3))
| | beta | se | tstat | p | lci | uci |
|---|---|---|---|---|---|---|
| Intercept | 9.405 | 0.590 | 15.932 | 0 | 8.240 | 10.571 |
| Logged Profits | 0.223 | 0.032 | 6.934 | 0 | 0.159 | 0.286 |
How do we interpret the values in the table?
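One way to check a by-hand calculation like this is to compare it with `lm()`. The sketch below repeats the matrix steps on simulated log-log data (the variables `lx` and `ly` are invented for illustration and are not the CEO data) and compares the result with the built-in coefficient table:

```r
set.seed(123)
# simulated log-log data for illustration only (not the CEO data)
n  <- 100
lx <- rnorm(n, mean = 18, sd = 1.5)
ly <- 9 + 0.2 * lx + rnorm(n, sd = 0.5)

X   <- cbind(1, lx)   # design matrix with an intercept column
dof <- n - ncol(X)    # residual degrees of freedom

XtX_inv <- solve(crossprod(X))               # (X'X)^{-1}
B       <- drop(XtX_inv %*% crossprod(X, ly))  # OLS coefficients
res     <- ly - drop(X %*% B)                # residuals
se      <- sqrt(diag(sum(res^2) / dof * XtX_inv))
tstat   <- B / se
p       <- 2 * pt(abs(tstat), dof, lower.tail = FALSE)

manual  <- cbind(B, se, tstat, p)
builtin <- coef(summary(lm(ly ~ lx)))
all.equal(unname(manual), unname(builtin), tolerance = 1e-6)
```

The names `dof` and `tstat` are used instead of `df` and `t` to avoid masking the base R functions of the same name.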
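The F-test compares the residual sum of squares (SSR) of the restricted and unrestricted models.
\[ F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (N-K-1)} \]
where q is the number of restrictions and \(N-K-1\) is the residual degrees of freedom of the unrestricted model. The F-statistic measures the increase in SSR per restriction imposed, relative to the residual variance of the unrestricted model. A large F-statistic suggests that the additional parameter(s) jointly improve the fit of the model significantly.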
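# unrestricted (lsales, the log of sales, is assumed to already exist in ceosalary)
m5 <- lm(lsalary ~ lprofits + lsales, ceosalary)
# re-estimate the restricted model on the rows used by the unrestricted model
m5_r <- lm(lsalary ~ lprofits, data = m5$model)
# equivalent alternative using update(); either line alone is sufficient
m5_r <- update(m5, . ~ . - lsales, data = m5$model)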
# SSR, for restricted and unrestricted models
ssr_r <- sum(m5_r$residuals^2)
ssr_ur <- sum(m5$residuals^2)
# df unrestricted
df_ur <- m5$df.residual
# df difference, # restrictions
q <- m5_r$df.residual - m5$df.residual
# F stat
Fstat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
# p value (upper tail of the F distribution)
pf(Fstat, q, df_ur, lower.tail = FALSE)
## [1] 7.449373e-06
# compare to anova
anova(m5_r, m5)
## Analysis of Variance Table
##
## Model 1: lsalary ~ lprofits
## Model 2: lsalary ~ lprofits + lsales
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 166 48.594
## 2 165 43.012 1 5.5823 21.415 7.449e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
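With a single restriction (q = 1), the F-statistic equals the square of the t-statistic on the excluded variable in the unrestricted model. A sketch on simulated data (the variables `x1`, `x2`, and `y` are invented for illustration and are not the CEO data):

```r
set.seed(42)
# simulated data for illustration only (not the CEO data)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

ur <- lm(y ~ x1 + x2)  # unrestricted model
r  <- lm(y ~ x1)       # restricted model: coefficient on x2 set to 0

ssr_r  <- sum(resid(r)^2)
ssr_ur <- sum(resid(ur)^2)
q      <- df.residual(r) - df.residual(ur)  # number of restrictions (1)

Fstat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df.residual(ur))
t_x2  <- coef(summary(ur))["x2", "t value"]

# F equals t^2 for one restriction, and matches anova()
c(F = Fstat, t_squared = t_x2^2, anova_F = anova(r, ur)$F[2])
```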