Logs and Exponents

Inverse Functions

\[ \ln(e^x) = x \] \[ e^{\ln(x)} = x \] Note: in R, the log() function computes natural logarithms by default, and the exp() function computes the natural exponential function.

x <- 10
log(exp(x)) 
## [1] 10
exp(log(x))
## [1] 10

Helpful Rules

\[ \exp(a + b) = e^{a + b} = e^a e^b \] \[ \exp(a - b) = e^{a - b} = \frac{e^a}{e^b} \] \[ \ln(a) + \ln(b) = \ln(ab) \] \[ \ln(a) - \ln(b) = \ln\left(\frac{a}{b}\right) \] \[ \frac{d}{dx} \ln(x) = \frac{1}{x} \]

a <- 10; b <- 7 

exp(a + b) == exp(a) * exp(b)
## [1] TRUE
# round() guards against floating-point rounding error
round(exp(a - b)) == round(exp(a) / exp(b))
## [1] TRUE
log(a) + log(b) == log(a*b)
## [1] TRUE
round(log(a) - log(b)) == round(log(a/b))
## [1] TRUE
D(expression(log(x)), "x")
## 1/x
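The round() calls above are needed because floating-point arithmetic can make mathematically equal expressions differ in the last few binary digits. A safer idiom than rounding is all.equal(), which compares within a small numerical tolerance (a minimal sketch, not part of the lab code):

```r
a <- 10; b <- 7

# exact comparison of floating-point results can fail even when
# the two expressions are mathematically identical
exp(a - b) == exp(a) / exp(b)

# all.equal() returns TRUE when values agree within a tolerance
isTRUE(all.equal(exp(a - b), exp(a) / exp(b)))
## [1] TRUE
```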

1-Unit Change

A 1-unit increase in ln(x) does not correspond to a constant change in x: equal additive increases in ln(x) correspond to equal multiplicative increases in x. In particular, each tenfold increase in x adds a constant ln(10) ≈ 2.303 to ln(x).

log(10*x) - log(x) 
## [1] 2.302585
log(100*x) - log(10*x)
## [1] 2.302585
log(1000*x) - log(100*x)
## [1] 2.302585
# log(a) - log(b) == log(a/b)
log((10*x)/x)
## [1] 2.302585
log(10)
## [1] 2.302585
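A related rule of thumb: small changes in ln(x) are approximately percent changes in x, since \(e^{\delta} \approx 1 + \delta\) for small \(\delta\). A quick check (a sketch, not part of the lab code):

```r
# a 0.01 increase in ln(x) multiplies x by about 1.01, i.e. ~1 percent
exp(0.01)
## [1] 1.01005

# the approximation degrades for larger changes: a 0.5 increase in
# ln(x) is a ~65 percent increase in x, not 50 percent
exp(0.5)
## [1] 1.648721
```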

Log Transformations

In this lab, we will return to the CEO salaries data set.

  • salary: 1990 compensation ($1000s)

  • profits: 1990 profits ($ millions)

ceosalary <- read.csv("data/ceosalary.csv")

Create a variable lsalary that takes the log of salary in dollars. Create a variable lprofits that takes the log of profits in dollars.

Which observations will be lost in this transformation?

ceosalary$lsalary <- log(ceosalary$salary * 1000)
ceosalary$lprofits <- log(ceosalary$profits * 1000000)
## Warning in log(ceosalary$profits * 1e+06): NaNs produced
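The warning occurs because the log is undefined for non-positive numbers: firms with zero profits get -Inf and firms with negative profits get NaN, and both drop out of any regression on lprofits. A minimal illustration with toy values:

```r
log(c(2, 1, 0, -1))
## Warning in log(c(2, 1, 0, -1)): NaNs produced
## [1] 0.6931472 0.0000000      -Inf       NaN
```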

Interpreting Logs

Level-Level Model

\[ y_i = \beta_0 + \beta_1 x_{1i} + u_i\]

A 1-unit change in \(x_1\) is associated with a \(\beta_1\)-unit change in \(E(y)\). Remember the units: x is in millions of dollars and y is in thousands of dollars!

m1 <- lm(salary ~ profits, ceosalary)
summary(m1)
## 
## Call:
## lm(formula = salary ~ profits, data = ceosalary)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -872.4 -319.7 -119.8  242.0 4484.0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 746.9238    45.7979   16.31  < 2e-16 ***
## profits       0.5723     0.1009    5.67 5.81e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 541.6 on 175 degrees of freedom
## Multiple R-squared:  0.1552, Adjusted R-squared:  0.1504 
## F-statistic: 32.14 on 1 and 175 DF,  p-value: 5.805e-08

Log-Level Model

\[ ln(y_i) = \beta_0 + \beta_1 x_{1i} + u_i\]

Factor Change: a 1-unit change in \(x_1\) multiplies \(E(y)\) by \(e^{\beta_1}\). Percent Change: a change of \(\Delta x_1\) in \(x_1\) is associated with a \(100(e^{\beta_1 \Delta x_1} - 1)\) percent change in \(E(y)\).

m2 <- lm(lsalary ~ profits, ceosalary)
summary(m2)
## 
## Call:
## lm(formula = lsalary ~ profits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.08834 -0.36626  0.02351  0.39714  2.04523 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.337e+01  4.718e-02 283.343  < 2e-16 ***
## profits     5.944e-04  1.040e-04   5.717  4.6e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5579 on 175 degrees of freedom
## Multiple R-squared:  0.1574, Adjusted R-squared:  0.1526 
## F-statistic: 32.68 on 1 and 175 DF,  p-value: 4.597e-08
# factor change 
exp(coef(m2)[2])
##  profits 
## 1.000595
# percent change (note the parentheses: exponentiate first, then subtract 1)
100 * (exp(coef(m2)[2]) - 1)   # ~0.0595 percent per $1 million in profits
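Because profits are measured in $ millions, the per-unit ($1 million) effect is tiny; for larger changes, put \(\Delta x_1\) in the exponent. A sketch using an assumed coefficient of 0.0006, close to the estimate above:

```r
b1 <- 0.0006  # assumed slope, roughly the profits coefficient above

# factor change in expected salary for a $100 million increase in profits
exp(b1 * 100)
## [1] 1.061837

# the corresponding percent change
100 * (exp(b1 * 100) - 1)
## [1] 6.183655
```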

Level-Log Model

\[ y_i = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]

Unit Change: A 1 percent increase in \(x_1\) is associated with a \(0.01 \beta_1\) unit increase in \(E(y)\).


m3 <- lm(salary ~ lprofits, ceosalary)
summary(m3)
## 
## Call:
## lm(formula = salary ~ lprofits, data = ceosalary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1053.6  -304.3   -85.4   229.1  4380.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2725.89     583.52  -4.671 6.15e-06 ***
## lprofits      196.02      31.76   6.172 4.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 534.8 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.1866, Adjusted R-squared:  0.1817 
## F-statistic: 38.09 on 1 and 166 DF,  p-value: 4.995e-09
# unit change 
0.01 * coef(m3)[2]
## lprofits 
## 1.960153
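For a percent change other than 1, multiply \(\beta_1\) by the log of the growth factor; the \(0.01\beta_1\) rule is the small-change approximation. A sketch using the slope estimate above:

```r
b1 <- 196.02  # slope on lprofits from m3

# a 10 percent increase in profits: salary rises by b1 * log(1.10),
# in thousands of dollars
b1 * log(1.10)
## [1] 18.6827

# small-change approximation: 0.01 * b1 units per 1 percent
0.01 * b1
## [1] 1.9602
```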

Log-Log Model

\[ ln(y_i) = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]

Percent Change: 1 percent change in \(x_1\) is associated with a \(\beta_1\) percent change in \(E(y)\).

m4 <- lm(lsalary ~ lprofits, ceosalary)
summary(m4)
## 
## Call:
## lm(formula = lsalary ~ lprofits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.30188 -0.31267  0.00524  0.36064  1.93497 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.40506    0.59034  15.932  < 2e-16 ***
## lprofits     0.22281    0.03213   6.934 8.66e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.541 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2246, Adjusted R-squared:  0.2199 
## F-statistic: 48.08 on 1 and 166 DF,  p-value: 8.657e-11
# percent change 
coef(m4)[2]
##  lprofits 
## 0.2228063
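The elasticity interpretation is itself an approximation; the exact implied change for a larger move uses the factor form \((1 + g)^{\beta_1}\). A sketch with the estimate above:

```r
b1 <- 0.2228  # elasticity estimate from m4

# exact percent change in expected salary for a 10 percent increase in profits
100 * (1.10^b1 - 1)
## [1] 2.146218

# elasticity approximation: b1 percent per 1 percent, so ~2.23 for 10
10 * b1
## [1] 2.228
```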

Hypothesis Testing

\[ ln(salary_i) = \beta_0 + \beta_1 ln(profits_i) + u_i\] \[ H_0: \beta_1 = 0 \]

# list-wise delete
data <- na.omit(ceosalary[, c("lsalary", "lprofits")])

# construct X
X <- cbind(rep(1, nrow(data)), data[, "lprofits"])

# construct y
y <- data[, "lsalary"]

# df 
df <- nrow(X) - ncol(X)

# OLS
B <- solve(t(X)%*%X) %*% t(X)%*%y

# residuals
res <- y - X%*%B

# estimate sig^2
sig2 <- sum(res^2) / df

# estimate var-cov matrix for B
V <- sig2 * solve(t(X)%*%X)

# ses
se <- sqrt(diag(V))

# t stats
t <- B / se

# p values (two-sided, so use the absolute value of t)
p <- 2 * pt(abs(t), df, lower.tail = FALSE)

# confidence intervals 
lci <- B - qt(0.975, df)*se
uci <- B + qt(0.975, df)*se

# table
knitr::kable(round(data.frame(beta=B, se=se, tstat=t, p=p, lci=lci, uci=uci, row.names = c("Intercept","Logged Profits")), 3))
                 beta    se  tstat  p   lci    uci
Intercept       9.405 0.590 15.932  0 8.240 10.571
Logged Profits  0.223 0.032  6.934  0 0.159  0.286
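The matrix results can be verified against lm(). A self-contained check on simulated data (the same algebra applies to the CEO regression above):

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

# manual OLS: coefficients and standard errors
X <- cbind(1, x)
B <- solve(t(X) %*% X) %*% t(X) %*% y
sig2 <- sum((y - X %*% B)^2) / (n - ncol(X))
se <- sqrt(diag(sig2 * solve(t(X) %*% X)))

# lm() reproduces the same estimates and standard errors
fit <- summary(lm(y ~ x))
all.equal(as.numeric(B), as.numeric(coef(fit)[, "Estimate"]))
## [1] TRUE
all.equal(as.numeric(se), as.numeric(coef(fit)[, "Std. Error"]))
## [1] TRUE
```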

How do we interpret the values in the table?

F test

Compares the residual SS of the restricted and unrestricted models.

\[ F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (N-K-1)} \]

Where \(q\) is the number of restrictions and \(N - K - 1\) is the residual degrees of freedom of the unrestricted model. The F statistic measures the improvement in fit gained by relaxing the restrictions, per restriction, relative to the unexplained variance of the unrestricted model. A large F statistic suggests that the additional parameter(s) significantly improve the model.

# unrestricted
m5 <- lm(lsalary ~ lprofits + lsales, ceosalary)

# re-estimate the restricted model on the unrestricted estimation sample
# (update() drops lsales; data = m5$model keeps the same rows as m5)
m5_r <- update(m5, . ~ . - lsales, data = m5$model)

# SSR, for restricted and unrestricted models
ssr_r <- sum(m5_r$residuals^2)
ssr_ur <- sum(m5$residuals^2)

# df unrestricted
df_ur <- m5$df.residual

# df difference, # restrictions
q <- m5_r$df.residual - m5$df.residual

# F stat
Fstat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# p value
pf(Fstat, q, df_ur, lower.tail=F)
## [1] 7.449373e-06
# compare to anova
anova(m5_r, m5)
## Analysis of Variance Table
## 
## Model 1: lsalary ~ lprofits
## Model 2: lsalary ~ lprofits + lsales
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    166 48.594                                  
## 2    165 43.012  1    5.5823 21.415 7.449e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
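With a single restriction (q = 1), the F statistic equals the square of the t statistic on the dropped variable, so the F test and the t test agree. A self-contained sketch on simulated data:

```r
set.seed(2)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + x1 + 0.5 * x2 + rnorm(n)

ur <- lm(y ~ x1 + x2)  # unrestricted
r  <- lm(y ~ x1)       # restricted: drop x2 (one restriction)

Fstat <- anova(r, ur)$F[2]
tstat <- coef(summary(ur))["x2", "t value"]

# F equals t squared when q = 1
all.equal(Fstat, tstat^2)
## [1] TRUE
```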