Logs and Exponents

Inverse Functions

\[\ln(e^x) = x\] \[e^{\ln(x)} = x, \quad x > 0\]

Note: In R, log() computes the natural log by default, and exp() computes the natural exponential function.

x <- 10
log(exp(x))

## [1] 10

exp(log(x))

## [1] 10

Helpful Rules

\[ \exp(a + b) = e^{a + b} = e^a e^b \]
\[ \exp(a - b) = e^{a - b} = \frac{e^a}{e^b} \]
\[ \ln(a) + \ln(b) = \ln(ab) \]
\[ \ln(a) - \ln(b) = \ln\left(\frac{a}{b}\right) \]
\[ \frac{d}{dx} \ln(x) = \frac{1}{x} \]

a <- 10; b <- 7 

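# compare with a tolerance; == on floating-point results can fail from rounding error
isTRUE(all.equal(exp(a + b), exp(a) * exp(b)))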

## [1] TRUE

isTRUE(all.equal(exp(a - b), exp(a) / exp(b)))

## [1] TRUE

isTRUE(all.equal(log(a) + log(b), log(a*b)))

## [1] TRUE

isTRUE(all.equal(log(a) - log(b), log(a/b)))

## [1] TRUE

D(expression(log(x)), "x")

## 1/x
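The derivative rule can also be checked numerically. The sketch below evaluates the symbolic derivative at an arbitrary point and compares it with a central finite difference; the evaluation point x0 and the step size h are illustrative choices, not part of the lab.

```r
# illustrative evaluation point and step size (assumptions for this check)
x0 <- 4
h <- 1e-6

# symbolic derivative of log(x), evaluated at x = x0
d_sym <- eval(D(expression(log(x)), "x"), list(x = x0))

# central finite-difference approximation of the same derivative
d_num <- (log(x0 + h) - log(x0 - h)) / (2 * h)

# both should be close to 1 / x0 = 0.25
c(symbolic = d_sym, numeric = d_num)
```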

1-Unit Change

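A 1-unit increase in ln(x) multiplies x by e (about 2.718), so the change in x depends on its starting level rather than being constant. More generally, equal increases in ln(x) correspond to equal proportional increases in x: each increase of ln(10) ≈ 2.303 corresponds to a tenfold (order of magnitude) increase in x.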

log(10*x) - log(x)

## [1] 2.302585

log(100*x) - log(10*x)

## [1] 2.302585

log(1000*x) - log(100*x)

## [1] 2.302585

# log(a) - log(b) == log(a/b)
log((10*x)/x)

## [1] 2.302585

log(10)

## [1] 2.302585
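The same logic can be run in the other direction: adding 1 to log(x) and mapping back to the original scale multiplies x by e, whatever the starting value. A minimal sketch, using arbitrary starting values x0 chosen only for illustration:

```r
# illustrative starting values (arbitrary, not from the lab data)
x0 <- c(1, 10, 250)

# add 1 to log(x0), then map back to the original scale
x1 <- exp(log(x0) + 1)

# the ratio is the same for every starting value: e (about 2.718)
x1 / x0

# the absolute change is not constant; it grows with the starting value
x1 - x0
```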

Log Transformations

In this lab, we will return to the CEO salaries data set.

salary: 1990 compensation ($1000s)
profits: 1990 profits ($ millions)

ceosalary <- read.csv("data/ceosalary.csv")

Create a variable lsalary that takes the log of salary in dollars. Create a variable lprofits that takes the log of profits in dollars.

What type of data will be lost in this transformation?

ceosalary$lsalary <- log(ceosalary$salary * 1000)
ceosalary$lprofits <- log(ceosalary$profits * 1000000)

## Warning in log(ceosalary$profits * 1e+06): NaNs produced
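The warning hints at which observations are affected. As a sketch of what log() does with non-positive values, the example below uses a small made-up profits vector (profits_demo is hypothetical, not the lab data) that includes a zero and a loss:

```r
# hypothetical profits in $ millions, including a zero and a loss
profits_demo <- c(120, 35, 0, -15)

# log() returns -Inf for zero and NaN (with a warning) for negative values
lprofits_demo <- suppressWarnings(log(profits_demo * 1e6))
lprofits_demo

# number of values that are not usable after the transformation
sum(!is.finite(lprofits_demo))
```

Note that is.na() is TRUE for NaN but FALSE for -Inf, so zeros are not dropped by na.omit() and may need to be handled separately.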

Interpreting Logs

Level-Level Model

A 1-unit change in $x_1$ is associated with a $\beta_1$-unit change in $E(y)$. Remember the units: x (profits) is measured in millions of dollars and y (salary) in thousands of dollars!

m1 <- lm(salary ~ profits, ceosalary)
summary(m1)

## 
## Call:
## lm(formula = salary ~ profits, data = ceosalary)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -872.4 -319.7 -119.8  242.0 4484.0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 746.9238    45.7979   16.31  < 2e-16 ***
## profits       0.5723     0.1009    5.67 5.81e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 541.6 on 175 degrees of freedom
## Multiple R-squared:  0.1552, Adjusted R-squared:  0.1504 
## F-statistic: 32.14 on 1 and 175 DF,  p-value: 5.805e-08
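To make the units concrete, the slope can be converted to dollars. This short sketch copies the rounded estimate from the printed summary rather than reading it from the model object, so it runs on its own:

```r
# slope copied from the printed summary of m1:
# thousands of dollars of salary per $1 million of profits
b1 <- 0.5723

# dollars of salary per additional $1 million of profits
b1 * 1000

# dollars of salary per additional $1 of profits
b1 * 1000 / 1e6
```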

Log-Level Model

\[ ln(y_i) = \beta_0 + \beta_1 x_{1i} + u_i\]

Factor Change: A 1-unit change in $x_1$ multiplies $E(y)$ by $e^{\beta_1}$.

Percent Change: A $\Delta x_1$-unit change in $x_1$ is associated with a $100(e^{\beta_1 \Delta x_1} - 1)$ percent change in $E(y)$.

m2 <- lm(lsalary ~ profits, ceosalary)
summary(m2)

## 
## Call:
## lm(formula = lsalary ~ profits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.08834 -0.36626  0.02351  0.39714  2.04523 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.337e+01  4.718e-02 283.343  < 2e-16 ***
## profits     5.944e-04  1.040e-04   5.717  4.6e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5579 on 175 degrees of freedom
## Multiple R-squared:  0.1574, Adjusted R-squared:  0.1526 
## F-statistic: 32.68 on 1 and 175 DF,  p-value: 4.597e-08

# factor change 
exp(coef(m2)[2])

##  profits 
## 1.000595

# percent change: 100 * (exp(b1) - 1)
round(100 * (exp(coef(m2)[2]) - 1), 4)

## profits 
##  0.0595
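For small values of $\beta_1 \Delta x_1$, the exact percent change $100(e^{\beta_1 \Delta x_1} - 1)$ is close to the simpler $100 \beta_1 \Delta x_1$, but the two diverge as the change in x grows. A sketch of the comparison, assuming the rounded slope from the printed summary and a few illustrative changes in profits:

```r
# slope copied from the printed summary of m2 (rounded)
b1 <- 5.944e-04

# changes in profits, in $ millions (illustrative values)
dx <- c(1, 100, 1000)

# exact percent change in E(y)
exact <- 100 * (exp(b1 * dx) - 1)

# linear approximation, reasonable only when b1 * dx is small
approx_pct <- 100 * b1 * dx

round(data.frame(dx, exact, approx_pct), 3)
```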

Level-Log Model

\[ y_i = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]

Unit Change: A 1 percent increase in $x_1$ is associated with approximately a $0.01 \beta_1$ unit increase in $E(y)$.

m3 <- lm(salary ~ lprofits, ceosalary)
summary(m3)

## 
## Call:
## lm(formula = salary ~ lprofits, data = ceosalary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1053.6  -304.3   -85.4   229.1  4380.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2725.89     583.52  -4.671 6.15e-06 ***
## lprofits      196.02      31.76   6.172 4.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 534.8 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.1866, Adjusted R-squared:  0.1817 
## F-statistic: 38.09 on 1 and 166 DF,  p-value: 4.995e-09

# unit change 
0.01 * coef(m3)[2]

## lprofits 
## 1.960153
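The $0.01 \beta_1$ rule is an approximation to the exact change $\beta_1 \ln(1 + p/100)$ for a p percent increase in $x_1$. The sketch below compares the two, assuming the rounded slope from the printed summary and a few illustrative percent increases:

```r
# slope copied from the printed summary of m3 (salary is in $1000s)
b1 <- 196.02

# percent increases in profits (illustrative values)
pct <- c(1, 10, 50)

# exact change in E(y): b1 * log(1 + pct / 100)
exact <- b1 * log(1 + pct / 100)

# approximation: (pct / 100) * b1
approx_units <- b1 * pct / 100

round(data.frame(pct, exact, approx_units), 2)
```

The approximation is close for a 1 percent increase and overstates the change for larger increases.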

Log-Log Model

\[ ln(y_i) = \beta_0 + \beta_1 ln(x_{1i}) + u_i\]

Percent Change: A 1 percent change in $x_1$ is associated with approximately a $\beta_1$ percent change in $E(y)$.

m4 <- lm(lsalary ~ lprofits, ceosalary)
summary(m4)

## 
## Call:
## lm(formula = lsalary ~ lprofits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.30188 -0.31267  0.00524  0.36064  1.93497 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.40506    0.59034  15.932  < 2e-16 ***
## lprofits     0.22281    0.03213   6.934 8.66e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.541 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2246, Adjusted R-squared:  0.2199 
## F-statistic: 48.08 on 1 and 166 DF,  p-value: 8.657e-11

# percent change 
coef(m4)[2]

##  lprofits 
## 0.2228063
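In the log-log model, the exact percent change in $E(y)$ for a p percent increase in $x_1$ is $100((1 + p/100)^{\beta_1} - 1)$; reading $\beta_1$ directly as an elasticity is the small-change approximation. A sketch of the comparison, assuming the rounded slope from the printed summary:

```r
# slope copied from the printed summary of m4
b1 <- 0.22281

# percent increases in profits (illustrative values)
pct <- c(1, 10, 50)

# exact percent change in E(y)
exact <- 100 * ((1 + pct / 100)^b1 - 1)

# elasticity approximation: b1 percent per 1 percent
approx_pct <- b1 * pct

round(data.frame(pct, exact, approx_pct), 3)
```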

Hypothesis Testing

\[ ln(salary_i) = \beta_0 + \beta_1 ln(profits_i) + u_i\] \[ H_0: \beta_1 = 0 \]

# list-wise delete
data <- na.omit(ceosalary[, c("lsalary", "lprofits")])

# construct X
X <- cbind(rep(1, nrow(data)), data[, "lprofits"])

# construct y
y <- data[, "lsalary"]

# df 
df <- nrow(X) - ncol(X)

# OLS
B <- solve(t(X)%*%X) %*% t(X)%*%y

# residuals
res <- y - X%*%B

# estimate sig^2
sig2 <- sum(res^2) / df

# estimate var-cov matrix for B
V <- sig2 * solve(t(X)%*%X)

# ses
se <- sqrt(diag(V))

# t stats
t <- B / se

# two-sided p values; abs() keeps them valid when a t stat is negative
p <- 2*pt(abs(t), df, lower.tail = FALSE)

# confidence intervals 
lci <- B - qt(0.975, df)*se
uci <- B + qt(0.975, df)*se

# table
knitr::kable(round(data.frame(beta=B, se=se, tstat=t, p=p, lci=lci, uci=uci, row.names = c("Intercept","Logged Profits")), 3))

                 beta     se   tstat   p    lci     uci
Intercept       9.405  0.590  15.932   0  8.240  10.571
Logged Profits  0.223  0.032   6.934   0  0.159   0.286
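One way to check the matrix algebra is to run the same steps on data where lm() can serve as a reference. The sketch below uses simulated data (the variables x and y and the seed are assumptions made for this check, not the CEO data) and compares the hand-computed estimates, standard errors, and p values with summary(lm()).

```r
# simulated data, used only to verify the formulas
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

# design matrix with an intercept column, and residual degrees of freedom
X <- cbind(1, x)
df_res <- n - ncol(X)

# OLS by hand: crossprod(X) is t(X) %*% X, crossprod(X, y) is t(X) %*% y
B <- solve(crossprod(X), crossprod(X, y))
res <- y - X %*% B
V <- (sum(res^2) / df_res) * solve(crossprod(X))
se <- sqrt(diag(V))
tstat <- as.vector(B) / se
p <- 2 * pt(abs(tstat), df_res, lower.tail = FALSE)

# reference values from lm()
fit <- summary(lm(y ~ x))$coefficients

# each comparison should return TRUE
all.equal(unname(fit[, "Estimate"]), as.vector(B))
all.equal(unname(fit[, "Std. Error"]), unname(se))
all.equal(unname(fit[, "Pr(>|t|)"]), unname(p))
```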

How do we interpret the values in the table?

F test

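The F test compares the sum of squared residuals (SSR) of the restricted and unrestricted models.

\[ F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (N-K-1)} \]

where q is the number of restrictions, N is the number of observations, and K is the number of regressors in the unrestricted model. The F-statistic measures how much the SSR rises when the restrictions are imposed, relative to the unexplained variance of the unrestricted model. A large F-statistic suggests that the additional parameter(s) significantly improve the fit.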

# unrestricted
m5 <- lm(lsalary ~ lprofits + lsales, ceosalary)

# re-estimate the restricted model on the unrestricted model's estimation sample
m5_r <- lm(lsalary ~ lprofits, data = m5$model)

# equivalent alternative: drop lsales from m5 with update()
m5_r <- update(m5, . ~ . - lsales, data = m5$model)

# SSR, for restricted and unrestricted models
ssr_r <- sum(m5_r$residuals^2)
ssr_ur <- sum(m5$residuals^2)

# df unrestricted
df_ur <- m5$df.residual

# df difference, # restrictions
q <- m5_r$df.residual - m5$df.residual

# F stat
Fstat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# p value
pf(Fstat, q, df_ur, lower.tail = FALSE)

## [1] 7.449373e-06

# compare to anova
anova(m5_r, m5)

## Analysis of Variance Table
## 
## Model 1: lsalary ~ lprofits
## Model 2: lsalary ~ lprofits + lsales
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    166 48.594                                  
## 2    165 43.012  1    5.5823 21.415 7.449e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
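With a single restriction (q = 1), the F statistic equals the square of the t statistic on the excluded variable in the unrestricted model. The sketch below illustrates this on simulated data; the variables x1, x2, and y and the seed are assumptions made for the demonstration.

```r
# simulated data, used only to illustrate the F = t^2 relationship
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

ur <- lm(y ~ x1 + x2)  # unrestricted model
r <- lm(y ~ x1)        # restricted model: coefficient on x2 fixed at 0

# SSR for each model and the number of restrictions
ssr_ur <- sum(resid(ur)^2)
ssr_r <- sum(resid(r)^2)
q <- r$df.residual - ur$df.residual

# F statistic from the SSR formula
Fstat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / ur$df.residual)

# t statistic on x2 from the unrestricted model
t_x2 <- summary(ur)$coefficients["x2", "t value"]

# the two values should agree
c(F = Fstat, t_squared = t_x2^2)
```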

Lab 5