Log Interpretation

In this lab, we will return to the CEO salaries data set.

  • salary: 1990 compensation ($1000s)
  • lsalary: 1990 compensation ($1000s) logged
  • profits: 1990 profits ($ millions)
  • lprofits: 1990 profits ($ millions) logged
ceosalary <- read.csv("data/ceosalary.csv")

ceosalary$lsalary <- log(ceosalary$salary)
ceosalary$lprofits <- log(ceosalary$profits)
## Warning in log(ceosalary$profits): NaNs produced

Log-Level Model

\[ ln(y_i) = \beta_0 + \beta_1 x_{1i} + u_i\] Precise: 1 unit change in \(x_1\) is associated with a \(100(e^{\beta_1\Delta x_1} - 1)\) percent change in \(E(y)\).

Approx: 1 unit change in \(x_1\) is associated with a \(100\beta_1\) percent change in \(E(y)\).

m2 <- lm(lsalary ~ profits, ceosalary)
summary(m2)
## 
## Call:
## lm(formula = lsalary ~ profits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.08834 -0.36626  0.02351  0.39714  2.04523 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.4593064  0.0471763 136.918  < 2e-16 ***
## profits     0.0005944  0.0001040   5.717  4.6e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5579 on 175 degrees of freedom
## Multiple R-squared:  0.1574, Adjusted R-squared:  0.1526 
## F-statistic: 32.68 on 1 and 175 DF,  p-value: 4.597e-08
# precise
100 * (exp(coef(m2)[2]) - 1)
##    profits 
## 0.05946088
# approx
100 * coef(m2)[2] 
##    profits 
## 0.05944321

Level-Log Model

\[ y_i = \beta_0 + \beta_1 ln(x_{1i}) + u_i\] Precise: p percent change in \(x_1\) is associated with a \(\beta_1 ln(\frac{100 + p}{100})\) unit change in \(E(y)\).

Approx: 1 percent change in \(x_1\) is associated with a \(\frac{\beta_1}{100}\) unit change in \(E(y)\).

m3 <- lm(salary ~ lprofits, ceosalary)
summary(m3)
## 
## Call:
## lm(formula = salary ~ lprofits, data = ceosalary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1053.6  -304.3   -85.4   229.1  4380.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -17.84     149.10  -0.120    0.905    
## lprofits      196.02      31.76   6.172 4.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 534.8 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.1866, Adjusted R-squared:  0.1817 
## F-statistic: 38.09 on 1 and 166 DF,  p-value: 4.995e-09
# precise 
coef(m3)[2] * log(1.01)
## lprofits 
## 1.950417
# approximation
coef(m3)[2] / 100
## lprofits 
## 1.960153

Log-Log Model

\[ ln(y_i) = \beta_0 + \beta_1 ln(x_{1i}) + u_i\] Precise: p percent change in \(x_1\) is associated with a \(100(e^{\beta_1 ln(\frac{100+p}{100})} - 1)\) percent change in \(E(y)\).

Approx: 1 percent change in \(x_1\) is associated with a \(\beta_1\) percent change in \(E(y)\).

m4 <- lm(lsalary ~ lprofits, ceosalary)
summary(m4)
## 
## Call:
## lm(formula = lsalary ~ lprofits, data = ceosalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.30188 -0.31267  0.00524  0.36064  1.93497 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.57549    0.15085  36.961  < 2e-16 ***
## lprofits     0.22281    0.03213   6.934 8.66e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.541 on 166 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2246, Adjusted R-squared:  0.2199 
## F-statistic: 48.08 on 1 and 166 DF,  p-value: 8.657e-11
# precise 
100*(exp(coef(m4)[2]*log(101/100)) - 1)
##  lprofits 
## 0.2219456
# approx 
coef(m4)[2]
##  lprofits 
## 0.2228063

Conditional Marginal Effects

When the effect is polynomial or there is an interaction with another independent variable, the marginal effect is conditional.

To find the conditional marginal effect, we take the first partial derivative with respect to \(x_1\).

Quadratics

Let’s try a simple example with a quadratic using the trees data.

\[volume_i = \beta_0 + \beta_1height_i + \beta_2(height_i)^2 + u_i\] Conditional Marginal Effect of Height on Volume: \[ \frac{\partial volume_i}{\partial height_i} = \beta_1 + 2\beta_2height_i \]

# load data
trees <- datasets::trees

# estimate quadratic model
m <- lm(Volume ~ Height + I(Height^2), data=trees)

# effect = B1 + 2B2(height) 

# range of height
height <- seq(min(trees$Height), max(trees$Height), 0.1)

# calculate conditional marginal effect
cme_height <- coef(m)["Height"] +  2 * height * coef(m)["I(Height^2)"]

# plot cme
plot(height, cme_height, 
     main = "Conditional Marginal Effect of Height Across Values of Height",
     ylab="CME of Height", xlab="Height", type="l",
     xlim=c(60,90), ylim=c(-2, 4))
abline(h=0, lty=3)

Predicted Volume by Height:

\[volume_i = \beta_0 + \beta_1height_i + \beta_2(height_i)^2 + u_i\]

height <- data.frame(Height = height)
volume <- predict(m, newdata = height)

plot(height$Height, volume, 
     main = "Predicted Volume by Height",
     ylab="Volume", xlab="Height", type="l",
     xlim=c(60,90), ylim=c(10, 60))

Interactions

\[ volume_i = \beta_0 + \beta_1 girth_i + \beta_2 height_i + \beta_3 girth_i * height_i \]

Predicted Volume by Height at Different Girths:

  • This is the same code you’ve been practicing since PS1.
  • Will the lines be parallel?
# estimate interaction model
mi <- lm(Volume ~ Height*Girth, data=trees)

# range of height values 
height <- seq(min(trees$Height), max(trees$Height), 0.1)

# girth values 
girth <- c(10, 15, 20)

new_data <- expand.grid(Height = height, Girth = girth)

#predict volume based on height and girth in new dataset 
predicted_volume <- predict(mi, newdata = new_data)

#combine new dataset with predicted volume 
predicted_df <- cbind(new_data, Volume = predicted_volume)

# install.packages("ggplot2")
library(ggplot2)

# ggplot 
ggplot(predicted_df, aes(x = Height, y = Volume, color = factor(Girth))) +
  geom_line() +
  scale_color_manual(values = c("blue", "red", "green"),
                     labels = c(paste("Girth = 10"), 
                                paste("Girth = 15"), 
                                paste("Girth = 20")),
                     name = "Tree Girth") +
  labs(x = "Tree Height", y = "Predicted Tree Volume",
       title = "Predicted Volume vs. Height at Different Girth") +
  theme_minimal()