In this lab, we will return to the CEO salaries data set.
salary
: 1990 compensation ($1000s)lsalary
: 1990 compensation ($1000s) loggedprofits
: 1990 profits ($ millions)lprofits
: 1990 profits ($ millions) loggedceosalary <- read.csv("data/ceosalary.csv")
ceosalary$lsalary <- log(ceosalary$salary)
ceosalary$lprofits <- log(ceosalary$profits)
## Warning in log(ceosalary$profits): NaNs produced
\[ ln(y_i) = \beta_0 + \beta_1 x_{1i} + u_i\] Precise: 1 unit change in \(x_1\) is associated with a \(100(e^{\beta_1\Delta x_1} - 1)\) percent change in \(E(y)\).
Approx: 1 unit change in \(x_1\) is associated with a \(100\beta_1\) percent change in \(E(y)\).
m2 <- lm(lsalary ~ profits, ceosalary)
summary(m2)
##
## Call:
## lm(formula = lsalary ~ profits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.08834 -0.36626 0.02351 0.39714 2.04523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.4593064 0.0471763 136.918 < 2e-16 ***
## profits 0.0005944 0.0001040 5.717 4.6e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5579 on 175 degrees of freedom
## Multiple R-squared: 0.1574, Adjusted R-squared: 0.1526
## F-statistic: 32.68 on 1 and 175 DF, p-value: 4.597e-08
# precise
100 * (exp(coef(m2)[2]) - 1)
## profits
## 0.05946088
# approx
100 * coef(m2)[2]
## profits
## 0.05944321
\[ y_i = \beta_0 + \beta_1 ln(x_{1i}) + u_i\] Precise: p percent change in \(x_1\) is associated with a \(\beta_1 ln(\frac{100 + p}{100})\) unit change in \(E(y)\).
Approx: 1 percent change in \(x_1\) is associated with a \(\frac{\beta_1}{100}\) unit change in \(E(y)\).
m3 <- lm(salary ~ lprofits, ceosalary)
summary(m3)
##
## Call:
## lm(formula = salary ~ lprofits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1053.6 -304.3 -85.4 229.1 4380.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.84 149.10 -0.120 0.905
## lprofits 196.02 31.76 6.172 4.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 534.8 on 166 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.1866, Adjusted R-squared: 0.1817
## F-statistic: 38.09 on 1 and 166 DF, p-value: 4.995e-09
# precise
coef(m3)[2] * log(1.01)
## lprofits
## 1.950417
# approximation
coef(m3)[2] / 100
## lprofits
## 1.960153
\[ ln(y_i) = \beta_0 + \beta_1 ln(x_{1i}) + u_i\] Precise: p percent change in \(x_1\) is associated with a \(100(e^{\beta_1 ln(\frac{100+p}{100})} - 1)\) percent change in \(E(y)\).
Approx: 1 percent change in \(x_1\) is associated with a \(\beta_1\) percent change in \(E(y)\).
m4 <- lm(lsalary ~ lprofits, ceosalary)
summary(m4)
##
## Call:
## lm(formula = lsalary ~ lprofits, data = ceosalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.30188 -0.31267 0.00524 0.36064 1.93497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.57549 0.15085 36.961 < 2e-16 ***
## lprofits 0.22281 0.03213 6.934 8.66e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.541 on 166 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2246, Adjusted R-squared: 0.2199
## F-statistic: 48.08 on 1 and 166 DF, p-value: 8.657e-11
# precise
100*(exp(coef(m4)[2]*log(101/100)) - 1)
## lprofits
## 0.2219456
# approx
coef(m4)[2]
## lprofits
## 0.2228063
When the effect is polynomial or there is an interaction with another independent variable, the marginal effect is conditional.
To find the conditional marginal effect, we take the first partial derivative with respect to \(x_1\).
Let’s try a simple example with a quadratic using the
trees
data.
\[volume_i = \beta_0 + \beta_1height_i + \beta_2(height_i)^2 + u_i\] Conditional Marginal Effect of Height on Volume: \[ \frac{\partial volume_i}{\partial height_i} = \beta_1 + 2\beta_2height_i \]
# load data
trees <- datasets::trees
# estimate quadratic model
m <- lm(Volume ~ Height + I(Height^2), data=trees)
# effect = B1 + 2B2(height)
# range of height
height <- seq(min(trees$Height), max(trees$Height), 0.1)
# calculate conditional marginal effect
cme_height <- coef(m)["Height"] + 2 * height * coef(m)["I(Height^2)"]
# plot cme
plot(height, cme_height,
main = "Conditional Marginal Effect of Height Across Values of Height",
ylab="CME of Height", xlab="Height", type="l",
xlim=c(60,90), ylim=c(-2, 4))
abline(h=0, lty=3)
Predicted Volume by Height:
\[volume_i = \beta_0 + \beta_1height_i + \beta_2(height_i)^2 + u_i\]
height <- data.frame(Height = height)
volume <- predict(m, newdata = height)
plot(height$Height, volume,
main = "Predicted Volume by Height",
ylab="Volume", xlab="Height", type="l",
xlim=c(60,90), ylim=c(10, 60))
\[ volume_i = \beta_0 + \beta_1 girth_i + \beta_2 height_i + \beta_3 girth_i * height_i \]
Predicted Volume by Height at Different Girths:
# estimate interaction model
mi <- lm(Volume ~ Height*Girth, data=trees)
# range of height values
height <- seq(min(trees$Height), max(trees$Height), 0.1)
# girth values
girth <- c(10, 15, 20)
new_data <- expand.grid(Height = height, Girth = girth)
#predict volume based on height and girth in new dataset
predicted_volume <- predict(mi, newdata = new_data)
#combine new dataset with predicted volume
predicted_df <- cbind(new_data, Volume = predicted_volume)
# install.packages("ggplot2")
library(ggplot2)
# ggplot
ggplot(predicted_df, aes(x = Height, y = Volume, color = factor(Girth))) +
geom_line() +
scale_color_manual(values = c("blue", "red", "green"),
labels = c(paste("Girth = 10"),
paste("Girth = 15"),
paste("Girth = 20")),
name = "Tree Girth") +
labs(x = "Tree Height", y = "Predicted Tree Volume",
title = "Predicted Volume vs. Height at Different Girth") +
theme_minimal()