Formatting

  • Use hash signs (#) to create headings (# Problem 1) and subheadings (## Part a)
  • echo = F: the output (e.g. a plot) will appear in the knitted PDF, but the code will not
  • echo = T: both the output (e.g. a plot) and the code will appear in the knitted PDF
  • include = F: neither the code nor the output will appear in the knitted PDF, but the code will still be evaluated
  • fig.width = 5, fig.height = 3: plots will appear in the knitted PDF 5 inches wide and 3 inches tall
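
The echo, include, and fig.* items are knitr chunk options, normally written in a chunk header such as {r, echo = FALSE}. As a minimal sketch, assuming the knitr package is installed, the same options can also be set once as defaults from a setup chunk near the top of the document:

```r
# Set default chunk options for every later chunk (assumes knitr is installed).
knitr::opts_chunk$set(
  echo       = FALSE,  # hide the code, keep the output
  fig.width  = 5,      # figure width in inches
  fig.height = 3       # figure height in inches
)
```

Options given in an individual chunk header override these defaults for that chunk.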

Data

In this lab, we will work with 1990 data on CEO salaries from the dataset ceosalary.csv. You may remember this dataset from yesterday’s lecture (slides 40-43) or from last semester (PS 12R and Exam 3R).

The relevant variables are:

  • salary: 1990 compensation ($1000s)

  • age: Age (years)

  • profits: 1990 profits ($ millions)

Bivariate Regression

\[salary_i = \beta_0 + \beta_1 profits_{1i} + u_i\] Find OLS estimates of \(\beta_0\) and \(\beta_1\) in the above model.

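# load the data (assumes ceosalary.csv is in the working directory)
ceosalary <- read.csv("ceosalary.csv")

# variables
y <- ceosalary$salary            # outcome
x <- ceosalary$profits           # regressor
X <- cbind(1, ceosalary$profits) # design matrix with an intercept column

# bivariate formula, written out as sums
b1 <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# same formula via cov() and var()
b1 <- cov(ceosalary$salary, ceosalary$profits) / var(ceosalary$profits)
b0 <- mean(ceosalary$salary) - mean(ceosalary$profits) * b1

# matrix formula: (X'X)^{-1} X'y
beta <- solve(t(X) %*% X) %*% t(X) %*% y

# lm function
coef(lm(ceosalary$salary ~ ceosalary$profits))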
##       (Intercept) ceosalary$profits 
##       729.0172855         0.6017118

What salary is predicted for a CEO whose firm made $0 in profits? What salary is predicted for a CEO whose firm made $10 million in profits? If firm profits increase by $1 million, what is the predicted increase in salary?
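
One way to work through these questions is to plug values into the fitted line, using the coefficients printed by coef(lm(...)) above. The sketch below keeps the units from the Data section: salary is in $1000s and profits are in $ millions, so $10 million in profits corresponds to profits = 10. The variable names are illustrative only.

```r
# Coefficients as reported in the lm output above
b0 <- 729.0172855   # intercept, in $1000s of salary
b1 <- 0.6017118     # slope: $1000s of salary per $1 million of profits

pred_0  <- b0 + b1 * 0     # predicted salary at $0 in profits
pred_10 <- b0 + b1 * 10    # predicted salary at $10 million in profits
slope_dollars <- b1 * 1000 # predicted change in dollars per extra $1 million

round(c(pred_0 = pred_0, pred_10 = pred_10, slope_dollars = slope_dollars), 2)
```

In dollar terms, this gives roughly $729,017 at zero profits, roughly $735,034 at $10 million in profits, and an increase of about $602 in predicted salary for each additional $1 million in profits.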

Scatter Plot

Create a scatter plot depicting the relationship between salary and profits (with profits on the x-axis).
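
A minimal base R sketch of such a plot follows. So that the block runs on its own, it simulates a hypothetical stand-in for ceosalary (this is NOT the real data); with the real data frame loaded, the simulation lines can be dropped. The axis labels come from the variable descriptions above; the title and colors are arbitrary choices.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(profits = rnorm(n, mean = 200, sd = 400))
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + rnorm(n, sd = 500)

# Scatter plot with profits on the x-axis and salary on the y-axis
plot(ceosalary$profits, ceosalary$salary,
     xlab = "1990 profits ($ millions)",
     ylab = "1990 compensation ($1000s)",
     main = "CEO salary and firm profits",
     pch = 19, col = "grey40")

# Overlay the fitted bivariate regression line
fit <- lm(salary ~ profits, data = ceosalary)
abline(fit, col = "red", lwd = 2)
```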

Multiple Regression

\[salary_i = \beta_0 + \beta_1 profits_{1i} + \beta_2 age_{2i} + u_i\] Find OLS estimates of \(\beta_0\), \(\beta_1\), and \(\beta_2\) in the above model.

# matrix 
X <- cbind(1, ceosalary$profits, ceosalary$age)
y <- ceosalary$salary
beta <- solve(t(X) %*% X) %*% t(X) %*% y

# function 
m1 <- lm(salary ~ profits + age, data=ceosalary)
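
The matrix formula and lm should return the same estimates, and it can be worth checking that they do. The sketch below does so on a hypothetical simulated stand-in for ceosalary (NOT the real data), so it runs without the CSV file. It also uses crossprod(), a base R alternative to writing t(X) %*% X explicitly.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(
  profits = rnorm(n, mean = 200, sd = 400),
  age     = round(runif(n, min = 35, max = 75))
)
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + 3 * ceosalary$age +
  rnorm(n, sd = 500)

# Matrix approach: solve the normal equations (X'X) beta = X'y
X <- cbind(1, ceosalary$profits, ceosalary$age)
y <- ceosalary$salary
beta <- solve(crossprod(X), crossprod(X, y))

# lm approach
m1 <- lm(salary ~ profits + age, data = ceosalary)

# TRUE if both approaches agree up to numerical tolerance
same <- isTRUE(all.equal(as.numeric(beta), unname(coef(m1)), tolerance = 1e-6))
same
```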

Coefficient Table

Report the coefficient estimates, standard errors, t-statistics, and two-tailed p-values in a professional-looking table.

Regression Coefficient Estimates

              Coefficient   Standard_Error   T_Statistic     P_Value
(Intercept)   541.7472943      279.8241748     1.9360275   0.0545735
profits         0.5921174        0.1048957     5.6448199   0.0000001
age             3.3642933        4.9523160     0.6793374   0.4978760
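
A table with these four columns can be built by hand from the OLS formulas. The following is a sketch under stated assumptions, not necessarily how the table above was produced: it uses a hypothetical simulated stand-in for ceosalary (NOT the real data), so its numbers will differ from those reported here. The column names mirror the table above.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(
  profits = rnorm(n, mean = 200, sd = 400),
  age     = round(runif(n, min = 35, max = 75))
)
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + 3 * ceosalary$age +
  rnorm(n, sd = 500)

m1 <- lm(salary ~ profits + age, data = ceosalary)

X      <- model.matrix(m1)                        # design matrix incl. intercept
y      <- ceosalary$salary
beta   <- solve(crossprod(X), crossprod(X, y))    # OLS estimates
resid  <- y - X %*% beta                          # residuals
dof    <- nrow(X) - ncol(X)                       # N - K - 1
sigma2 <- sum(resid^2) / dof                      # residual variance (MSE)
se     <- sqrt(diag(sigma2 * solve(crossprod(X))))  # standard errors
tstat  <- as.numeric(beta) / se                   # t-statistics
pval   <- 2 * pt(abs(tstat), df = dof, lower.tail = FALSE)  # two-tailed p-values

coef_table <- data.frame(
  Coefficient    = as.numeric(beta),
  Standard_Error = se,
  T_Statistic    = tstat,
  P_Value        = pval,
  row.names      = colnames(X)
)
print(round(coef_table, 4))
```

If knitr is available, knitr::kable(coef_table, digits = 4, caption = "Regression Coefficient Estimates") is one option for rendering the data frame as a formatted table in the knitted document.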

Plot Predictions

Create a figure that plots the predicted value of salary as a function of profits at three different values of age: the 5th percentile, the 50th percentile, and the 95th percentile. Use the predict function to generate the predictions.

Base R
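
A possible base R version is sketched below. It assumes a hypothetical simulated stand-in for ceosalary (NOT the real data) so that it runs on its own; the names ages, profit_grid, and preds are illustrative. predict is called once per age value on a grid of profits, and matplot draws one line per column.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(
  profits = rnorm(n, mean = 200, sd = 400),
  age     = round(runif(n, min = 35, max = 75))
)
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + 3 * ceosalary$age +
  rnorm(n, sd = 500)

m1 <- lm(salary ~ profits + age, data = ceosalary)

# Ages at the 5th, 50th, and 95th percentiles
ages <- quantile(ceosalary$age, probs = c(0.05, 0.50, 0.95))

# Grid of profit values spanning the observed range
profit_grid <- seq(min(ceosalary$profits), max(ceosalary$profits),
                   length.out = 100)

# One column of predicted salaries per age value
preds <- sapply(ages, function(a) {
  predict(m1, newdata = data.frame(profits = profit_grid, age = a))
})

line_cols <- c("blue", "black", "red")
matplot(profit_grid, preds, type = "l", lty = 1, lwd = 2, col = line_cols,
        xlab = "1990 profits ($ millions)",
        ylab = "Predicted 1990 compensation ($1000s)")
legend("topleft", lty = 1, lwd = 2, col = line_cols,
       legend = paste0("Age = ", round(ages, 1), " (", names(ages), ")"))
```

Because the model is additive in profits and age, the three lines are parallel: changing age shifts the intercept but not the slope on profits.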

ggplot
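
A ggplot2 version might look like the sketch below. It again uses a hypothetical simulated stand-in for ceosalary (NOT the real data), and the plotting step is guarded so that the block still runs if ggplot2 is not installed. The names grid, salary_hat, and age_label are illustrative.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(
  profits = rnorm(n, mean = 200, sd = 400),
  age     = round(runif(n, min = 35, max = 75))
)
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + 3 * ceosalary$age +
  rnorm(n, sd = 500)

m1 <- lm(salary ~ profits + age, data = ceosalary)

# All combinations of a profit grid and the three age percentiles
ages <- quantile(ceosalary$age, probs = c(0.05, 0.50, 0.95))
grid <- expand.grid(
  profits = seq(min(ceosalary$profits), max(ceosalary$profits),
                length.out = 100),
  age     = as.numeric(ages)
)
grid$salary_hat <- predict(m1, newdata = grid)  # predicted salary per row
grid$age_label  <- factor(grid$age)             # discrete label for the legend

# Draw one line per age value (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(x = profits, y = salary_hat, colour = age_label)) +
    geom_line() +
    labs(x = "1990 profits ($ millions)",
         y = "Predicted 1990 compensation ($1000s)",
         colour = "Age")
  print(p)
}
```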

Model Fit

Calculate R squared, mean squared error, and adjusted R squared by hand.

\[R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}\]

\[MSE = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - K - 1}\]

\[R_{adj}^2 = 1 - \frac{MSE}{Var(y)} = 1 - \frac{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - K - 1}}{\frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N - 1}}\]

Here N is the number of observations and K is the number of regressors, not counting the intercept.

## [1] 0.1707458
## [1] 293359.6
## [1] 0.1606942
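
The three formulas translate directly into R. The sketch below applies them to a hypothetical simulated stand-in for ceosalary (NOT the real data), so its results will not match the values printed above. Here K is taken to be the number of regressors excluding the intercept.

```r
# Hypothetical stand-in data (simulated); replace with the real ceosalary data frame.
set.seed(1)
n <- 100
ceosalary <- data.frame(
  profits = rnorm(n, mean = 200, sd = 400),
  age     = round(runif(n, min = 35, max = 75))
)
ceosalary$salary <- 700 + 0.6 * ceosalary$profits + 3 * ceosalary$age +
  rnorm(n, sd = 500)

m1 <- lm(salary ~ profits + age, data = ceosalary)

y     <- ceosalary$salary
y_hat <- fitted(m1)               # fitted values from the model
N     <- length(y)                # number of observations
K     <- length(coef(m1)) - 1     # regressors, excluding the intercept

SSR <- sum((y - y_hat)^2)         # sum of squared residuals
SST <- sum((y - mean(y))^2)       # total sum of squares

R2     <- 1 - SSR / SST
MSE    <- SSR / (N - K - 1)
R2_adj <- 1 - MSE / var(y)        # var() divides by N - 1

c(R2 = R2, MSE = MSE, R2_adj = R2_adj)
```

As a check, these should agree with summary(m1)$r.squared, summary(m1)$sigma^2, and summary(m1)$adj.r.squared respectively.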