POLSCI 630: BASIC REGRESSION

Time: T/R 10:05-11:20
Location: Old Chemistry 201
Instructor: Chris Johnston
Email: cdj19@duke.edu
Office: Gross Hall 294J
Office hours: by appt (send me an email to schedule)

Lab: F 1:40-2:55
Location: Physics 235
TA: Stephanie Wright
TA email: saw119@duke.edu
Office: Cubicle 36
Office hours: Tuesday 12:00-1:00

SUMMARY

This course covers basic techniques in quantitative political analysis with a focus on linear regression. It introduces students to widely used procedures for regression analysis, and provides intuitive, applied, and formal foundations for linear regression as well as some extensions.

This course assumes the basic background knowledge of calculus, probability theory, linear algebra, and statistics that is covered in our Fall semester course POLSCI 609: Fundamentals of Research.

For statistical software, we will use R, and assume a working knowledge at the level of POLSCI 609: Fundamentals of Research. If you wish to learn alternative software, such as Stata, Python, or Julia, you may of course do so, but only in parallel: you must complete all assignments in R. We may be able to provide some guidance with other software, but we make no guarantees.

This course strives to achieve four goals.

Students will become literate in regression analysis, one of the most widely used modeling approaches in the social sciences.
Students will establish a foundation in statistical theory and applied econometrics that will help them move forward with more advanced methods training.
Students will develop experience working with data on topics related to political science, in the context of in-class examples, lab practicums, take-home problem sets, and exams.
Students will develop their programming skills in widely used statistical software.

GRADING

Final grades are determined as follows:

98-100 A+
93-97 A
90-92 A-

88-89 B+
83-87 B
80-82 B-

78-79 C+
73-77 C
70-72 C-

68-69 D+
63-67 D
60-62 D-

<60 F

Attendance (10%)

The first part of learning is showing up.

You are expected to attend all lectures.
Lab is class! We will take lab attendance.

Quizzes (25%)

At the beginning of class every Thursday you will take a 5-minute quiz on that week’s required reading (and possibly material from Tuesday’s lecture, which will be on the same topics).

Quizzes will begin promptly at 10:05.
Questions will be multiple choice and shown up front on the screen.
You will respond using pre-printed Gradescope bubble sheets.
Quizzes will be short and straightforward if you have done the reading and attended lecture.
Your lowest two quiz scores will be dropped when calculating the final quiz grade for the semester.

Before your first quiz, do the following:

Print out (at least) 14 copies of the first page of Gradescope’s bubble sheet template, and store them in a bag, notebook, etc. that you know you will bring to class. You will use 12 for quizzes and 2 for exams.
Make sure you have a pen or pencil in a bag, notebook, etc. that you know you will bring to class.
Make a plan to be in class on time, every time! There are no make-up quizzes.

Problem sets (25%)

On Friday in lab, students will receive a problem set to be completed over the course of the upcoming week.

The problem sets will ask students to demonstrate mastery of statistical theory, analyzing data, and reading and interpreting results.
Problem sets should be submitted electronically to the relevant assignment page on Canvas before the following week’s lab (i.e., Friday morning by 10:00). Late problem sets lose 10 points (out of 100) for each day they are late.
We ask that you submit two files: 1) the Rmd (markdown) file with all of your code and answers, and 2) an html file rendered from your Rmd file. The html file should include only what you wish to present. The Rmd file should contain all of your work. Generally, you will not present your code in your html file (unless we ask you to), but rather formatted results (e.g., tables and figures) and text discussing those results and answers to the relevant questions.
Your lowest two problem set scores from the semester will be dropped when calculating the final grade.
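
One common way to keep code out of the rendered html file while leaving it in the Rmd is to set chunk options in a setup chunk near the top of the file. The lines below are a minimal sketch, assuming your Rmd is rendered with knitr (the usual setup in RStudio); they are a suggestion, not a required template.

```r
# Put this inside a setup chunk at the top of the Rmd file.
# echo = FALSE hides code in the rendered html; tables and figures still appear.
# message = FALSE and warning = FALSE keep package chatter out of the html.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```

If a question asks you to show your code, you can override the default for that one chunk by adding echo = TRUE to its chunk header.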

Exams (20% each)

There will be two in-person exams.

We will use one class period for the midterm and the scheduled final exam period for the final.
Exams will focus primarily on theory and interpretation, rather than implementation in R.
Remember to bring a Gradescope bubble sheet and pen or pencil to exams.
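
To see how the weights above combine into a course total, here is a minimal sketch in R. The syllabus does not say how scores within a category are averaged or how many problem sets there will be, so equal weighting within each category, a 0-100 scale for every component, and all of the example scores are assumptions. The function names are hypothetical.

```r
# Hypothetical helper: mean of scores after dropping the lowest n_drop.
# Assumes every score is on a 0-100 scale and counts equally.
mean_after_drop <- function(scores, n_drop = 2) {
  stopifnot(length(scores) > n_drop)
  kept <- tail(sort(scores), length(scores) - n_drop)
  mean(kept)
}

# Hypothetical helper: weighted course total using the weights listed above
# (attendance 10%, quizzes 25%, problem sets 25%, each exam 20%).
course_total <- function(attendance, quizzes, psets, midterm, final) {
  0.10 * attendance +
    0.25 * mean_after_drop(quizzes) +
    0.25 * mean_after_drop(psets) +
    0.20 * midterm +
    0.20 * final
}

# Made-up scores, for illustration only.
quizzes <- c(100, 80, 60, 90, 100, 70, 90, 80, 100, 90, 40, 80)
psets   <- c(95, 85, 0, 90, 88, 92, 75, 90, 85, 80)

total <- course_total(attendance = 100, quizzes = quizzes, psets = psets,
                      midterm = 84, final = 88)
print(total)
```

The total can then be compared against the letter-grade ranges listed above.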

TEXTS

Please purchase the following textbook (if you want to use an earlier edition, make sure the chapters match up to the 7th edition):

Wooldridge, J. 2019. Introductory Econometrics: A Modern Approach, 7th Edition, Boston: Cengage.

We will also read 3 chapters from the following text, which are posted on Canvas under “Files”:

James, Gareth, et al. 2013. An Introduction to Statistical Learning with Applications in R. New York: Springer.

Several additional articles appear in their respective weeks in the schedule below, and are available as links, through Canvas, or through Duke Library.

SCHEDULE

Week 1 (1/9): Introduction

Before the semester starts, it would be a good idea to review Wooldridge Appendices A-D, which cover material from your Fall semester course.

Week 2 (1/14, 1/16): The linear regression model 1

Wooldridge, Chapter 2

Week 3 (1/21, 1/23): The linear regression model 2

Wooldridge, Chapter 2

Week 4 (1/28, 1/30): Assumptions and properties

Wooldridge, Chapter 3

Week 5 (2/4, 2/6): Inference

Wooldridge, Chapter 4

Week 6 (2/11, 2/13): Asymptotics

Wooldridge, Chapter 5

Week 7 (2/18, 2/20): Functional form and interactions

Wooldridge, Chapter 6

Week 8 (2/25, 2/27): Advanced strategies for inference

Gelman & Hill, pp. 142-143
James et al., Section 5.2 (“The Bootstrap”)
Robinson, Chapter 7 (“Delta Method”)

Week 9 (3/4, 3/6): EXAM 1

Tuesday:
Thursday: MIDTERM EXAM TOPIC LIST

Week 10 (3/11, 3/13): SPRING BREAK

Week 11 (3/18, 3/20): Qualitative information

Wooldridge, Chapter 7 [sections 7.1-7.4 only]

Week 12 (3/25, 3/27): Heteroskedasticity

Wooldridge, Chapter 8

Week 13 (4/1, 4/3): Endogeneity and Instrumental Variables

Wooldridge, Chapters 9 & 15

Week 14 (4/8, 4/10): Bias-variance trade-off and generalizability

James et al., Chapters 2, 5, 6

Weeks 15-16 (4/15, 4/17, 4/22): Panel data methods

Wooldridge, Chapters 13 & 14

POLICIES

Excused absences

You are expected to attend class, including lab, prepared to engage with the material for that class. Excused absences may be requested in writing with reasonable advance notice (more urgent reasons require less advance notice). Duke policies outline personal emergencies, illnesses, varsity athletic competition, and religious observances as acceptable reasons for an excused absence, but we are willing to consider other reasons that do not neatly fall into one of these categories if given sufficient advance notice.

Deadlines and late work

This course moves on a regular, fast-paced schedule. Late work will be penalized (as outlined above) and extensions on assignments will be strongly discouraged in the absence of a compelling reason. This is to incentivize you to stay on schedule, as it will be difficult to catch up if you fall behind.

Honesty and professionalism

The Duke community standard is in effect throughout the semester. By taking this course, you affirm that it is a violation of the code to cheat on assignments, to plagiarize, to deviate from the teacher’s instructions about collaboration on work that is submitted for grades, to give false information to a faculty member, and to undertake any other form of academic misconduct. You also affirm that if you witness others violating the code you have a duty to report them.

Beyond the Duke community standard, we expect you to adhere to and maintain norms of professionalism throughout the course. This includes providing us with reasonable advance notice if you need to miss class or move a deadline (better reasons require less advance notice), maintaining a collegial learning environment with your colleagues, both inside and outside of the classroom, and taking pride in your work.

Collaboration and study groups

Given the nature of this course, some amount of student collaboration is expected and permitted. You are encouraged to form study groups, BUT you must complete problem sets on your own. That is, students may study and practice together, share helpful tips, or compare outputs with one another, but with the following stipulations:

The sharing of ideas must not be one-directional, where one student is doing the work and the other is free riding.
The actual write-up of the work that is handed in must be the work of each individual, with absolutely no copying and pasting from one student to another. The difference between students learning from each other and one student representing another’s work as their own is usually quite obvious.

The best way to ensure you stay within these guidelines is to always complete the actual coding and writing of assignments by yourself. That is, you might study a topic as a group, but wait until you are alone to actually write code and interpretations. Group study should abstract away from the actual homework assignment problems.

For example, if you are trying to figure out how to calculate OLS estimates and standard errors by hand, we encourage you to study this in a group. But work together on different problems than the ones in your problem set, and do not just copy the practice code from your classmates - write it yourself, even when working on study problems.

This is also an excellent way to test whether you really understand things! It always feels like you understand while you are talking with others - the real test is when you have to do it by yourself.
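
As one illustration of the kind of practice problem a study group might work through (it is not taken from any problem set), the sketch below computes OLS estimates and classical standard errors by hand and checks them against lm(). It uses R's built-in mtcars data, and the choice of variables is arbitrary.

```r
# Practice data: R's built-in mtcars (not a course dataset).
y <- mtcars$mpg
X <- cbind(intercept = 1, wt = mtcars$wt, hp = mtcars$hp)

n <- nrow(X)  # number of observations
k <- ncol(X)  # number of coefficients, including the intercept

# OLS estimates: solve the normal equations (X'X) b = X'y.
XtX <- crossprod(X)
beta_hat <- solve(XtX, crossprod(X, y))

# Residuals and the unbiased estimate of the error variance.
e <- y - X %*% beta_hat
sigma2_hat <- sum(e^2) / (n - k)

# Classical standard errors, which assume homoskedastic errors.
se <- sqrt(diag(sigma2_hat * solve(XtX)))

# Check the by-hand results against lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(cbind(by_hand = drop(beta_hat), lm = coef(fit)))
print(cbind(by_hand = se, lm = sqrt(diag(vcov(fit)))))
```

If the two columns in each table agree, the by-hand calculation matches what lm() reports.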

Large language models (LLMs)

For programming/coding

LLMs, such as ChatGPT, are an incredible resource for data science. I (CDJ) use them regularly for various tasks, including coding. On one hand, I don’t want to keep you from using such a valuable resource. On the other hand, it can be dangerous for your long-term education to rely on them too heavily when you are first learning the basics.

I want to emphasize: this course is an introduction to econometrics. You need to internalize not only the theoretical concepts, but also the core concepts associated with programming. If you use LLMs as a crutch this semester, you are going to have a lot of trouble when you get to more advanced material. More generally, you will be in a precarious position, because you will not understand the code you are writing. This will also make you a much less valuable collaborator.

So what does this mean in practice? Simply put:

I will not be checking your code to try to figure out if you are using LLMs. If it is so obvious that I can’t ignore it, I will probably send you a note encouraging you to stop, but I won’t be checking in any systematic way, nor will I penalize you for it.
I strongly recommend you minimize your use of LLMs when coding.

What about when you get stuck and can’t solve the coding problem based on your existing knowledge? I recommend the following approach (in order):

Make sure you really tried to solve it using your existing knowledge - you will often learn a lot by just trying different ideas that you come up with.
Try to use R help files for the relevant functions or packages.
Google your problem and read posts about it on venues like Stack Overflow.

And, if you still haven’t solved it,

Ask an LLM (But don’t just copy the code! Read it, understand it, and then try to write it yourself).

What’s the difference between Stack Overflow and LLMs? In my experience, the temptation to copy and paste code without actually understanding it is much stronger with LLMs (and often it is just easier). We are developing many basic programming tools in this course, and you want these to be second nature, not something you have to look up every single time you need them.

For writing

LLMs are simply off-limits for writing in this course. Forcing yourself to write out interpretations of results on your own helps to reinforce your understanding. More practically, you need practice in how to write results sections in social scientific papers.

If it is obvious that you are using an LLM for writing, I will get in touch with you to let you know. If it continues, I will treat it as a case of academic dishonesty. There may be other venues where using LLMs for writing is OK, but the policy of this class is that they are off-limits.

N.B. LLMs have a very particular style of writing that is often easy to notice, especially when multiple people are using them for the same questions! You should also be aware that they may plagiarize material on the web in generating responses.