Introduction
POLSCI 630: Probability and Basic Regression
January 9, 2025
Welcome!
- Time: T/R 10:05-11:20
- Location: Old Chemistry 201
- Instructor: Chris Johnston
- Email: cdj19@duke.edu
- Office: Gross Hall 294J
- Office hours: by appointment (email me to schedule)
- Lab: F 1:40-2:55
- Lab location: Physics 235
- TA: Stephanie Wright
- TA email: saw119@duke.edu
- TA office: Cubicle 36
- TA office hours: Tuesday 12:00-1:00
Goals
This course strives to achieve four goals:
Become literate in regression analysis
Establish foundational knowledge that will help you move forward with methods training
Gain experience working with political science data and problems
Develop programming skills in widely used software
Topics
- Linear regression model and OLS estimation
- Assumptions and properties of OLS
- Making inferences about parameters
- Dealing with assumption violations
- The bias-variance tradeoff
- Time series and panel data
- Categorical data
Prerequisites
We will assume working knowledge at the level of POLSCI 609 (Siegel’s fall semester intro course):
- Basic calculus (derivatives and integrals)
- Basic linear algebra (working with vectors and matrices)
- Basic probability theory (random variables, distributions and their moments)
- Basic inferential statistics (standard errors, confidence intervals)
- Basic programming in R for the above
Notation
I think many people struggle because they get lost in notation
- Once notation is second nature, you can read and digest new material quickly
- I will try to be clear with notation (let me know when things are not!)
- It is also your responsibility to spend time with notation to understand it
\[
\hat{\boldsymbol{\beta}}=(\textbf{X}'\textbf{X})^{-1}\textbf{X}'\boldsymbol{y}
\]
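One way to get comfortable with an expression like this is to translate it into code symbol by symbol. The following is a minimal sketch in R, assuming simulated data; the variable names (x1, x2, beta_hat) and the coefficient values are illustrative and not part of the course materials.

```r
# Simulated data; all names and values here are illustrative assumptions
set.seed(630)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix X: a column of ones for the intercept, then the predictors
X <- cbind(1, x1, x2)

# beta-hat = (X'X)^{-1} X'y, written to mirror the notation term by term
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# The same estimates from R's built-in linear model function
fit <- lm(y ~ x1 + x2)

# Show the two sets of coefficients side by side
cbind(matrix_formula = as.vector(beta_hat), lm = unname(coef(fit)))
```

Here t(X) plays the role of the transpose X', %*% is matrix multiplication, and solve() returns the inverse. The two columns should agree up to floating-point error; lm() uses a more numerically stable method internally, so the explicit inverse is for reading the notation, not a recommendation for applied work.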
Navigating the course
Your responsibilities
- Attendance in lecture and lab
- Attentiveness in lecture and lab
- Complete reading prior to lecture
- Take problem sets seriously and complete them on time
- Get help early and often, as needed
Quizzes
Almost every Thursday you will take a ~5-minute quiz on that week’s required reading (and possibly material from Tuesday’s lecture, which will be on the same topics)
- Quizzes will begin promptly at 10:05, and you are expected to be on time
- Questions will be multiple choice and shown up front on the screen
- You will respond using pre-printed Gradescope bubble sheets
- Quizzes will be short and straightforward if you have done the reading and attended lecture
- The lowest two quiz scores will be dropped when calculating the final quiz grade
Before next Thursday
- Print out (at least) 14 copies of the first page of Gradescope’s bubble sheet template (you will use 12 for quizzes and 2 for exams)
- Store these in a bag, notebook, etc. that you know you will bring to class
- Make sure you have a pen or pencil in a bag, notebook, etc. that you know you will bring to class
- Make a plan to be in class on time, every time! There are no make-up quizzes
Submission
For each problem set, submit two files:
- an Rmd (R Markdown) file with all of your code and answers
- an html file rendered from your Rmd file, which includes only what you wish to present
Your lowest two problem set scores from the semester will be dropped when calculating the final grade.
Exams
There will be two in-person exams.
- We will use one class period for the midterm and the scheduled final exam period for the final.
- Exams will focus primarily on theory and interpretation, rather than implementation in R.
- Remember to bring a Gradescope bubble sheet and pen or pencil to exams.
Collaboration
Some amount of student collaboration is expected and permitted - you are encouraged to form study groups
But be careful with problem sets - do not work on these together directly
- Help should be given in a way that is abstracted away from the specific problems (work through other problems for practice)
- Sharing of ideas must not be one-directional
- Absolutely no copying and pasting from one student to another
LLMs
Tricky for a course like this:
LLMs are simply off-limits for writing (e.g., interpreting regression output)
You may use them for help with coding tasks, when you are stuck, but not to write your code for you
- Even this can be detrimental - we are mostly dealing with pretty basic stuff, which you want to internalize until you are fluent
My recommendation
What about when you get stuck? I recommend the following (in order):
- Make sure you really tried to solve it using your existing knowledge - you will often learn a lot by just trying different ideas that you come up with
- Try to use R help files (?) for the relevant functions or packages
- Google your problem and read posts about it on venues like Stack Overflow
And, if you still haven’t solved it,
- Ask an LLM (But don’t just copy the code! Read it, understand it, and then try to write it yourself)
How to do well
Do the readings on time, every week
- Don’t skip “the math”; keep at it until you understand
Pay attention in lecture
Study with others, but write your problem sets by yourself
- Don’t use ChatGPT (except perhaps in exceptional circumstances)
When you don’t understand something, get help early and often!!!