Introduction

POLSCI 630: Probability and Basic Regression

January 9, 2025

Welcome!


  • Lecture: T/R 10:05-11:20
  • Lecture location: Old Chemistry 201
  • Instructor: Chris Johnston
  • Email: cdj19@duke.edu
  • Office: Gross Hall 294J
  • Office hours: by appointment (email me to schedule)
  • Lab: F 1:40-2:55
  • Lab location: Physics 235
  • TA: Stephanie Wright
  • TA email: saw119@duke.edu
  • TA office: Cubicle 36
  • TA office hours: Tuesday 12:00-1:00

Goals

This course strives to achieve four goals:

  • Become literate in regression analysis

  • Establish foundational knowledge that will help you move forward with methods training

  • Gain experience working with political science data and problems

  • Develop programming skills in widely used software

Topics

  1. Linear regression model and OLS estimation
  2. Assumptions and properties of OLS
  3. Making inferences about parameters
  4. Dealing with assumption violations
  5. The bias-variance tradeoff
  6. Time series and panel data
  7. Categorical data

Prerequisites

We will assume working knowledge at the level of POLSCI 609 (Siegel’s fall-semester intro course):

  1. Basic calculus (derivatives and integrals)
  2. Basic linear algebra (working with vectors and matrices)
  3. Basic probability theory (random variables, distributions and their moments)
  4. Basic inferential statistics (standard errors, confidence intervals)
  5. Basic programming in R for the above

Notation

I think many people struggle because they get lost in notation

  • Once notation is second nature, you can read and digest new material quickly
  • I will try to be clear with notation (let me know when things are not!)
  • It is also your responsibility to spend time with notation to understand it

\[ \hat{\boldsymbol{\beta}}=(\textbf{X}'\textbf{X})^{-1}\textbf{X}'\boldsymbol{y} \]
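
As a sketch of how to read this expression, assume the standard linear model setup. The symbols n, k, \(\boldsymbol{u}\), and \(\boldsymbol{b}\) are introduced here only for illustration and may differ from the notation used later in the course.

\[ \boldsymbol{y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{u} \]

  • \(\boldsymbol{y}\) is an n × 1 vector of outcomes (one entry per observation)
  • \(\textbf{X}\) is an n × (k+1) matrix of predictors, with a column of ones for the intercept
  • \(\boldsymbol{\beta}\) is a (k+1) × 1 vector of coefficients, and \(\boldsymbol{u}\) is an n × 1 vector of errors

Under these assumptions, the estimator is the value of \(\boldsymbol{b}\) that minimizes the sum of squared residuals:

\[ \hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{b}}\;(\boldsymbol{y}-\textbf{X}\boldsymbol{b})'(\boldsymbol{y}-\textbf{X}\boldsymbol{b}) \]

Setting the derivative with respect to \(\boldsymbol{b}\) to zero recovers the expression above, provided \(\textbf{X}'\textbf{X}\) is invertible:

\[ -2\textbf{X}'(\boldsymbol{y}-\textbf{X}\boldsymbol{b})=\boldsymbol{0} \;\Rightarrow\; \textbf{X}'\textbf{X}\boldsymbol{b}=\textbf{X}'\boldsymbol{y} \;\Rightarrow\; \hat{\boldsymbol{\beta}}=(\textbf{X}'\textbf{X})^{-1}\textbf{X}'\boldsymbol{y} \]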

Your responsibilities


  • Attendance in lecture and lab
  • Attentiveness in lecture and lab
  • Complete reading prior to lecture
  • Take problem sets seriously and complete them on time
  • Get help early and often, as needed

Quizzes

Almost every Thursday you will take a ~5-minute quiz on that week’s required reading (and possibly material from Tuesday’s lecture, which will be on the same topics)

  • Quizzes will begin promptly at 10:05, and you are expected to be on time
  • Questions will be multiple choice and shown up front on the screen
  • You will respond using pre-printed Gradescope bubble sheets
  • Quizzes will be short and straightforward if you have done the reading and attended lecture
  • The lowest two quiz scores will be dropped when calculating the final quiz grade

Before next Thursday


  • Print out (at least) 14 copies of the first page of Gradescope’s bubble sheet template (you will use 12 for quizzes and 2 for exams)
  • Store these in a bag, notebook, etc. that you know you will bring to class
  • Make sure you have a pen or pencil in a bag, notebook, etc. that you know you will bring to class
  • Make a plan to be in class on time, every time! There are no make-up quizzes

Problem sets

Submission

  • Submit to the relevant assignment page on Canvas before the following week’s lab (i.e., Friday morning at 10:00)

    • Grades will be reduced by 10 points (out of 100) for each day a submission is late.
  • Submit two files:

    • an Rmd (R Markdown) file with all of your code and answers
    • an HTML file rendered from your Rmd file, which includes only what you wish to present
  • Your lowest two problem set scores from the semester will be dropped when calculating the final grade.

Exams

There will be two in-person exams.

  • We will use one class period for the midterm and the scheduled final exam period for the final.
  • Exams will focus primarily on theory and interpretation, rather than implementation in R.
  • Remember to bring a Gradescope bubble sheet and pen or pencil to exams.

Collaboration

Some amount of student collaboration is expected and permitted - you are encouraged to form study groups

But be careful with problem sets - do not work on them together directly

  • Help should be given in a way that is abstracted away from the specific problems (work through other problems for practice)
  • Sharing of ideas must not be one-directional
  • Absolutely no copying and pasting from one student to another

LLMs

Tricky for a course like this:

  • LLMs are simply off-limits for writing (e.g., interpreting regression output)

  • You may use them for help with coding tasks, when you are stuck, but not to write your code for you

    • Even this can be detrimental - we are mostly dealing with pretty basic stuff, which you want to internalize until you are fluent

My recommendation

What about when you get stuck? I recommend the following (in order):

  1. Make sure you really tried to solve it using your existing knowledge - you will often learn a lot by just trying different ideas that you come up with
  2. Consult the R help files (via ?) for the relevant functions or packages
  3. Google your problem and read posts about it on venues like Stack Overflow

And, if you still haven’t solved it,

  4. Ask an LLM (but don’t just copy the code! Read it, understand it, and then try to write it yourself)

How to do well


  • Do the readings on time, every week

    • Don’t skip “the math”; keep at it until you understand
  • Pay attention in lecture

  • Study with others, but write your problem sets by yourself

    • Don’t use ChatGPT (except perhaps in exceptional circumstances)
  • When you don’t understand something, get help early and often!!!