week3

Day 1

  • Make sure to finish and commit your solution to last week's Citibike modeling problem. Include a new Rmd file that loads the 2015 weather and trip data, makes predictions using your model, and summarizes the model's performance.
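Two common ways to summarize how well predictions match the observed 2015 trip counts are the root mean squared error (RMSE) and R². The sketch below is in Python for illustration only (the assignment itself is in R), and the choice of these two metrics and the numbers used are assumptions, not part of the assignment.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Fraction of the variance in the observed values explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical daily trip counts and a model's predictions for the same days.
actual = [30000, 35000, 28000, 40000]
predicted = [31000, 34000, 29000, 38000]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

RMSE is in the same units as the outcome (trips per day), which makes it easy to interpret; R² is unitless and compares the model against always predicting the mean.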

Classification: Naive Bayes

  • Review the slides on classification
  • The (super) naive Bayes shell script from lecture
  • Complete this naive Bayes lab
  • Think about the complexity of naive Bayes:
    • Assume you're given D documents that contain words from a vocabulary of total size V, and that documents contain w words, on average. For instance, you might have D = 1,000 emails labeled spam or not spam, with a vocabulary of V = 100,000 possible words, where each email contains w = 100 words, on average.
    • What is the running time for estimating the parameters for the version of naive Bayes described in the slides?
    • What are the space requirements?
    • What is the cost of making a prediction on a new document once you've estimated the parameters?
    • State all of your answers in terms of D, V, and w.
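To reason about these costs it can help to look at concrete loops. The sketch below assumes a multinomial naive Bayes with add-alpha (Laplace) smoothing; the version in the slides may differ (for example, a Bernoulli model), so treat it as one possible variant. The function names and the toy spam/ham data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log class priors and smoothed log word probabilities per class.

    docs: list of token lists; labels: the class label of each document.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[c][word] = count of word in class c
    vocab = set()
    # One pass over every word of every training document.
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    log_lik = {}
    # One smoothed estimate for every (class, vocabulary word) pair.
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict(tokens, log_prior, log_lik):
    """Return the class with the highest log posterior (up to a shared constant)."""
    scores = {}
    for c in log_prior:
        # Add the log-likelihood of each word; words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
    return max(scores, key=scores.get)

docs = [["win", "money", "now"], ["meeting", "at", "noon"],
        ["win", "a", "prize"], ["lunch", "meeting"]]
labels = ["spam", "ham", "spam", "ham"]
log_prior, log_lik = train_naive_bayes(docs, labels)
print(predict(["win", "money"], log_prior, log_lik))  # → spam
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied together.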

Computational Complexity

Day 2

Fitting linear models

  • See here for a table of complexity for model fitting and here for the gory details behind solving the normal equations and gradient descent
  • See this animation of gradient descent
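For a single feature, the two approaches can be compared side by side. The sketch below assumes one predictor plus an intercept, so the normal equations reduce to the familiar slope = Sxy / Sxx formula (in general one solves (XᵀX)β = Xᵀy); the learning rate, step count, and data are arbitrary illustrative choices.

```python
def fit_closed_form(x, y):
    """Least-squares intercept and slope from the normal equations (one feature)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_gradient_descent(x, y, lr=0.05, steps=5000):
    """Minimize mean squared error by repeatedly stepping against its gradient."""
    n = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the MSE with respect to intercept and slope.
        grad_b0 = 2 * sum(residuals) / n
        grad_b1 = 2 * sum(r * xi for r, xi in zip(residuals, x)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.2, 8.8]
print(fit_closed_form(x, y))
print(fit_gradient_descent(x, y))
```

Both should land on (approximately) the same line. The closed-form solution is exact in one shot, while gradient descent trades many cheap passes over the data for never having to form or invert XᵀX; with too large a learning rate it diverges instead of converging.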

Classification: Evaluation, logistic regression
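Most classification metrics come from the four cells of a confusion matrix. The sketch below computes accuracy, precision, recall (true positive rate), and false positive rate after thresholding predicted probabilities at 0.5; the threshold, helper names, and data are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def metrics(actual, predicted, positive=1):
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else nan,  # of predicted positives, fraction correct
        "recall": tp / (tp + fn) if tp + fn else nan,     # of actual positives, fraction found
        "fpr": fp / (fp + tn) if fp + tn else nan,        # of actual negatives, fraction flagged
    }

# Hypothetical predicted probabilities, turned into hard labels at a 0.5 threshold.
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
actual = [1, 1, 1, 0, 0, 0]
predicted = [int(p >= 0.5) for p in probs]
print(metrics(actual, predicted))
```

Moving the threshold trades recall against false positive rate; sweeping it over all values traces out an ROC curve.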

Day 3

  • See code from class for plotting logit models (preview output here)
  • Review the slides on causality and randomized experiments
  • See the references in this post for more on the reproducibility crisis
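A one-feature logit model predicts P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1·x))), which is the S-shaped curve such plots show. The sketch below fits it by gradient ascent on the log-likelihood, a simple stand-in for the iteratively reweighted least squares that R's glm uses; the learning rate, step count, and data are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, lr=0.1, steps=20000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Gradient of the log-likelihood is sum of (observed - predicted) times each input.
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Hypothetical, overlapping (non-separable) data so the maximum-likelihood fit is finite.
x = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(x, y)

# Predicted probabilities on a grid: the points you would plot as the fitted curve.
grid = [i / 2 for i in range(-6, 7)]
curve = [(g, sigmoid(b0 + b1 * g)) for g in grid]
```

At the maximum-likelihood fit with an intercept, the predicted probabilities sum to the number of observed 1s, which is a handy sanity check on convergence.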

Causality and Experiments

  • Review chapter 12 of Intro to Statistical Thinking (IST) and do question 12.1
  • Review chapter 13 of IST and do question 13.1. Answer the following two additional questions for this problem:
    • Make a plot of the distribution of outcomes (change) split by the treatment (active), similar to this plot
    • Estimate the effect size by calculating Cohen's d. Think about whether the effect seems practically meaningful.
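Cohen's d is the difference in group means divided by the pooled standard deviation. In the question above, the two groups would be the values of change for the treated and untreated levels of active; the numbers below are hypothetical placeholders, not the IST data.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical outcomes for a treatment group and a control group.
treated = [5, 7, 9]
control = [1, 3, 5]
print(cohens_d(treated, control))
```

Cohen's conventional rough benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large), but whether an effect is practically meaningful depends on the units and context of the outcome.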

Day 4

  • See the slides on causal inference from observational data and natural experiments
  • Do the homework on difference-in-differences
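In its simplest two-group, two-period form, the difference-in-differences estimate is the change in the treated group minus the change in the control group. The sketch below uses hypothetical outcomes; the estimate is only causal under the parallel-trends assumption (absent treatment, both groups would have changed by the same amount).

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes before and after a policy change.
treat_pre, treat_post = [10, 12], [18, 20]
control_pre, control_post = [9, 11], [12, 14]
print(diff_in_diff(treat_pre, treat_post, control_pre, control_post))
```

The same number is the coefficient on the treated × post interaction in a regression of the outcome on treated, post, and their interaction.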

Day 5

  • See the slides on regression discontinuity designs and instrumental variables
  • Do the homework on regression discontinuity
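A sharp regression discontinuity estimate can be sketched as fitting separate lines on each side of the cutoff and measuring the jump between them at the cutoff. For instrumental variables with a binary instrument, the simplest estimator is the Wald ratio of the instrument's effect on the outcome to its effect on the treatment. The helper names, the rectangular window, the hand-picked bandwidth, and the data below are all assumptions; real analyses usually choose the bandwidth with a data-driven method.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD: jump at the cutoff between local linear fits on each side."""
    left = [(r, o) for r, o in zip(running, outcome) if cutoff - bandwidth <= r < cutoff]
    right = [(r, o) for r, o in zip(running, outcome) if cutoff <= r <= cutoff + bandwidth]
    # Center the running variable so each intercept is the fitted value at the cutoff.
    a_left, _ = ols_line([r - cutoff for r, _ in left], [o for _, o in left])
    a_right, _ = ols_line([r - cutoff for r, _ in right], [o for _, o in right])
    return a_right - a_left

def wald_iv(z, d, y):
    """IV estimate with a binary instrument z: effect of z on y over effect of z on d."""
    def group_mean(values, flag):
        picked = [v for v, zi in zip(values, z) if zi == flag]
        return sum(picked) / len(picked)
    first_stage = group_mean(d, 1) - group_mean(d, 0)
    reduced_form = group_mean(y, 1) - group_mean(y, 0)
    return reduced_form / first_stage

# Hypothetical RD data: outcome jumps by 4 when the running variable crosses 0.
running = [-4, -3, -2, -1, 0, 1, 2, 3]
outcome = [2 + 0.5 * r + (4 if r >= 0 else 0) for r in running]
print(rd_estimate(running, outcome, cutoff=0, bandwidth=3))

# Hypothetical IV data: instrument z, treatment taken d, outcome y.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
y = [4, 4, 4, 1, 1, 1, 1, 4]
print(wald_iv(z, d, y))
```

Centering the running variable at the cutoff is what lets the difference of the two intercepts be read directly as the estimated discontinuity.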