DS002 Intro to Data Science

updated 1/26

Class DS002: Intro to Data Science
Instructor Douglas Goodwin
Visiting Assistant Professor in Media Studies, Scripps College
Lang 228
Contact [email protected]
Class hours MW 11:00-12:15
Office Hours MW 1:00-3:00pm
Discord https://discord.gg/gtMVkAMV
Textbook Data Science from Scratch, 2nd Edition
Joel Grus.
ISBN: 9781492041139
Classroom Zoom, Steele 229
Description This course is the second part of a two-semester introduction to computer programming and data science. Using Python and other tools, students will explore the nuances of gathering, visualizing, and analyzing data to build insight and intuition. Students will be introduced to various data manipulation, analysis, and statistical methods by building their own code from scratch. They will also consider the ethical implications and limitations of creating models of data.

Course Goals

Students completing this course will be able to:

  • Define and explain key concepts in data science.
  • Analyze real-world data.
  • Gain fluency in basic programming skills in Python with a focus on statistical modeling and machine learning.
  • Develop and use essential Python programming skills in data analysis, data visualization, and machine learning.

We begin by reviewing Python and establishing repositories on GitHub. After review and setup, you will read two chapters from "Data Science from Scratch" (DSfS) each week and reproduce the code in the book to build your own code libraries. You are encouraged to write your own code (and tests) as long as you use the same function names. Your code won't be efficient, but it will work, and your APIs will match dedicated libraries such as NumPy (Numerical Python), pandas (Python Data Analysis Library), and scikit-learn (machine learning in Python).

Please complete the readings and start your code before class on Monday. I will lecture on the chapters, then we will go through your code together during Monday's class.

Wednesdays are lab days: work alone or in small groups to use your code to complete the lab assignments.

Points

Activity Points
10 Weekly Assignments 30 points
1 Presentation 20 points
10 in-class exercises 30 points
Project 30 points
TOTAL 110 points

I give you a little extra room in case you miss an exercise or two.

Points and letter grades

This class will be easy if you keep up with the readings and come to class.

Point range Grade
90-110 A
80-90 B
70-80 C
60-70 D
< 60 ...

Weekly assignments

You will implement the code in the chapters from Data Science from Scratch (DSfS) to build a library of code to use in Deepnote. Push your assignments to your GitHub repository by Monday morning before class. We will import your code and use it to complete in-class exercises in Deepnote.

Please DON'T copy and paste code! Typing it will help you get familiar with and synthesize the code.

Note the liberal use of Python's assert statement in the source files. A clean import will give you some assurance that your code is usable. There are more involved ways to test code, including a style of programming called TDD (Test-Driven Development). Sprinkling assert statements throughout your source code gives you TDD-lite!
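As a minimal sketch of this TDD-lite style, here is what a small module in your library might look like. The function names `add` and `mean` are illustrative assumptions and may not match the book's exact code. The point is that the module-level assert statements run at import time, so a clean import means every check passed.

```python
from typing import List

Vector = List[float]


def add(v: Vector, w: Vector) -> Vector:
    """Add two vectors element by element."""
    # Guard against misuse: the vectors must line up.
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]


def mean(xs: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list."""
    assert xs, "mean of an empty list is undefined"
    return sum(xs) / len(xs)


# These checks run every time the module is imported.
# If any of them fails, the import raises AssertionError.
assert add([1, 2, 3], [4, 5, 6]) == [5, 7, 9]
assert mean([1, 2, 3, 4]) == 2.5
```

If you saved this as a file in your repository (say, a hypothetical `my_stats.py`), then `import my_stats` in a notebook would either succeed quietly or fail loudly at the first broken assert.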

In-class exercises

Use your code library to solve Data Science problems related to each week's theme. I am creating these exercises now--please ask if you have an idea for an exercise!

Presentation

Sign up for a presentation here: https://forms.gle/AyEUcQ5yJRXJWvCe6

Use this form to select a date and topic for a 5-10 minute presentation to the class. You may work alone or in teams of fewer than four.

We cover two chapters each week, but you only need to choose one topic. Example: for the week of 02/14 you may choose either Statistics or Probability.

Each chapter contains "Crows": pointers to topics that are relevant to the subject but not covered in the chapter. These are excellent presentation prompts. Each chapter also concludes with a section called "For Further Exploration," which is another good jumping-off point.

Example: on 02/14 you might tell us why we would want to use SciPy's statistical functions instead of writing our own. What advantages does SciPy offer: speed? convenience? interoperability? all three? Then you might show examples in a Deepnote notebook.

Projects

Complete a small, real-world project in the final week of the class. You can start with one of the in-class exercises or dream up a project of your own. Projects should use external data and be executed in Deepnote or on Google Colab.

Accessibility

If you have a documented disability (physical or cognitive) that may impair your ability to complete assignments or otherwise participate in the course and satisfy course criteria, please meet with us at your earliest convenience to identify, discuss, and document any feasible instructional modifications or accommodations. You should also contact the Accessible Education Office to request an official letter outlining authorized accommodations.

Credits

I will lean heavily on the content in DSfS. Some of the material in this course is based on other classes. We have also heavily drawn on materials and examples found online and tried our best to give credit by linking to the original source. Please contact us if you find materials where the credit is missing or that you would rather have removed.

Presentations

week Baker, Devon Bhandari, Sumedha Crites, Nick Garel, Ellis Cuong Gelli, Medha Huchley, Amelia Jarquin, Emanuel Jung, Geeyoun Khowaja, Perbhaat Masliy, Luba Mujal, Elena Oh, Soo-Min Angela Pakenas, William P. Ruth, Wyatt Shi, Alice Smith, Kaia Montana Soltani, Kian B. Soriano Martinez, Angeles Ivett Wilson, Sophia Elizabeth Xu, Zimeng Zhou, Siva
02/07 Visualizing Data / Linear Algebra XX XX XX Visualizing Data
02/14 Statistics / Probability Statistics Statistics XX
02/21 Hypothesis & Inference / Gradient Descent
02/28 Getting Data / Working with Data Getting Data Getting Data Working with Data Getting Data
03/07 Machine Learning / K-Nearest Neighbors X
03/21 Naive Bayes / Simple Linear Regression Simple Linear Regression
03/28 Multiple Regression / Logistic Regression
04/04 Decision Trees / Neural Networks
04/11 Deep Learning / Clustering XX
04/18 Natural Language Processing / Network Analysis XX
04/25 Recommender Systems / Databases & SQL Recommender systems XX

Weeks, updated 1/26

  1. W01 01/17

    1. NO CLASS
    2. CH01 Introduction
  2. W02 01/24

    1. CH01 Introduction
    2. CH01+ Introduction: GitHub and Deepnote
      1. Class Exercise: Use your GitHub code in Deepnote to visualize the class composition by school
  3. W03 01/31

    1. CH02 A Crash Course in Python, pt1
    2. CH02 A Crash Course in Python, pt2
      1. Class Exercise: Use your GitHub code in Deepnote
  4. W04 02/07

    1. Python Assignment DUE BEFORE CLASS
    2. CH03 Visualizing Data
      1. Video: Data Viz, Computerphile
    3. CH04 Linear Algebra
      1. Exercise
  5. W05 02/14

    1. Viz Assignment DUE BEFORE CLASS
    2. CH05 Statistics
    3. CH06 Probability
      1. Coinflip Class Exercise, Monty Hall Problem
  6. W06 02/21

    1. Pandas Assignment DUE BEFORE CLASS
    2. CH07 Hypothesis and Inference
    3. CH08 Gradient Descent
      1. Class Exercise
  7. W07 02/28

    1. Assignment DUE BEFORE CLASS
    2. CH09 Getting Data
    3. CH10 Working With Data
      1. Class Exercise
  8. W08 03/07

    1. Getting Data Assignment DUE BEFORE CLASS
    2. CH11 Machine Learning
    3. CH12 k-Nearest Neighbors
      1. Class Exercise
  9. W09 03/14 SPRING BREAK

  10. W10 03/21

    1. KNN Assignment DUE BEFORE CLASS
    2. CH13 Naive Bayes
      1. Video: Bayes Theorem
    3. CH14 Simple Linear Regression
      1. Video: Simple Linear Regression Formula
      2. Class Exercise
  11. W11 03/28

    1. Linear Regression Assignment DUE BEFORE CLASS
    2. CH15 Multiple Regression
      1. Video: Multiple Regression
    3. CH16 Logistic Regression
      1. Video: Data Regression
      2. Class Exercise
  12. W12 04/04

    1. Regression Assignment DUE BEFORE CLASS
    2. CH17 Decision Trees
    3. CH18 Neural Networks
      1. Class Exercise
  13. W13 04/11

    1. Neural Networks Assignment DUE BEFORE CLASS
    2. CH19 Deep Learning
    3. CH20 Clustering
      1. Class Exercise
  14. W14 04/18

    1. Clustering Assignment DUE BEFORE CLASS
    2. CH21 Natural Language Processing
    3. CH22 Network Analysis
      1. Class Exercise
  15. W15 04/25

    1. NLP Assignment DUE BEFORE CLASS
    2. CH23 Recommender Systems
    3. CH24 Databases and SQL
      1. Class Exercise
  16. W16 05/02

    1. PROJECTS
    2. PROJECTS
      1. Share on Discord