updated 1/26
Class | DS002: Intro to Data Science |
---|---|
Instructor | Douglas Goodwin, Visiting Assistant Professor in Media Studies, Scripps College, Lang 228 |
Contact | [email protected] |
Class hours | MW 11:00-12:15 |
Office Hours | MW 1:00-3:00pm |
Discord | https://discord.gg/gtMVkAMV |
Textbook | Data Science from Scratch, 2nd Edition, by Joel Grus. ISBN: 9781492041139 |
Classroom | Zoom, Steele 229 |
Description | This course is the second part of a two-semester introduction to computer programming and data science. Using Python and other tools, students will explore the nuances of gathering, visualizing, and analyzing data to build insight and intuition about data. Students will be introduced to a range of data manipulation, analysis, and statistical methods by building their own code from scratch. They will also consider the ethical implications and limitations of creating models of data. |
Students completing this course will be able to:
- Define and explain key concepts in data science.
- Analyze real-world data.
- Demonstrate fluency in basic Python programming, with a focus on statistical modeling and machine learning.
- Apply essential Python programming skills to data analysis, data visualization, and machine learning.
We begin by reviewing Python and establishing repositories on GitHub. After review and setup, you will read two chapters from "Data Science from Scratch" (DSfS) each week and reproduce the code in the book to build your own code libraries. You are encouraged to write your own code (and tests) as long as you use the same function names as the book. Your code won't be efficient, but it will work, and your APIs will resemble those of dedicated libraries such as NumPy (Numerical Python), pandas (Python Data Analysis Library), and scikit-learn (machine learning in Python).
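To make "same function names" concrete, here is a minimal sketch of a from-scratch linear algebra module in the style of the book's Linear Algebra chapter (CH04). It is an approximation, not the book's exact code: the `Vector` alias and the names `dot`, `sum_of_squares`, and `magnitude` are assumed to match the book's, so check them against your copy. `dot` happens to share its name with NumPy's `numpy.dot`, which computes the same inner product for 1-D arrays, though not every from-scratch function has a same-named NumPy counterpart.

```python
import math
from typing import List

# A vector is just a list of floats (assumed to mirror the book's type alias).
Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    """Computes v_1 * w_1 + ... + v_n * w_n."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def sum_of_squares(v: Vector) -> float:
    """Returns v_1 * v_1 + ... + v_n * v_n, reusing dot."""
    return dot(v, v)


def magnitude(v: Vector) -> float:
    """Returns the (Euclidean) length of v."""
    return math.sqrt(sum_of_squares(v))


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(magnitude([3, 4]))          # 5.0
```

Building each function on the previous one (`magnitude` calls `sum_of_squares`, which calls `dot`) keeps every piece small enough to type out and check by hand.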
Please complete the readings and start your code before class on Monday. I will lecture on the chapters, and then we will go through your code together during Monday's class.
Wednesdays are lab days: work alone or in small groups to use your code to complete the lab assignments.
### Grading
Activity | Points |
---|---|
10 Weekly Assignments | 30 points |
1 Presentation | 20 points |
10 in-class exercises | 30 points |
Project | 30 points |
TOTAL | 110 points |
The total is 110 points rather than 100, which gives you a little extra room in case you miss an exercise or two.
This class will be easy if you keep up with the readings and come to class.
Point range | Grade |
---|---|
90-110 | A |
80-90 | B |
70-80 | C |
60-70 | D |
< 60 | ... |
### Weekly assignments
You will implement the code in the chapters of Data Science from Scratch (DSfS) to build a library of code to use in Deepnote. Push your assignments to your GitHub repository by Monday morning before class. We will import your code and use it to complete in-class exercises in Deepnote.
Please DON'T copy and paste code! Typing it out will help you become familiar with the code and internalize it.
Note the liberal use of Python's `assert` statement in the source files. A clean import (one that raises no `AssertionError`) gives you some assurance that your code is usable. There are more involved ways to test code, including a style of programming called TDD (Test-Driven Development); `assert` statements sprinkled throughout your source code give you TDD-lite!
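Here is a minimal sketch of TDD-lite, loosely following the Statistics chapter (CH05); the book's own code may differ in detail. The module name `scratch_stats.py` is hypothetical, so use whatever file layout your repository already has. Because the `assert` lines sit at module level, they run every time the file is imported, so a clean `import scratch_stats` means these checks passed.

```python
"""scratch_stats.py (hypothetical name): the asserts at the bottom run on import."""
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def _median_odd(xs: List[float]) -> float:
    """If len(xs) is odd, the median is the middle element."""
    return sorted(xs)[len(xs) // 2]


def _median_even(xs: List[float]) -> float:
    """If len(xs) is even, the median is the average of the two middle elements."""
    sorted_xs = sorted(xs)
    hi_midpoint = len(xs) // 2
    return (sorted_xs[hi_midpoint - 1] + sorted_xs[hi_midpoint]) / 2


def median(v: List[float]) -> float:
    """Finds the 'middle-most' value of v."""
    return _median_even(v) if len(v) % 2 == 0 else _median_odd(v)


# TDD-lite: these checks execute whenever the module is imported.
# If one fails, the import raises AssertionError and points at the broken function.
assert mean([1, 2, 3, 4]) == 2.5
assert median([1, 10, 2, 9, 5]) == 5
assert median([1, 9, 2, 10]) == (2 + 9) / 2
```

One caveat: Python skips `assert` statements entirely when run with the `-O` (optimize) flag, so treat these as quick development checks rather than a substitute for a full test suite.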
### In-class exercises
Use your code library to solve data science problems related to each week's theme. I am still creating these exercises, so please let me know if you have an idea for one!
### Presentations
Sign up for a presentation here: https://forms.gle/AyEUcQ5yJRXJWvCe6
Use this form to select a date and topic for a 5-10 minute presentation to the class. You may work alone or in teams of up to three.
We cover two chapters each week, but you only need to choose one topic. Example: in the week of 02/14 you may choose either Statistics or Probability.
Each chapter contains "crows": pointers to topics that are relevant to the subject but not covered in the chapter. These are excellent presentation prompts. Each chapter also concludes with a section called "For Further Exploration," which is another good jumping-off point.
Example: in the week of 02/14 you might tell us why we would want to use SciPy's statistical functions instead of writing our own. What advantages does SciPy offer: speed, convenience, interoperability, or all three? Then you might show examples in a Deepnote notebook.
### Final project
Complete a small, real-world project in the final week of the class. You can start with one of the in-class exercises or dream up a project of your own. Projects should use external data and be executed in Deepnote or on Google Colab.
### Accommodations
If you have a documented disability (physical or cognitive) that may impair your ability to complete assignments or otherwise participate in the course and satisfy course criteria, please meet with us at your earliest convenience to identify, discuss, and document any feasible instructional modifications or accommodations. You should also contact the Accessible Education Office to request an official letter outlining authorized accommodations.
### Acknowledgments
I will lean heavily on the content in DSfS. Some of the material in this course is based on other classes. We have also drawn heavily on materials and examples found online and tried our best to give credit by linking to the original source. Please contact us if you find materials where the credit is missing or that you would rather have removed.
### Presentation sign-ups
week | Baker, Devon | Bhandari, Sumedha | Crites, Nick | Garel, Ellis Cuong | Gelli, Medha | Huchley, Amelia | Jarquin, Emanuel | Jung, Geeyoun | Khowaja, Perbhaat | Masliy, Luba | Mujal, Elena | Oh, Soo-Min Angela | Pakenas, William P. | Ruth, Wyatt | Shi, Alice | Smith, Kaia Montana | Soltani, Kian B. | Soriano Martinez, Angeles Ivett | Wilson, Sophia Elizabeth | Xu, Zimeng | Zhou, Siva |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
02/07 Visualizing Data / Linear Algebra | XX | XX | XX | Visualizing Data | |||||||||||||||||
02/14 Statistics / Probability | Statistics | Statistics | XX | ||||||||||||||||||
02/21 Hypothesis & Inference / Gradient Descent | |||||||||||||||||||||
02/28 Getting Data / Working with Data | Getting Data | Getting Data | Working with Data | Getting Data | |||||||||||||||||
03/07 Machine Learning / K-Nearest Neighbors | X | ||||||||||||||||||||
03/21 Naive Bayes / Simple Linear Regression | Simple Linear Regression | ||||||||||||||||||||
03/28 Multiple Regression / Logistic Regression | |||||||||||||||||||||
04/04 Decision Trees / Neural Networks | |||||||||||||||||||||
04/11 Deep Learning / Clustering | XX | ||||||||||||||||||||
04/18 Natural Language Processing / Network Analysis | XX | ||||||||||||||||||||
04/25 Recommender Systems / Databases & SQL | Recommender systems | XX |
### Schedule
- W01 01/17
  - NO CLASS
  - CH01 Introduction
- W02 01/24
  - CH01 Introduction
  - CH01+ Introduction: GitHub and Deepnote
  - Class Exercise: Use your GitHub code in Deepnote to visualize the class composition by school
- W03 01/31
  - CH02 A Crash Course in Python, pt1
  - CH02 A Crash Course in Python, pt2
  - Class Exercise: Use your GitHub code in Deepnote
- W04 02/07
  - Python Assignment DUE BEFORE CLASS
  - CH03 Visualizing Data
  - Video: Data Viz, Computerphile
  - CH04 Linear Algebra
  - Exercise
- W05 02/14
  - Viz Assignment DUE BEFORE CLASS
  - CH05 Statistics
  - CH06 Probability
  - Coinflip Class Exercise, Monty Hall Problem
- W06 02/21
  - Pandas Assignment DUE BEFORE CLASS
  - CH07 Hypothesis and Inference
  - CH08 Gradient Descent
  - Class Exercise
- W07 02/28
  - Assignment DUE BEFORE CLASS
  - CH09 Getting Data
  - CH10 Working With Data
  - Class Exercise
- W08 03/07
  - Getting Data Assignment DUE BEFORE CLASS
  - CH11 Machine Learning
  - CH12 k-Nearest Neighbors
  - Class Exercise
- W09 03/14 SPRING BREAK
- W10 03/21
  - KNN Assignment DUE BEFORE CLASS
  - CH13 Naive Bayes
  - Video: Bayes Theorem
  - CH14 Simple Linear Regression
  - Video: Simple Linear Regression Formula
  - Class Exercise
- W11 03/28
  - Linear Regression Assignment DUE BEFORE CLASS
  - CH15 Multiple Regression
  - Video: Multiple Regression
  - CH16 Logistic Regression
  - Video: Data Regression
  - Class Exercise
- W12 04/04
  - Regression Assignment DUE BEFORE CLASS
  - CH17 Decision Trees
  - CH18 Neural Networks
  - Class Exercise
- W13 04/11
  - Neural Networks Assignment DUE BEFORE CLASS
  - CH19 Deep Learning
  - CH20 Clustering
  - Class Exercise
- W14 04/18
  - Clustering Assignment DUE BEFORE CLASS
  - CH21 Natural Language Processing
  - CH22 Network Analysis
  - Class Exercise
- W15 04/25
  - NLP Assignment DUE BEFORE CLASS
  - CH23 Recommender Systems
  - CH24 Databases and SQL
  - Class Exercise
- W16 05/02
  - PROJECTS
  - Share on Discord