This course provides graduate students with the technical skills necessary to conduct research in computational social science and digital humanities. It introduces the basic computer literacy, programming skills, and application knowledge that students need to succeed in further methods work.
The course is currently divided into five main sections. In the first section, students learn how their computers work and communicate with other computers using Git and Bash. In the second, we turn our attention to the basics of R and Python. In the third, students learn tools for acquiring data through APIs and webscraping. In the fourth, students practice using R to clean and analyze data efficiently. In the fifth, students are exposed to additional means of analyzing and visualizing data, including text analysis and machine learning.
Please note that materials are still in development and are subject to change.
- Understand basic programming terminology, structures, and conventions
- Navigate and operate effectively in a UNIX environment
- Understand basic Git and GitHub workflows
- Write, execute, and debug R code for novel data collection, cleaning, analysis, and visualization
- Write and execute basic code in Python
- Be familiar with the concepts and tools of a variety of computational social science / digital humanities applications
- Be familiar with the basic guidelines around reproducible research, good scientific computing practices, and ethics/privacy/legal quandaries
- Learn independently and train themselves in a variety of computational applications and tasks through online documentation (we will have access to DataCamp's courses for the duration of the semester)
Julia Christensen
Anustubh Agnihotri
Monday and Wednesday, 2-4 pm
122 Barrows
By appointment.
We will use bCourses for communication (announcements and questions) and turning in assignments. You should ask questions about class material and assignments through the bCourses website so that everyone can benefit from the discussion. We encourage you to respond to each other’s questions as well.
All course materials will be posted on GitHub at https://github.com/juliachristensen/PS239T_Fall2019, including class notes, code demonstrations, sample data, and assignments. Students are required to use GitHub for their final projects, which will be publicly available unless they have special considerations (e.g., proprietary data).
This class is committed to creating an environment in which everyone can participate, regardless of background, discipline, or disability. If you have a particular concern, please come to me as soon as possible so that we can make special arrangements.
This is a graded class based on the following:
- Completion of assigned homework (50%)
- Participation (25%)
- Final project (25%)
Weekly assignments will be due as follows:
Date | Assignment |
---|---|
Thursday, August 29 | Fill out survey |
Tuesday, September 3 | Bash/Unix/Git Online Tutorial |
Wednesday, September 4 | Submit proof of installation |
Sunday, September 8 | R DataCamp tutorial(s) |
Sunday, September 15 | Tidyverse R DataCamp tutorial(s) |
Tuesday, September 17 | Python DataCamp tutorial(s) |
Sunday, September 22 | Python DataCamp tutorial(s) |
Sunday, September 29 | Database exploration |
Sunday, October 6 | Final project proposal |
Sunday, October 13 | API/Webscraping project |
Sunday, October 20 | Data cleaning project |
Sunday, October 27 | Data visualization project |
Sunday, November 3 | Final project update |
November 25-December 4 | Final project presentations |
Wednesday, December 11 | Final projects due |
Assignment details can be found on bCourses. Unless otherwise specified, assignments should be turned in as PDF documents via the bCourses site.
Time will be provided in class for additional exercises. Any exercises that are not completed in class should be completed before the beginning of the following class.
If contacted in advance, instructors are generally willing to provide extensions.
The class participation portion of the grade can be satisfied in one or more of the following ways:
- attending the lecture and section (note that section is non-optional)
- asking and answering questions in class
- contributing to class discussion through the bCourses site, and/or
- collaborating with the campus computing community, either by attending a D-Lab or BIDS workshop, submitting a pull request to a campus GitHub repository (including the class repository), answering a question on StackExchange, or other involvement in the social computing / digital humanities community.
Because we will be using laptops every class, the temptation to attend to other things during slow moments will be high. While you may choose to do so, I do request that you think of your laptop screen as in the public domain for the duration of class time. Please do not load anything that will distract your classmates or is otherwise inappropriate to a classroom setting.
The final project consists of using the tools we learned in class on your own data of interest. First- and second-year students in the political science department are encouraged to use this as an opportunity to gather data to be used for other courses or the second-year thesis. Students are required to write a short proposal by October 6 (no more than 2 paragraphs) in order to get approval and feedback from the instructors.
During the last few classes, we will have lightning talk sessions where each student presents their project in a talk of at most 5 minutes, followed by 5 minutes of class Q&A. Since there is no expectation of a formal paper, you should select a project that can be completed by the end of the term. In other words, submitting a research design for your future dissertation that will use skills from the class but collects no data is not acceptable, but completing a small, viable portion of a study or thesis is.
Students will be expected to attend every class. Absences may be approved if the instructors receive notice in advance and a reasonable explanation is provided.
Both Monday and Wednesday classes will be required. Instead of separate lectures and sections, all classes will follow a “workshop” style, combining lecture and lab formats.
Date | Topic | Instructor |
---|---|---|
Wednesday, August 28 | Introduction | Julia |
Monday, September 2 | No class | n/a |
Wednesday, September 4 | Bash/Unix/Git | Julia |
Monday, September 9 | Intro to R | Julia |
Wednesday, September 11 | Intro to R | Julia |
Monday, September 16 | Intro to R | Julia |
Wednesday, September 18 | Intro to Python | Anustubh |
Monday, September 23 | Intro to Python | Anustubh |
Wednesday, September 25 | Intro to Python | Anustubh |
Monday, September 30 | APIs | Julia |
Wednesday, October 2 | HTML + Intro to webscraping | Julia |
Monday, October 7 | Webscraping | Anustubh |
Wednesday, October 9 | Webscraping | Anustubh |
Monday, October 14 | Data Cleaning | Julia |
Wednesday, October 16 | Data Cleaning | Anustubh |
Monday, October 21 | Data Visualization | Julia |
Wednesday, October 23 | Data Visualization | Julia |
Monday, October 28 | Organization & Collaboration | Anustubh |
Wednesday, October 30 | Micro-lectures | TBD |
Monday, November 4 | Micro-lectures | TBD |
Wednesday, November 6 | Text analysis | TBD |
Monday, November 11 | No class | n/a |
Wednesday, November 13 | Text analysis | TBD |
Monday, November 18 | Machine Learning | TBD |
Wednesday, November 20 | Machine Learning | Chris Kennedy |
Monday, November 25 | Presentations | n/a |
Wednesday, November 27 | No class | n/a |
Monday, December 2 | Presentations | n/a |
Wednesday, December 4 | Presentations | n/a |
The software needed for the course is as follows:
- Access to the UNIX command line (e.g., a Mac laptop, a Bash wrapper on Windows)
- Git
- R and RStudio (latest versions)
- Anaconda and Python 3 (latest versions)
- Pandoc and LaTeX
This requires a computer that can handle all this software. Almost any Mac will do the job. Most Windows machines are fine too if they have enough space and memory.
You must have all the software downloaded and installed PRIOR to the first day of class. If there are issues with installation on your machine, please contact the instructors for assistance.
See B_Install.md for more information.
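As an informal self-check before the first class, the list above can be turned into a short script that looks for each tool on your `PATH`. This is only a sketch and not part of the official installation instructions: the command names in `REQUIRED_TOOLS` (for example `Rscript` for R, `conda` for Anaconda, `pdflatex` for LaTeX) are assumptions about how these tools are typically exposed on the command line, and they may differ on your machine (on Windows, Python is often `python` rather than `python3`). RStudio is a desktop application and is not checked here.

```python
import shutil

# Assumed command names for each required tool; adjust for your system.
REQUIRED_TOOLS = {
    "Bash": "bash",
    "Git": "git",
    "R": "Rscript",
    "Python 3": "python3",
    "Anaconda": "conda",
    "Pandoc": "pandoc",
    "LaTeX": "pdflatex",
}


def find_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a dict mapping each tool label to its path, or None if not found."""
    return {label: which(command) for label, command in tools.items()}


def missing_tools(found):
    """Return the sorted labels of tools that were not found on PATH."""
    return sorted(label for label, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for label, path in found.items():
        print(f"{label:10s} {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as `NOT FOUND` may still be installed but not on your `PATH`; see B_Install.md or contact the instructors if that happens.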
There are no official textbooks for this class. Readings will be light and will be posted as part of the weekly homework assignments on bCourses. For the semester, we will have access to all of DataCamp's premium course materials (thank you, DataCamp!).