Scrapism @ SFPC, Fall 2022

Instructor: Sam Lavigne | [email protected]
Assistant Teacher: Ilona Brand
Location: Online
Time: Tuesdays 10am-1pm ET
Office Hours: By appointment
Final Class Showcase: https://projects.sfpc.study/scrapism-fall-2022-showcase/

Web scraping is the process of automatically downloading and manipulating web content. It's a common practice in Silicon Valley, where companies large and small transform open HTML pages into commodified datasets.

As an alternative, "Scrapism" is the practice of web scraping for artistic, emotional, and critical ends. By combining aspects of data journalism, conceptual art, and hoarding, it offers a methodology to make sense of a world in which everything we do is mediated by internet companies. These companies surveil us, exploit and financialize our experiences, and attempt to vacuum up every trace we leave behind. But in turn they also leave their own traces online, traces which, when collected, filtered, and sorted, can reveal or even intervene in power relations.

In this class participants will learn how to scrape massive quantities of material from the web with Python, and then use this source material in projects that probe the politics and poetics of the internet. We will cover multiple web scraping techniques, as well as different techniques for manipulating and presenting textual content.

Schedule

1. October 4th

Introductions. Using the terminal. Reading lines.

Readings for next week

Homework

  • Create a work of computationally generated poetry using only command-line tools. These might include grep, sort, tr, cat, sed, fold, curl, say, and others. You can repurpose an existing text, or write one on your own.

2. October 11th

Intro to Python. Manipulating text. Automating writing.

Readings for next week

Homework

  • Write a Python script that combines texts from two or more sources to create a generative poem.
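One possible starting point for this homework, offered only as a sketch: the function below builds each line of a poem by alternating randomly chosen words from two texts. The function name `interleave_poem`, its parameters, and the two short sample strings are all hypothetical and are not part of the course materials. Swap in your own sources and your own combination rule.

```python
import random


def interleave_poem(text_a, text_b, lines=4, words_per_line=5, seed=None):
    """Build a poem whose lines alternate random words from two texts.

    Hypothetical helper for illustration: words in even positions come
    from text_a, words in odd positions come from text_b.
    """
    rng = random.Random(seed)  # a private generator, so a seed gives repeatable poems
    words_a = text_a.split()
    words_b = text_b.split()

    poem_lines = []
    for _ in range(lines):
        line = []
        for position in range(words_per_line):
            source = words_a if position % 2 == 0 else words_b
            line.append(rng.choice(source))
        poem_lines.append(" ".join(line))
    return "\n".join(poem_lines)


# Placeholder source texts (assumptions); replace with text you have collected.
sample_a = "the river keeps a ledger of every stone it moves"
sample_b = "accept all cookies to continue your session today"

print(interleave_poem(sample_a, sample_b, seed=1))
```

Passing a `seed` makes the output repeatable, which helps when you want to keep a version you like. Leaving it as `None` gives a new poem on every run.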

3. October 18th

HTML and CSS basics. Web scraping basics. Making big lists. Basic HTML publications.

Readings for next week

Project 1 (due on November 1st)

Theme: The Language of Power

Brief: Compile a list, or an archive of text. Transform that archive into a zine or similar publication. Your publication can be printed or online. Experiment with how you sort or organize your archive, and pay special attention to how the presentation of your archive affects and manipulates the source material.
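As a rough illustration of compiling a list and experimenting with how it is sorted, here is a minimal sketch that pulls the text of every `<li>` element out of an HTML string and orders it two ways. It uses only Python's standard-library `html.parser` so that it runs without installing anything; that is a choice made for this example, not necessarily the tooling used in class. The class `ListItemCollector`, the function `collect_items`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of every <li> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_li = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._inside_li = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears while an <li> is open.
        if self._inside_li:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._inside_li:
            text = "".join(self._buffer).strip()
            if text:
                self.items.append(text)
            self._inside_li = False


def collect_items(html):
    """Return the list-item texts found in an HTML string, in page order."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample page (an assumption); in practice this would be downloaded HTML.
SAMPLE_HTML = """
<ul>
  <li>terms of service</li>
  <li>privacy policy</li>
  <li>community guidelines</li>
</ul>
"""

items = collect_items(SAMPLE_HTML)
by_alphabet = sorted(items)        # alphabetical order
by_length = sorted(items, key=len)  # shortest entry first

print(by_alphabet)
print(by_length)
```

The same archive reads differently under each ordering, which is the kind of presentation choice the brief asks you to pay attention to.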


4. October 25th

Web scraping part 2: JSON and APIs. Fishing for data. Intro to NLP.

Optional readings for next week


5. November 1st

Project 1 crit. Using real browsers. Scraping images. Basic image publications.

Readings for next week

Project 2 (due November 15th)

Theme: The Commodification of Everything

Brief: Create an archive of images. Present the archive as a publication (in the broad sense) that enhances or underlines its content. Think about how the images are arranged and manipulated. For example, should they all be seen at once? In multiples of 10? One at a time? What determines the order of the images? Should the images be annotated? Vandalized? Should the images be modified or processed?


6. November 8th

Real browsers. Processing and analyzing images. Turning images into text.

Readings for next week


7. November 15th

Project 2 crit. Scraping video.

Readings for next week

Project 3 (due December 6th)

Theme: Seeing Like a State

Brief: Collect a dataset, and transform it into a publication.


8. November 22nd

Automating video. Or, working with data (class votes).

Readings for next week


9. November 29th

Bots and running scripts over time.


10. December 6th

Project 3 crit. Wrap-up discussion.


Some inspiration

Fun and useful Python Libraries


Whatever you have to say, leave
The roots on, let them
Dangle

And the dirt

Just to make clear
Where they come from.

-Charles Olson