Dataset of old newspaper article headlines

Run get_newspaper.sh in data/newspaper/
This will download and extract a .tsv file from a larger Kaggle dataset (https://www.kaggle.com/alvations/old-newspapers)
Run the filter_raw_data.py script.
This will extract the relevant data (English-language newspapers) and write it to the newspaper_dataset.pickle file, which can then be read and processed by the helper methods in data_utils.py
Delete the .tsv file (optional)
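
For orientation, the filter-and-pickle step can be sketched roughly as below. This is only an illustration, not the actual contents of filter_raw_data.py or data_utils.py: the function names, the "Language" column name and the "English" value are assumptions, so check them against the header of the downloaded .tsv before relying on them.

```python
import csv
import pickle


def filter_english_rows(tsv_path, language_column="Language", language="English"):
    """Return the rows of a tab-separated file whose language column matches.

    The column name and value are assumptions about the .tsv layout.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps stray quote characters in article text from
        # being interpreted as field delimiters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader if row.get(language_column) == language]


def save_pickle(rows, pickle_path):
    """Write the filtered rows to a pickle file."""
    with open(pickle_path, "wb") as handle:
        pickle.dump(rows, handle)


def load_pickle(pickle_path):
    """Read back whatever object was stored in the pickle file."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)
```

With these hypothetical helpers, save_pickle(filter_english_rows(path_to_tsv), "newspaper_dataset.pickle") would produce the pickle and load_pickle("newspaper_dataset.pickle") would read it back. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.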
