APS_README.md

File metadata and controls

16 lines (11 loc) · 575 Bytes
  • Cloned from git://github.com/whymarrh/jeopardy-parser.git
  • The original has been modified to not write to a SQL database and to use '||' as the field separator
  • This allows easier loading into a pandas dataframe for subsequent cleaning
pip install -r requirements.txt
python download.py <archive_dir> <starting_game_to_download>
python parser.py -d <archive_dir> > jarchive_xxx.csv
cat jarchive_xxx.csv > jarchive.csv
  • download.py first downloads the games into the archive directory, j-archive (it currently has games up to id 6095, which is 7/27/18)
  • parser.py then extracts questions to jarchive.csv
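
As a rough illustration of why the '||' separator makes later loading easy, the sketch below splits such output into rows using only the standard library. It assumes one record per line and no header row; neither is stated above, and the sample values are placeholders, not real parser output. The helper name `read_rows` is hypothetical.

```python
import io

SEPARATOR = "||"  # field separator written by the modified parser.py


def read_rows(stream):
    """Yield one list of fields per non-empty line of a '||'-separated stream.

    Assumes one record per line (an assumption, not documented above).
    """
    for line in stream:
        line = line.rstrip("\n")
        if line:
            yield line.split(SEPARATOR)


# Placeholder data standing in for jarchive.csv; the real column layout
# is not documented here.
sample = io.StringIO("a||b||c\nd||e||f\n")
rows = list(read_rows(sample))
```

To load the real file, pass `open("jarchive.csv", encoding="utf-8")` instead of the sample stream. With pandas, something like `pandas.read_csv("jarchive.csv", sep=r"\|\|", engine="python", header=None)` should be roughly equivalent: a separator longer than one character is interpreted as a regular expression, so the pipes need escaping, and `header=None` reflects the same no-header assumption.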