Individual interdisciplinary project 2017/2018, Supervisors: Tobias Scheffer & Paul Prasse, Submission Date: 02.03.2018 @ University of Potsdam, Germany
This project runs on Anaconda with Python 3.6.3.
Run in console:
pip install -r requirements.txt
Set your Twitter API keys in config.py:
#Twitter API credentials
consumer_key = '#####'
consumer_secret = '#####'
access_token = '#####'
access_secret = '#####'
Create a folder ./data/datasets/ in root. Run in console:
python crawltwitter.py -a [twitter_user] -c1 [max_tweets] -c2 [max_accounts]
Example:
python crawltwitter.py -a DataScienceCtrl -c1 10000 -c2 100
Data for each retrieved account will be stored in data/datasets/.
The account names found during crawling will be stored in data/TwitterCrawlXXXX-XX-XXXXX.json.
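The crawl itself can be sketched as a cap-limited pagination loop over some fetch function (crawltwitter.py presumably uses the Twitter API via a client such as tweepy; the fetch_page callable and page size below are stand-ins, not the script's actual code):

```python
def crawl_tweets(fetch_page, max_tweets, page_size=200):
    """Collect up to max_tweets tweets by paging through fetch_page.

    fetch_page(max_id) must return a list of tweet dicts with an 'id' key,
    newest first; an empty list signals the end of the timeline.
    """
    tweets = []
    max_id = None  # None means: fetch the newest page first
    while len(tweets) < max_tweets:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        max_id = page[-1]["id"] - 1  # continue below the oldest tweet seen
    return tweets[:max_tweets]
```

This is the standard max_id-based pagination pattern for timeline APIs: each request asks only for tweets older than the last one already collected, so no tweet is fetched twice.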
Create a folder ./data/gathered/ in root. Run in console:
python gather.py -max 10000
A JSON file will be created in data/gathered/ with data ready for training.
Create folders ./data/cache/ and ./data/publish/ in root. Run in console:
python training.py -d data/gathered/gathering_xxx_xxx.json -save -i -i -i -i
The four -i options correspond to the highest level of dataset optimisation; they can be removed.
Feature importances and baseline test results will be displayed.
With the -save option, the model will be saved in ./data/publish/.
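As a sanity check on the reported baselines, a majority-class baseline is easy to reproduce by hand (this illustrates the idea only, not training.py's actual baseline code):

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label.

    Any trained model should beat this number; if it does not, the
    features carry no usable signal (or something is broken).
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)
```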
Run in console:
python server.py -f data/publish/xxxxxx.model.json
Then go to http://127.0.0.1:5000 and play with predictions.