Added Tweet Count to README and Featurization
Updated the README to include the number of tweets used to build the
vectorizer and FastText model and added code to build those models given
a set of tweets.
Cody Buntain committed Jan 10, 2019
1 parent 8671897 commit f4c0e98
Showing 5 changed files with 2,287 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
Expand Up @@ -47,6 +47,7 @@ While FastText provides several pre-trained word vector datasets trained on Wiki
### Featurization Models

We provide the TF-IDF vectorizer built from a 1-percent sample of English tweets posted to Twitter and captured in Twitter's public sample stream between 2013 and 2016.
This dataset contains 11,715,393 tweets.
You can download this vectorizer here: [2013to2016_tfidf_vectorizer_20190109.pkl](http://obj.umiacs.umd.edu/trecis_2018/2013to2016_tfidf_vectorizer_20190109.pkl)
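
Once downloaded, the vectorizer can be loaded with Python's `pickle` module. The sketch below assumes the file is a pickled scikit-learn `TfidfVectorizer`, which the README does not state explicitly. If so, scikit-learn must be installed, and a version mismatch with the one used to build the pickle may cause loading to fail. The helper names `load_vectorizer` and `featurize` are hypothetical, chosen only for illustration.

```python
import pickle
from pathlib import Path


def load_vectorizer(path):
    """Load a pickled featurizer object from a local file.

    Only unpickle files from sources you trust: pickle can execute
    arbitrary code while loading.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def featurize(vectorizer, texts):
    """Turn raw tweet texts into feature vectors.

    Assumption: the loaded object exposes a scikit-learn style
    transform() method that accepts a list of strings.
    """
    return vectorizer.transform(list(texts))
```

With the file saved locally, usage would look like `featurize(load_vectorizer("2013to2016_tfidf_vectorizer_20190109.pkl"), ["example tweet text"])`.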

We also provide a FastText model trained on this same set of English tweets, which you can download here: [archived_text_sample_2013to2016_gensim_200.model.tgz](http://obj.umiacs.umd.edu/trecis_2018/archived_text_sample_2013to2016_gensim_200.model.tgz)
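
The FastText model ships as a `.tgz` archive, so it has to be unpacked before loading. The following is a minimal sketch, assuming the archive is a gzip-compressed tarball and that the model inside was saved with gensim's `save()` method (suggested by the file name, not confirmed here). The function names are hypothetical, and the name of the model file inside the archive is not documented, so inspect the returned paths before loading.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Refuse entries that would be written outside dest_dir.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return sorted(str(dest / m.name) for m in members if m.isfile())


def load_fasttext(model_path):
    """Load a saved gensim FastText model.

    Assumption: gensim is installed and compatible with the version
    that wrote the model file.
    """
    from gensim.models import FastText  # third-party, imported lazily

    return FastText.load(model_path)
```

Call `extract_model_archive` first, then pass the path of the main model file to `load_fasttext`.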