Skip to content

Latest commit

 

History

History
13 lines (8 loc) · 1.27 KB

README.md

File metadata and controls

13 lines (8 loc) · 1.27 KB

tweet-emotion-classifier

This classifier with full unicode support takes two sets of training data in the form of tab delimited files containing tweets mined using the Twitter API.

The SVM_perf implementation of the Support Vector Machine (SVM) is then used to create a binary classifier.

For example, the first training set can be composed of tweets mined using happy emoticons as a query. The second would then be composed of tweets mined using sad emoticons as a query. Running the authormator.sh shell script would clean the tweets, stripping them of superfluous data and characters.

The training data is then used to create a dictionary of features; an index of unique words in the dataset. The tweets are randomized and then divided into training, development, and testing sets that are then output as series of ‘feature:value’ pairs based on the features in each tweet found in the feature dictionary. Each of these lines was preceded by a class identifier, -1 for ‘sad’ and 1 for ’happy’ tweets.
This vectorised data was then used to train an SVM classifier using svm_pref_learn with a C value of 20 for the hashtag based classifier and 220 for the emoticon based classifier.

The resulting classifier, if it performs well, can then be used to classify further tweets.