tweet-emotion-classifier

This classifier, which fully supports Unicode, takes two sets of training data as tab-delimited files of tweets mined using the Twitter API.

The SVM_perf implementation of the Support Vector Machine (SVM) is then used to create a binary classifier.

For example, the first training set can be composed of tweets mined using happy emoticons as a query, and the second of tweets mined using sad emoticons as a query. Running the automator.sh shell script cleans the tweets, stripping them of superfluous data and characters.
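The cleaning step could look roughly like the sketch below. It is not the code behind automator.sh: the choice of what counts as "superfluous" (URLs, @mentions, and the emoticons used as the mining query), the lowercasing, and the function name `clean_tweet` are all assumptions made for illustration.

```python
import re

# Hypothetical cleaning rules; the repository's actual rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text, query_terms=()):
    """Strip URLs, @mentions and the query terms, then normalise whitespace.

    Removing the query emoticons is an assumption: if they stayed in the
    text, the classifier could simply learn the query itself.
    """
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    for term in query_terms:
        text = text.replace(term, " ")
    # Python 3 strings are Unicode, so non-ASCII characters pass through intact.
    return WHITESPACE_RE.sub(" ", text).strip().lower()
```

For instance, `clean_tweet("@bob Loving this :) http://t.co/x", [":)"])` leaves only the words of the tweet.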

The training data is then used to create a dictionary of features: an index of the unique words in the dataset. The tweets are randomized and divided into training, development, and testing sets, which are output as a series of 'feature:value' pairs based on the features of each tweet that are found in the feature dictionary. Each of these lines is preceded by a class identifier, -1 for 'sad' tweets and 1 for 'happy' tweets.
This vectorised data is then used to train an SVM classifier with svm_perf_learn, using a C value of 20 for the hashtag-based classifier and 220 for the emoticon-based classifier.
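The feature dictionary, the split, and the 'feature:value' output described above can be sketched as follows. This is a minimal illustration under several assumptions that the README does not confirm: whitespace tokenisation, word counts as feature values, 1-based feature indices written in ascending order (as SVM-light style input expects), and an 80/10/10 split. All function names are hypothetical.

```python
import random


def build_feature_dict(tweets):
    """Map each unique token to a 1-based feature index."""
    features = {}
    for text in tweets:
        for token in text.split():
            if token not in features:
                features[token] = len(features) + 1
    return features


def to_svm_line(label, text, features):
    """Format one tweet as '<label> <index>:<value> ...'.

    Values are token counts (an assumption); tokens missing from the
    dictionary are skipped. Indices are emitted in ascending order.
    """
    counts = {}
    for token in text.split():
        index = features.get(token)
        if index is not None:
            counts[index] = counts.get(index, 0) + 1
    pairs = " ".join(f"{index}:{counts[index]}" for index in sorted(counts))
    return f"{label} {pairs}".rstrip()


def split_dataset(examples, seed=0, train=0.8, dev=0.1):
    """Shuffle the examples and cut them into train, dev and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_dev],
        shuffled[n_train + n_dev:],
    )
```

A fixed seed keeps the split reproducible between runs, which makes it easier to compare different C values on the same development set.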

The resulting classifier, if it performs well, can then be used to classify further tweets.
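Training and classification with SVM_perf could be driven from a script along these lines. The `-c` option and the argument order follow the standard svm_perf_learn and svm_perf_classify command lines; the file names, the helper names, and the reading of each prediction line as a real-valued score whose sign gives the class are assumptions for illustration.

```python
import subprocess


def learn_command(train_file, model_file, c_value):
    """Build the svm_perf_learn command line for a given C value."""
    return ["svm_perf_learn", "-c", str(c_value), train_file, model_file]


def classify_command(input_file, model_file, predictions_file):
    """Build the svm_perf_classify command line."""
    return ["svm_perf_classify", input_file, model_file, predictions_file]


def predicted_labels(lines):
    """Turn prediction lines into class labels, 1 (happy) or -1 (sad).

    Assumes one real-valued score per line, with a positive score
    meaning the positive class.
    """
    return [1 if float(line) > 0 else -1 for line in lines if line.strip()]


def run(command):
    """Run a command and raise if the tool exits with an error."""
    subprocess.run(command, check=True)


# Example usage (requires the SVM_perf binaries on PATH; file names are hypothetical):
#   run(learn_command("train.dat", "model", 220))
#   run(classify_command("test.dat", "model", "predictions"))
#   with open("predictions", encoding="utf-8") as handle:
#       labels = predicted_labels(handle)
```

Running svm_perf_classify on the development set first gives a way to check whether the classifier performs well enough before applying it to new tweets.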

asmiley/tweet-emotion-classifier
