UjjAgarwal/NLP--Sentiment-Analysis

Sentiment Analysis

Sentiment analysis (or opinion mining) is an NLP technique used to determine whether a piece of text is positive, negative, or neutral. It is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback and understand customer needs.

About the Project

In this project we performed sentiment analysis on a dataset of 179108 COVID-19 tweets with 13 features, including user name, user description, and user location. The models used are Naive Bayes, Random Forest, and a neural network. Features were extracted with two methods: Bag of Words and the Tfidf Vectorizer. The training labels were generated with TextBlob.
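
To make the two feature extractors concrete, here is a minimal, dependency-free sketch of what Bag of Words and TF-IDF compute. It is an illustration only, not the project's code: the function names `bag_of_words` and `tfidf` are hypothetical, tokenization is a plain whitespace split, and the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 without row normalization, which is an assumption.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc.split()).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix


def tfidf(docs):
    """Weight raw counts by a smoothed inverse document frequency."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF (assumed form); rows are left unnormalized in this sketch.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted


# Hypothetical two-document corpus.
docs = ["good good bad", "bad day"]
vocab, counts = bag_of_words(docs)
_, weights = tfidf(docs)
```

Bag of Words keeps raw term counts, while TF-IDF scales down terms that appear in many documents, so a word such as "bad" that occurs in every document receives the smallest weight.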

Data Visualization

Various graphs have been plotted, such as the number of unique values in each column, the number of tweets from different locations, and the number of users in various locations.
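
The quantities behind these plots can be sketched with the standard library alone. The rows and the column names `user_name`, `user_location`, and `text` below are assumptions for illustration; the real dataset has 13 columns whose exact names are not listed here.

```python
from collections import Counter

# Hypothetical rows standing in for the tweet dataset (column names assumed).
rows = [
    {"user_name": "a", "user_location": "India", "text": "t1"},
    {"user_name": "b", "user_location": "USA", "text": "t2"},
    {"user_name": "a", "user_location": "India", "text": "t3"},
]


def unique_counts(rows):
    """Number of distinct values in each column."""
    columns = rows[0].keys()
    return {col: len({row[col] for row in rows}) for col in columns}


def tweets_per_location(rows, column="user_location"):
    """Number of tweets (rows) per location."""
    return Counter(row[column] for row in rows)


def users_per_location(rows, user_col="user_name", loc_col="user_location"):
    """Number of distinct users per location."""
    pairs = {(row[user_col], row[loc_col]) for row in rows}
    return Counter(loc for _, loc in pairs)
```

Each returned mapping can be passed directly to a bar-chart call in a plotting library.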

Train Data Generation

Training labels were generated by using TextBlob to classify each tweet as positive, neutral, or negative.

| Text | Classification Using TextBlob |
|---|---|
| If I smelled the scent of hand sanitizers today on someone in the past, I would think they were so intoxicated that… https://t.co/QZvYbrOgb0 | Negative |
| Hey @Yankees @YankeesPR and @MLB - wouldn't it have made more sense to have the players pay their respects to the A… https://t.co/1QvW0zgyPu | Positive |
| @diane3443 @wdunlap @realDonaldTrump Trump never once claimed #COVID19 was a hoax. We all claim that this effort to… https://t.co/Jkk8vHWHb3 | Neutral |
| #coronavirus #covid19 deaths continue to rise. It's almost as bad as it ever was. Politicians and businesses want… https://t.co/hXMHooXX2C | Negative |
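
One plausible way to produce such labels is to threshold TextBlob's polarity score, which ranges from -1 to 1. The sketch below assumes a cutoff at zero (positive above, negative below, neutral at exactly zero); the project's actual thresholds are not stated, and the helper names `label_from_polarity` and `label_tweet` are hypothetical.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label (zero cutoff assumed)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def label_tweet(text):
    """Label one tweet using TextBlob's polarity score."""
    # Imported lazily so the thresholding logic can be used without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

Because the labels come from a rule-based tool rather than human annotation, the models trained on them learn to reproduce TextBlob's judgments.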

Data Preprocessing

The dataset contains text with many elements that are not useful for the analysis. The cleaning steps are: converting text to lowercase, removing text in square brackets, removing links, removing punctuation, removing words containing numbers, removing emojis, removing stopwords, and lemmatization. This makes feature extraction much easier.

| Before Cleaning | After Cleaning |
|---|---|
| If I smelled the scent of hand sanitizers today on someone in the past, I would think they were so intoxicated that… https://t.co/QZvYbrOgb0 | smelled scent hand sanitizers today someone past would think intoxicated that… |
| Hey @Yankees @YankeesPR and @MLB - wouldn't it have made more sense to have the players pay their respects to the A… https://t.co/1QvW0zgyPu | hey yankee yankeespr mlb wouldnt made sense player pay respect a… |
| @diane3443 @wdunlap @realDonaldTrump Trump never once claimed #COVID19 was a hoax. We all claim that this effort to… https://t.co/Jkk8vHWHb3 | wdunlap realdonaldtrump trump never claimed hoax claim effort to… |
| #coronavirus #covid19 deaths continue to rise. It's almost as bad as it ever was. Politicians and businesses want… https://t.co/hXMHooXX2C | coronavirus death continue rise almost bad ever politician business want… |
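
The cleaning steps listed above can be sketched with regular expressions from the standard library. This is an approximation under stated assumptions, not the project's code: the stopword set is a tiny illustrative sample, the emoji pattern covers only common Unicode ranges, and lemmatization is left out because it needs an external library.

```python
import re
import string

# Tiny illustrative stopword set (assumption); a full list would be much longer.
STOPWORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "in", "on", "was", "i", "if"}

# Common emoji and symbol ranges (assumption; not exhaustive).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_text(text, stopwords=STOPWORDS):
    """Apply the cleaning steps in order, except lemmatization."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", "", text)                # text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # links
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\w*\d\w*", "", text)               # words containing digits
    text = EMOJI_PATTERN.sub("", text)                 # emojis
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


cleaned = clean_text("Hello [ad] World! Visit https://t.co/abc now, covid19 is bad 😷")
```

Links are removed before punctuation so that the URL is still intact when its pattern is matched.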

Modelling

Several models were used, from conventional ML models to neural networks, and it was consistently observed that the Tfidf Vectorizer was not a good feature extractor for this dataset. For each model we computed the accuracy score and the balanced accuracy score, and in some cases also performed hyperparameter tuning.

| Model | Hyperparameter Tuning | Feature Extractor | Accuracy Score | Balanced Accuracy Score |
|---|---|---|---|---|
| Gaussian Naive Bayes | None | Bag of Words | 0.692759 | 0.636655 |
| Gaussian Naive Bayes | None | Tfidf Vectorizer | 0.613757 | 0.565149 |
| Random Forest | None | Bag of Words | 0.736251 | 0.676058 |
| Random Forest | Randomized Search CV | Tfidf Vectorizer | 0.635475 | 0.676058 |
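
For reference, the two metrics reported above differ in how they treat class sizes. The sketch below is a plain-Python illustration with made-up labels, not the project's evaluation code: accuracy is the fraction of correct predictions, while balanced accuracy is the mean of the per-class recalls.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Made-up imbalanced example: the classifier always predicts the majority class.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10
acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

In this example accuracy is 0.8 but balanced accuracy is only 0.5, which is why a balanced score below the plain accuracy can hint at weaker performance on the less frequent classes.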

Neural Network

We created a neural network consisting of embedding, LSTM, batch normalization, and densely connected layers. The batch normalization and densely connected layers are each used twice.
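
A network with this shape might look like the following Keras sketch. Only the layer types and the fact that batch normalization and dense layers each appear twice come from the description above; the layer order, the sizes (`VOCAB_SIZE`, `EMBED_DIM`, 64 units), the activations, the optimizer, and the loss are all assumptions.

```python
# All sizes below are assumed for illustration; the project's values are not stated.
VOCAB_SIZE = 10000
EMBED_DIM = 64
NUM_CLASSES = 3  # positive, neutral, negative


def build_model(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, num_classes=NUM_CLASSES):
    """Embedding -> LSTM -> (BatchNorm -> Dense) x 2, ending in a softmax."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        layers.LSTM(64),
        layers.BatchNormalization(),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer class labels (0, 1, 2) are assumed, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The model expects integer-encoded, padded token sequences as input, so the cleaned tweets would first need to be tokenized and padded to a fixed length.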

| Tuning by Keras Tuner | Best Accuracy Score | Best Val_Accuracy Score |
|---|---|---|
| Before | 0.9312 | 0.9034 |
| After | 0.9213 | 0.9002 |
