Sentiment Analysis

Sentiment analysis (or opinion mining) is an NLP technique used to determine whether data is positive, negative or neutral. It is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback and understand customer needs.

About the Project

In this project we performed sentiment analysis on a dataset of COVID tweets containing 179108 tweets with 13 features, including user name, user description and user location. The models used include Naive Bayes, Random Forest and Neural Networks. Feature extraction was done with two methods: Bag of Words and the Tfidf Vectorizer. The training data was created using TextBlob.
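To make the two feature extractors concrete, here is a minimal pure-Python sketch of what a Bag of Words matrix and a TF-IDF weighting look like. It is an illustration only: the project used a library vectorizer, whose tokenisation, weighting and normalisation may differ, and the function names and sample documents below are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    return vocab, [[c[tok] for tok in vocab] for c in counts]


def tfidf(docs):
    """Scale each raw count by a smoothed inverse document frequency."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each token appears.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    # Smoothed idf (an assumed variant); rarer tokens get larger weights.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in bow]


# Hypothetical, already-cleaned documents.
docs = ["death continue rise", "trump never claimed hoax", "death rise again"]
vocab, bow = bag_of_words(docs)
_, weighted = tfidf(docs)
```

In this sketch, a token that occurs in only one document ("continue") ends up with a larger weight than one that occurs in two ("death"), whereas Bag of Words treats both as a count of 1.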

Data Visualization

Various graphs were plotted, such as the number of unique values in each column, the number of tweets from different locations and the number of users in various locations.
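The quantities behind these graphs can be computed with a few lines of standard-library Python. The sketch below is an assumption about how such counts might be derived, not the notebook's actual code; the rows and the column names `user_name` and `user_location` are hypothetical stand-ins for the dataset.

```python
from collections import Counter

# Hypothetical rows standing in for the tweet dataset.
rows = [
    {"user_name": "alice", "user_location": "India", "text": "stay safe"},
    {"user_name": "bob", "user_location": "India", "text": "wear a mask"},
    {"user_name": "alice", "user_location": "USA", "text": "cases rising"},
]


def unique_counts(rows):
    """Number of distinct values in each column."""
    return {col: len({row[col] for row in rows}) for col in rows[0].keys()}


def tweets_per_location(rows):
    """Number of tweets posted from each location."""
    return Counter(row["user_location"] for row in rows)


def users_per_location(rows):
    """Number of distinct users seen in each location."""
    users = {}
    for row in rows:
        users.setdefault(row["user_location"], set()).add(row["user_name"])
    return {loc: len(names) for loc, names in users.items()}
```

Each result is a plain mapping from label to count, so it can be passed directly to a bar chart in whichever plotting library is used.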

Train Data Generation

The training data was generated by using TextBlob to classify each text as positive, neutral or negative.

Text	Classification Using TextBlob
If I smelled the scent of hand sanitizers today on someone in the past, I would think they were so intoxicated that… https://t.co/QZvYbrOgb0	Negative
Hey @Yankees @YankeesPR and @MLB - wouldn't it have made more sense to have the players pay their respects to the A… https://t.co/1QvW0zgyPu	Positive
@diane3443 @wdunlap @realDonaldTrump Trump never once claimed #COVID19 was a hoax. We all claim that this effort to… https://t.co/Jkk8vHWHb3	Neutral
#coronavirus #covid19 deaths continue to rise. It's almost as bad as it ever was. Politicians and businesses want… https://t.co/hXMHooXX2C	Negative
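A labelling step of this kind might look like the sketch below. TextBlob exposes a polarity score between -1 and 1 through `TextBlob(text).sentiment.polarity`; how the project mapped that score to the three labels is not stated, so the zero threshold and both function names here are assumptions.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold of 0.0 is an assumption, not the project's documented rule.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def label_with_textblob(text):
    """Label one tweet using TextBlob's polarity score."""
    # Third-party dependency, imported lazily so the mapping above
    # can be used and tested without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

Because the labels come from a rule-based tool rather than human annotation, the models trained later learn to approximate TextBlob's judgement, which is worth keeping in mind when reading the accuracy figures.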

Data Preprocessing

The raw tweets contain many elements that are not useful for the analysis. The cleaning steps are: converting text to lowercase, removing text in square brackets, removing links, removing punctuation, removing words containing numbers, removing emojis, removing stopwords and lemmatization. This makes feature extraction much easier.

Before Cleaning	After Cleaning
If I smelled the scent of hand sanitizers today on someone in the past, I would think they were so intoxicated that… https://t.co/QZvYbrOgb0	smelled scent hand sanitizers today someone past would think intoxicated that…
Hey @Yankees @YankeesPR and @MLB - wouldn't it have made more sense to have the players pay their respects to the A… https://t.co/1QvW0zgyPu	hey yankee yankeespr mlb wouldnt made sense player pay respect a…
@diane3443 @wdunlap @realDonaldTrump Trump never once claimed #COVID19 was a hoax. We all claim that this effort to… https://t.co/Jkk8vHWHb3	wdunlap realdonaldtrump trump never claimed hoax claim effort to…
#coronavirus #covid19 deaths continue to rise. It's almost as bad as it ever was. Politicians and businesses want… https://t.co/hXMHooXX2C	coronavirus death continue rise almost bad ever politician business want…
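The cleaning steps can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's code: the stopword set is a tiny hypothetical sample, the emoji pattern covers only some common Unicode ranges, and lemmatization is left as a pluggable `lemmatize` callable (identity by default) where a real lemmatizer would be supplied.

```python
import re
import string

# Tiny illustrative stopword list (assumption); a full list would be used in practice.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "on", "is", "it", "was", "as", "i", "if"}

# Covers common emoji and pictograph ranges only (assumption).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")


def clean_text(text, stopwords=STOPWORDS, lemmatize=lambda word: word):
    """Apply the cleaning steps in order and return the cleaned string."""
    text = text.lower()                                  # lowercase
    text = re.sub(r"\[.*?\]", " ", text)                 # text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = EMOJI_RE.sub(" ", text)                       # emojis
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    # Drop words containing digits, then stopwords, then lemmatize what is left.
    tokens = [tok for tok in text.split() if not any(ch.isdigit() for ch in tok)]
    return " ".join(lemmatize(tok) for tok in tokens if tok not in stopwords)
```

For example, `clean_text("Deaths continue to rise [update] https://t.co/abc #COVID19 in 2020!")` returns `"deaths continue rise"`: the hashtag loses its `#` with the punctuation and is then dropped because it contains digits, which matches how `#COVID19` and `@diane3443` disappear in the table above.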

Modelling

Different models were used, from conventional ML models to Neural Networks, and the Tfidf Vectorizer consistently gave lower accuracy than Bag of Words on this dataset. For each model we computed the accuracy score and the balanced accuracy score, and in some cases also performed hyperparameter tuning.

Model	Hyperparameter Tuning	Feature Extractor	Accuracy Score	Balanced Accuracy Score
Gaussian Naive Bayes	None	Bag of Words	0.692759	0.636655
Gaussian Naive Bayes	None	Tfidf Vectorizer	0.613757	0.565149
Random Forest	None	Bag of Words	0.736251	0.676058
Random Forest	Randomized Search CV	Tfidf Vectorizer	0.635475	0.676058
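The two metrics in the table answer different questions: accuracy is the fraction of all predictions that are correct, while balanced accuracy is the mean of the per-class recall, so every class counts equally regardless of its size. The sketch below computes both in plain Python; the label arrays are hypothetical and chosen only to show how the two can diverge on imbalanced classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class contributes equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 6 neutral, 3 positive, 1 negative.
y_true = ["neu"] * 6 + ["pos"] * 3 + ["neg"]
y_pred = ["neu"] * 6 + ["pos", "pos", "neu"] + ["neu"]

acc = accuracy(y_true, y_pred)            # 8 of 10 correct
bal = balanced_accuracy(y_true, y_pred)   # mean of recalls 1, 2/3 and 0
```

Here a model that favours the majority class reaches 0.8 accuracy but only about 0.56 balanced accuracy. A gap in the same direction appears in most rows of the table, which may indicate that the sentiment classes in this dataset are unevenly represented.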

Neural Network

We created a Neural Network consisting of an Embedding layer, an LSTM layer, BatchNormalization layers and densely connected layers. The BatchNormalization and densely connected layers are each used twice.

Tuning by Keras Tuner	Best Accuracy Score	Best Val_Accuracy Score
Before	0.9312	0.9034
After	0.9213	0.9002
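A network with these layer types could be assembled in Keras roughly as follows. This is a sketch under assumptions: the README does not give the layer order, layer sizes, vocabulary size, optimizer or loss, so every number and the `build_model` name below are hypothetical, and TensorFlow is imported lazily inside the function.

```python
def build_model(vocab_size=10000, embed_dim=64, num_classes=3):
    """Embedding -> LSTM -> (BatchNormalization -> Dense) x 2.

    All sizes and the layer ordering are assumptions for illustration.
    """
    # Third-party dependency, imported only when the model is built.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        layers.LSTM(64),
        layers.BatchNormalization(),
        layers.Dense(32, activation="relu"),
        layers.BatchNormalization(),
        # One output per sentiment class: positive, neutral, negative.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

The model expects padded sequences of integer token ids as input, so tweets would first need to be tokenised and padded to a fixed length before calling `model.fit`.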

UjjAgarwal/NLP--Sentiment-Analysis
