Scala app that retrieves tweets using the Twitter API and performs sentiment analysis with the Stanford CoreNLP library.
Tweets are retrieved based on search keywords we specify, and the tweet text is fed to the NLP library for sentiment analysis.
Finally, tweets are produced to a Kafka topic.
The app needs a Bearer Token to authenticate against the API (OAuth 2.0 App-Only auth).
The token has to be generated on the Twitter Developer portal. More info here.
Once you have generated one, place it in the config file (tweeter.api_auth_bearer_token).
Keywords for tweet search must also be specified in the config file (tweeter.search_keywords).
Caveat: only tweets from the last week are retrieved (we use the Recent search option as opposed to the Full-archive search).
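As a rough illustration of the authentication and search setup described above, the sketch below builds (but does not send) a request against the Twitter API v2 Recent search endpoint with a Bearer Token header. The object name RecentSearchRequest, the build helper, and the choice to join keywords with OR are assumptions made for this example, not necessarily how the app does it.

```scala
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

// Hypothetical helper, for illustration only.
object RecentSearchRequest {
  // Twitter API v2 Recent search endpoint (tweets from roughly the last week).
  val Endpoint = "https://api.twitter.com/2/tweets/search/recent"

  // Assumption: keywords are combined with OR into a single search query.
  def build(bearerToken: String, keywords: Seq[String]): HttpRequest = {
    val query = URLEncoder.encode(keywords.mkString(" OR "), StandardCharsets.UTF_8)
    HttpRequest.newBuilder()
      .uri(URI.create(s"$Endpoint?query=$query"))
      // OAuth 2.0 App-Only auth: the token goes in the Authorization header.
      .header("Authorization", s"Bearer $bearerToken")
      .GET()
      .build()
  }
}
```

In the app, the token and keywords would come from the config entries mentioned above (tweeter.api_auth_bearer_token and tweeter.search_keywords); here they are plain parameters so the sketch stays self-contained.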
The Stanford CoreNLP library works by splitting a text into sentences and assigning a sentiment value to each one:
- Values 0 or 1 => negative sentiment.
- Value 2 => neutral sentiment.
- Values 3 or 4 => positive sentiment.
Given this, if a tweet contains multiple sentences, we pick the most frequently assigned sentiment.
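The mapping and the "most frequent sentiment" rule can be sketched in plain Scala as follows. This is a minimal sketch that takes the per-sentence scores as already computed by CoreNLP; the names TweetSentiment, fromScore and overall are hypothetical, and the tie-breaking rule (the tied label that appears first in the tweet wins) is an assumption, since it is not specified here.

```scala
// Hypothetical names, for illustration only.
object TweetSentiment {
  sealed trait Sentiment
  case object Negative extends Sentiment
  case object Neutral extends Sentiment
  case object Positive extends Sentiment

  // Maps a per-sentence score (0 to 4) to a coarse sentiment label.
  def fromScore(score: Int): Sentiment = score match {
    case 0 | 1 => Negative
    case 2     => Neutral
    case 3 | 4 => Positive
    case other => throw new IllegalArgumentException(s"Unexpected sentiment score: $other")
  }

  // Picks the label assigned to the most sentences; None if there are no sentences.
  // Assumption: on a tie, the tied label that occurs first in the tweet wins.
  def overall(sentenceScores: Seq[Int]): Option[Sentiment] =
    if (sentenceScores.isEmpty) None
    else {
      val labels = sentenceScores.map(fromScore)
      val counts = labels.groupBy(identity).map { case (label, hits) => label -> hits.size }
      val best = counts.values.max
      labels.find(label => counts(label) == best)
    }
}
```

For example, a tweet whose three sentences score 0, 1 and 3 has two negative sentences and one positive, so `TweetSentiment.overall(Seq(0, 1, 3))` yields `Some(Negative)`.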
Tweets are produced to a Kafka topic using Avro serialization.
A local Kafka instance with schema registry is available (see docker-compose.yml).
Check the Makefile for how to compile, test and run the application.
A GitHub Actions workflow for CI/CD is defined here and can be seen here.