
Fixed emoji to text conversion for emoji not surrounded by whitespace #57

Merged (2 commits) on Aug 6, 2019


ckw017 (Contributor) commented Sep 6, 2018

Fixes issue #56: "Not predicting sentiment of emoticons correctly"

The current method splits the text into tokens on whitespace, so it does not recognize several emoji in a row with no whitespace between them. For example, "😀😀😀" is given no meaning because the exact string "😀😀😀" isn't in the emoji lexicon, even though it should probably mean the same as "😀 😀 😀". Checking for emoji character by character fixes this. Example output after the fix:

>>> SIA.polarity_scores("💋")
# Interpreted as "kiss mark"
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.4215}
>>> SIA.polarity_scores("💋💋💋")
# Interpreted as "kiss mark kiss mark kiss mark"
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.8126}
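These scores can be checked by hand. The sketch below is a simplified model, not VADER's own code. It assumes that the compound score normalizes the summed word valence x as x / sqrt(x² + 15), and that positive valences are offset by +1 while each neutral word counts as 1 when the proportions are computed. The valence of 1.8 for "kiss" is an assumption back-derived from the single-emoji compound of 0.4215, not read from the lexicon. Punctuation emphasis, boosters, and negation are ignored because none occur in this input.

```python
import math

ALPHA = 15  # assumed normalization constant for the compound score
KISS = 1.8  # assumed valence of "kiss", back-derived from compound 0.4215; "mark" is 0

def scores(n_emoji):
    """Simplified polarity scores for n_emoji copies of 'kiss mark'."""
    sentiments = [KISS, 0.0] * n_emoji  # one valence per word
    raw = sum(sentiments)
    compound = raw / math.sqrt(raw * raw + ALPHA)
    # Positive words are offset by +1, negative by -1; neutral words count as 1.
    pos_sum = sum(s + 1 for s in sentiments if s > 0)
    neg_sum = sum(s - 1 for s in sentiments if s < 0)
    neu_count = sum(1 for s in sentiments if s == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(compound, 4),
    }

print(scores(1))  # {'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.4215}
print(scores(3))  # {'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.8126}
```

Every copy of "kiss mark" adds the same word valences, so the neu/pos proportions stay fixed. The summed valence (1.8 vs. 5.4) grows, and with it the compound score.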

As expected, the compound score rises for three emoji in a row (from 0.4215 to 0.8126), while the neg/neu/pos proportions are unchanged.
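The difference between whitespace-token lookup and character-by-character lookup can be sketched as follows. This is a hypothetical illustration, not the code merged in this PR: `EMOJI_LEXICON`, `emoji_to_text_by_token`, and `emoji_to_text_by_char` are made-up names, and the two-entry lexicon stands in for the real emoji lexicon.

```python
# Hypothetical toy lexicon standing in for the real emoji lexicon.
EMOJI_LEXICON = {"💋": "kiss mark", "😀": "grinning face"}

def emoji_to_text_by_token(text, lexicon=EMOJI_LEXICON):
    """Whitespace-token lookup: a token is replaced only if it is exactly one emoji."""
    return " ".join(lexicon.get(tok, tok) for tok in text.split())

def emoji_to_text_by_char(text, lexicon=EMOJI_LEXICON):
    """Character-by-character lookup: every emoji is replaced wherever it occurs."""
    # Pad each description with spaces so it never fuses with neighbouring text,
    # then collapse runs of whitespace into single spaces.
    padded = "".join(f" {lexicon[ch]} " if ch in lexicon else ch for ch in text)
    return " ".join(padded.split())

print(emoji_to_text_by_token("💋💋💋"))  # 💋💋💋 (the whole run is one unknown token)
print(emoji_to_text_by_char("💋💋💋"))   # kiss mark kiss mark kiss mark
print(emoji_to_text_by_char("😀 😀 😀") == emoji_to_text_by_char("😀😀😀"))  # True
```

Collapsing whitespace is a simplification that should not matter when the text is later split on whitespace into tokens anyway. The sketch also only matches emoji that are a single code point. Sequences built from several code points, such as skin-tone variants, would need a longer-match lookup.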

cjhutto (Owner) left a comment


Thanks for this!

cjhutto merged commit 3b92578 into cjhutto:master on Aug 6, 2019