Results difficult to explain #16
Comments
Hey, are you using the token embeddings or the sequence embeddings in this case? In my experience, the BERT sequence embeddings in particular (but sometimes also the token embeddings) don't do as good a job in raw distance calculations for semantic similarity as some other models. This is basically just a result of the tasks BERT is trained for and the transformer architecture it uses. Generally you might have better luck with cosine distance, as you won't have to worry about effects from embedding magnitudes. That said, if you're looking to do this sort of thing (especially with individual words), you might want to look into a different model like Universal Sentence Encoder, ELMo, GloVe, etc. that's designed to better support semantic similarity with simple distance metrics.
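For reference, here is a minimal sketch of the cosine similarity Rob suggests, written as a standalone Java helper (the class and method names are illustrative, not part of this library):

```java
public final class EmbeddingMath {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Unlike Manhattan or Euclidean distance, the result is not
    // affected by the magnitudes of the two vectors.
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The inputs are any two embedding vectors of the same length, e.g. two 768-dimensional sequence embeddings; a result near 1 means the vectors point in nearly the same direction.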
Hi Rob! Thanks for your feedback. I am using the sequence embedding, which returns a float[]. Token embeddings return a float[][], and I don't know what to do with it to calculate a cosine similarity. Any idea? Following your advice, the cosine similarity does indeed seem more reliable than a Manhattan or Euclidean distance, as per the following Series 1 examples. Note, however, that in Series 1, poor still gets a higher similarity score with rich than wealthy does. I will try other models (which may be more difficult to use from Java, but that is another story).
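One standard way to get a single vector out of a float[][] of token embeddings (a common approach, not something confirmed in this thread) is to mean-pool the per-token rows and then apply the same cosine similarity as above. A sketch, assuming each row is one token's 768-dimensional embedding; note that the first and last rows may be special [CLS]/[SEP] tokens, depending on the tokenizer, and you may want to exclude them:

```java
// Mean-pool per-token embeddings (one row per token) into a single
// vector of the same dimension, so cosine similarity can be applied
// exactly as for a sequence embedding.
public static float[] meanPool(float[][] tokenEmbeddings) {
    int dim = tokenEmbeddings[0].length;
    float[] pooled = new float[dim];
    for (float[] token : tokenEmbeddings) {
        for (int i = 0; i < dim; i++) {
            pooled[i] += token[i];
        }
    }
    for (int i = 0; i < dim; i++) {
        pooled[i] /= tokenEmbeddings.length;
    }
    return pooled;
}
```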
Dear Rob,
I do not know whether this is a bug or not, but I am getting strange results, as follows.
I compare the embeddings of two words, and the average (over the 768 values) absolute difference is lower for a different word than for synonyms.
I would have expected a lower difference for rich and a greater one for poor. Where am I actually going wrong?
Thank you.
Example 1:
Example 2:
Example 3:
Example 4:
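For concreteness, the comparison described in this report amounts to a mean absolute difference between two embedding vectors, roughly like the following sketch (a reconstruction of the described metric, not code from the original post):

```java
// Average, over the embedding dimensions, of the absolute
// per-dimension difference between two word embeddings.
public static double meanAbsoluteDifference(float[] a, float[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
        sum += Math.abs(a[i] - b[i]);
    }
    return sum / a.length;
}
```

As noted in the comments above, this magnitude-sensitive metric is exactly the kind of raw distance on which BERT embeddings tend to behave unintuitively.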