Description

This Jupyter Notebook computes three widely used word-embedding-based metrics in Python for evaluating how well a generative conversational chatbot answers on dialogue text.

The 3 metrics implemented:

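  • Greedy Matching score: each word in one answer is matched to its most similar word in the other answer, using cosine similarity between their 300d vectors; the best-match similarities are averaged, and the result is averaged over both directions (reference answer to chatbot answer, and back)

  • Embedding Average score: the cosine similarity between the mean word vector of the reference answer and the mean word vector of the chatbot's answer

  • Vector Extrema score: for each vector dimension, the most extreme value (largest in absolute value) across the words of an answer is kept; the score is the cosine similarity between the two resulting answer vectors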
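
The three definitions above can be sketched with NumPy as follows. This is a minimal illustration rather than the notebook's actual code: the function names (`embed`, `extrema_vector`, `greedy_matching`, and so on) and the tiny 3d toy vocabulary are assumptions made for the example, whereas the notebook uses 300d pretrained vectors.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def embed(tokens, vectors):
    """Stack vectors of in-vocabulary tokens into an (n_words, dim) array."""
    dim = len(next(iter(vectors.values())))
    rows = [vectors[t] for t in tokens if t in vectors]
    return np.array(rows, dtype=float).reshape(-1, dim)

def embedding_average(ref, hyp):
    """Cosine similarity between the mean word vectors of the two answers."""
    if ref.size == 0 or hyp.size == 0:
        return 0.0
    return cosine(ref.mean(axis=0), hyp.mean(axis=0))

def extrema_vector(m):
    """Per dimension, keep the value with the largest absolute magnitude."""
    idx = np.abs(m).argmax(axis=0)
    return m[idx, np.arange(m.shape[1])]

def vector_extrema(ref, hyp):
    """Cosine similarity between the extrema vectors of the two answers."""
    if ref.size == 0 or hyp.size == 0:
        return 0.0
    return cosine(extrema_vector(ref), extrema_vector(hyp))

def greedy_matching(ref, hyp):
    """Average best-match cosine similarity, symmetrized over both directions."""
    if ref.size == 0 or hyp.size == 0:
        return 0.0
    def unit(m):
        n = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.where(n == 0, 1.0, n)
    sim = unit(ref) @ unit(hyp).T  # pairwise word-to-word cosine similarities
    return float((sim.max(axis=1).mean() + sim.max(axis=0).mean()) / 2)

# Hypothetical toy embeddings (3d instead of 300d), for illustration only.
toy_vectors = {
    "hello": np.array([1.0, 0.0, 0.0]),
    "hi":    np.array([0.9, 0.1, 0.0]),
    "there": np.array([0.0, 1.0, 0.0]),
}
ref = embed("hello there".split(), toy_vectors)
hyp = embed("hi there".split(), toy_vectors)
print(greedy_matching(ref, hyp), embedding_average(ref, hyp), vector_extrema(ref, hyp))
```

All three scores lie in [-1, 1], and an answer compared with itself scores 1. Out-of-vocabulary tokens are silently skipped in this sketch, and an answer with no known tokens scores 0.0; both are design choices of the example, not necessarily of the notebook.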

Example Usage:

(see "EMBEDDING_METRICS_TEST_EXAMPLE")

Screenshot: pic

References:

  • A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics. Vasile Rus, Mihai Lintean. 2012. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL 2012.
  • Bootstrapping Dialog Systems with Word Embeddings. G. Forgues, J. Pineau, J. Larcheveque, R. Tremblay. 2014. Workshop on Modern Machine Learning and Natural Language Processing, NIPS 2014.
  • A Survey of Evaluation Metrics Used for NLG Systems. A. B. Sai, A. K. Mohankumar, M. M. Khapra. 2022. ACM Computing Surveys (CSUR), 55(2):1–39.