Skip to content

sherzod-hakimov/clm-pol-gen

 
 

Repository files navigation

clembench: A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents

The cLLM (chat-optimized Large Language Model, "clem") framework tests such models' ability to engage in games – rule-constituted activities played using language. The framework is a systematic way of probing for the situated language understanding of language using agents.

This repository contains the code for setting up the framework and implements a number of games that are further discussed in

Chalamalasetti, K., Götze, J., Hakimov, S., Madureira, B., Sadler, P., & Schlangen, D. (2023). clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents (arXiv:2305.13455). arXiv. https://doi.org/10.48550/arXiv.2305.13455

Evaluation Results

On the main project website , under leaderboard.

Game details

Using the benchmark

We welcome you to contribute to or extend the benchmark with your own games and models. Please simply open a pull request. You can find more information on how to use the benchmark in the links below.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 96.9%
  • Shell 2.5%
  • Other 0.6%