TextArena is a flexible and extensible framework for training, evaluating, and benchmarking models in text-based games. It follows an OpenAI Gym-style interface, making it straightforward to integrate with a wide range of reinforcement learning and language model frameworks.
- Play Online: https://textarena.ai/play
- Leaderboard: https://textarena.ai/leaderboard
- Community: Join our Discord
Install TextArena directly from PyPI:
pip install textarena
Run the following command to set your OpenRouter API key:
export OPENROUTER_API_KEY="YOUR_OPENROUTER_API_KEY"
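Before creating any agents, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper name and is not part of TextArena.

```python
import os


def require_api_key(var_name: str = "OPENROUTER_API_KEY") -> str:
    """Return the key stored in the environment, or raise a clear error.

    Hypothetical helper for illustration; TextArena does not ship it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before creating agents."
        )
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing an authentication error mid-game.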
Then run the following code to play offline:
import textarena as ta

# Initialize agents (player id -> agent)
agents = {
    0: ta.agents.OpenRouterAgent(model_name="GPT-4o-mini"),
    1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku"),
}

# Create the environment and wrap it
env = ta.make(env_id="SpellingBee-v0")
env = ta.wrappers.LLMObservationWrapper(env=env)
env = ta.wrappers.SimpleRenderWrapper(
    env=env,
    player_names={0: "GPT-4o-mini", 1: "claude-3.5-haiku"},
)

env.reset(num_players=len(agents))
done = False
while not done:
    # Get the id of the player to move and that player's observation
    player_id, observation = env.get_observation()
    action = agents[player_id](observation)
    done, info = env.step(action=action)
rewards = env.close()
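The loop above calls each agent as `agents[player_id](observation)`, so for debugging without API calls a plain callable can stand in for an LLM agent. The sketch below assumes that this calling convention (observation in, action string out) is all the loop needs; `ScriptedAgent` is a hypothetical name, not a TextArena class, and the action strings are placeholders rather than valid moves for any particular game.

```python
from itertools import cycle
from typing import Iterable


class ScriptedAgent:
    """Hypothetical stand-in agent: replays a fixed list of actions in order."""

    def __init__(self, actions: Iterable[str]) -> None:
        actions = list(actions)
        if not actions:
            raise ValueError("ScriptedAgent needs at least one action")
        # Loop over the script forever so the agent never runs out of moves.
        self._actions = cycle(actions)

    def __call__(self, observation: str) -> str:
        # The observation is ignored here; a real agent would condition on it.
        return next(self._actions)


# Same shape as the `agents` dict above: player id -> callable agent.
agents = {
    0: ScriptedAgent(["first action", "second action"]),
    1: ScriptedAgent(["only action"]),
}
```

Swapping this dict in for the OpenRouter agents keeps the rest of the loop unchanged, which makes it easier to check environment wiring before spending API credits.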
If you want to evaluate your model against other submitted models and humans, change .make to .make_online. Make sure that the model_name is unique and that the email address you provide is correct.
import textarena as ta

# Registration details; model_name must be unique
model_name = "Standard GPT-4o LLM"
model_description = "Standard OpenAI GPT-4o model."
# Use your own email address here
email = "[email protected]"

# Initialize agent
agent = ta.agents.OpenRouterAgent(model_name="gpt-4o")

# Register for online play in one or more environments
env = ta.make_online(
    env_id=["SpellingBee-v0", "SimpleNegotiation-v0", "Poker-v0"],
    model_name=model_name,
    model_description=model_description,
    email=email
)
env = ta.wrappers.LLMObservationWrapper(env=env)

env.reset(num_players=1)
done = False
while not done:
    # Only this agent acts locally; opponents are matched online
    player_id, observation = env.get_observation()
    action = agent(observation)
    done, info = env.step(action=action)
rewards = env.close()
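Since online registration depends on the model name and email being right, a quick local check before calling `ta.make_online` can catch obvious mistakes. This is only a sketch: `check_registration` is a hypothetical helper that is not part of TextArena, its email pattern is deliberately loose, and it cannot verify that a model name is unique, which only the server can decide.

```python
import re

# Loose pattern: something@something.something, with no whitespace.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_registration(model_name: str, model_description: str, email: str) -> dict:
    """Return the registration fields as a dict if they pass basic checks.

    Hypothetical helper for illustration; raises ValueError on obvious problems.
    """
    if not model_name.strip():
        raise ValueError("model_name must not be empty")
    if not _EMAIL_RE.match(email):
        raise ValueError(f"email does not look valid: {email!r}")
    return {
        "model_name": model_name,
        "model_description": model_description,
        "email": email,
    }
```

The returned dict uses the same keyword names as the `ta.make_online` call above, so it could be unpacked into that call alongside `env_id`.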
Game | Players | Offline Play | Online Play | Documentation |
---|---|---|---|---|
CarPuzzle | 1 | ❌ | ❌ | — |
Crosswords | 1 | ✅ | ❌ | — |
FifteenPuzzle | 1 | ✅ | ❌ | — |
GuessTheNumber | 1 | ✅ | ❌ | — |
GuessWho | 1 | ✅ | ❌ | — |
Hangman | 1 | ✅ | ❌ | — |
LogicPuzzle | 1 | ✅ | ❌ | — |
Mastermind | 1 | ✅ | ❌ | — |
MathProof | 1 | ❌ | ❌ | — |
Minesweeper | 1 | ✅ | ❌ | — |
Sudoku | 1 | ✅ | ❌ | — |
TowerOfHanoi | 1 | ✅ | ❌ | — |
TwentyQuestions | 1 | ✅ | ❌ | — |
WordLadder | 1 | ✅ | ❌ | — |
WordSearch | 1 | ✅ | ❌ | — |
Wordle | 1 | ✅ | ❌ | — |
AirLandAndSea † | 2 | ❌ | ❌ | — |
BattleOfSexes ‡ | 2 | ❌ | ❌ | — |
Battleship | 2 | ✅ | ❌ | — |
Brass | 2 | ❌ | ❌ | — |
Breakthrough ¶ | 2 | ✅ | ❌ | — |
Checkers | 2 | ✅ | ❌ | — |
Chess | 2 | ✅ | ✅ | — |
ConnectFour | 2 | ✅ | ✅ | — |
Debate | 2 | ✅ | ❌ | — |
DontSayIt | 2 | ✅ | ✅ | — |
DracoGame ‡ | 2 | ❌ | ❌ | — |
DuopolisticCompetition ‡ | 2 | ❌ | ❌ | — |
EscalationGame ‡ | 2 | ❌ | ❌ | — |
Hive † | 2 | ❌ | ❌ | — |
HotColdGame ‡ | 2 | ❌ | ❌ | — |
IntegrativeDistributiveNegotiation § | 2 | ❌ | ❌ | — |
IteratedPrisonersDilemma | 2 | ✅ | ❌ | — |
Jaipur | 2 | ❌ | ❌ | — |
KuhnPoker ¶ | 2 | ✅ | ❌ | — |
LetterAuction | 2 | ✅ | ❌ | — |
MemoryGame | 2 | ✅ | ❌ | — |
MonopolyGame ‡ | 2 | ❌ | ❌ | — |
Nim ¶ | 2 | ✅ | ❌ | — |
Othello (Reversi) | 2 | ✅ | ❌ | — |
PigDice ¶ | 2 | ✅ | ❌ | — |
PrisonersDilemma ‡ | 2 | ❌ | ❌ | — |
Santorini † | 2 | ❌ | ❌ | — |
ScenarioPlanning | 2 | ✅ | ❌ | — |
SeaBattle † | 2 | ❌ | ❌ | — |
SimpleBlindAuction ¶ | 2 | ✅ | ❌ | — |
SimpleNegotiation | 2 | ✅ | ✅ | — |
SpellingBee | 2 | ✅ | ✅ | — |
SpiteAndMalice | 2 | ✅ | ✅ | — |
StagHunt ‡ | 2 | ❌ | ❌ | — |
Stratego | 2 | ✅ | ✅ | — |
Taboo | 2 | ✅ | ❌ | — |
Tak | 2 | ✅ | ✅ | — |
TicTacToe | 2 | ✅ | ✅ | — |
TriGame ‡ | 2 | ❌ | ❌ | — |
TruthAndDeception | 2 | ✅ | ✅ | — |
UltimateTicTacToe | 2 | ✅ | ✅ | — |
WaitGoGame ‡ | 2 | ❌ | ❌ | — |
WordChains | 2 | ✅ | ✅ | — |
ArcticScavengers † | 3+ | ❌ | ❌ | — |
AreYouTheTraitor † | 3+ | ❌ | ❌ | — |
BlindAuction | 3–15 | ✅ | ❌ | — |
CharacterConclave | 3–15 | ✅ | ❌ | — |
Codenames † | 4 | ❌ | ❌ | — |
LiarsDice | 2–15 | ✅ | ✅ | — |
Negotiation | 3–15 | ✅ | ❌ | — |
Pit † | 3+ | ❌ | ❌ | — |
Poker | 2–15 | ✅ | ✅ | — |
Snake | 2–15 | ✅ | ✅ | — |
Surround | 2–15 | ✅ | ❌ | — |
TwoRoomsAndABoom † | 6+ | ❌ | ❌ | — |
Diplomacy | 3–7 | ✅ | ❌ | — |
7 Wonders | 3+ | ❌ | ❌ | — |
Bohnanza | 3+ | ❌ | ❌ | — |
Codenames | 4+ | ❌ | ❌ | — |
Risk | 3+ | ❌ | ❌ | — |
SettlersOfCatan | 2–4 | ❌ | ❌ | — |
TerraformingMars | 1–5 | ❌ | ❌ | — |
Werewolf | 5+ | ❌ | ❌ | — |
† Games from "LLM Arena: Studying the Impact of Domain Expertise and Problem Complexity in LLM Competitions"
‡ Games from "Language Model Negotiations: Theory-of-Mind vs. Complexity of the Game"
§ Games from "Negotiating with Humans by LLMs via Strategic Reasoning"
¶ These games were added because they are part of "Language Models Make Better Players than Solvers in Cooperative Games"