This code accompanies the paper Inherently Explainable Reinforcement Learning in Natural Language.
The focus of this project is to develop a Reinforcement Learning (RL) agent that is inherently capable of providing clear explanations for its decisions. This agent, named Hierarchically Explainable Reinforcement Learning (HEX-RL), can offer immediate local explanations by thinking out loud as it interacts with an environment, and can also conduct post-hoc analysis of entire trajectories to provide temporally extended explanations. The HEX-RL agent works in Interactive Fiction game environments, where agents interact with the world using natural language. These environments consist of puzzles with complex dependencies in which agents must execute a sequence of actions to achieve rewards. The agent's explainability is achieved through a knowledge graph state representation with a Hierarchical Graph Attention mechanism that focuses on only the specific facts in its internal graph representation that most influenced its action choices.
- Input: The input to the HEX-RL agent is a combination of Zork1 game state descriptions, the actions the agent performed previously, and a knowledge graph (KG) representing entities and their relationships.
- Model Architecture & Representation: The model's architecture features a knowledge graph state representation with a Hierarchical Graph Attention mechanism that focuses on the elements most influential in guiding the agent's action choices.
- Output: The output consists of action selections, immediate local explanations, and temporally extended explanations summarizing entire trajectories.
- HEX-RL Training: Training involves reinforcement learning to maximize task-related rewards, while ensuring explainability through the attention mechanism.
- Evaluation & Ablation Study: Testing assesses the agent’s performance across different game scenarios, with human evaluations used to gauge the understandability of explanations.
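To make the attention idea above concrete, here is a minimal NumPy sketch of two-level (hierarchical) graph attention: attend over the node embeddings inside each subgraph, then attend over the pooled subgraph summaries. This is an illustrative simplification, not the paper's exact architecture; all names, dimensions, and the dot-product scoring are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hierarchical_attention(subgraphs, query):
    """Two-level attention: first over the nodes inside each subgraph,
    then over the pooled subgraph summaries. The attention weights are
    what an explanation can read off (which facts mattered most)."""
    summaries, node_weights = [], []
    for nodes in subgraphs:                 # nodes: (n_i, d) embeddings
        w = softmax(nodes @ query)          # intra-subgraph attention
        node_weights.append(w)
        summaries.append(w @ nodes)         # weighted node pooling
    summaries = np.stack(summaries)         # (num_subgraphs, d)
    g = softmax(summaries @ query)          # subgraph-level attention
    state = g @ summaries                   # final state vector
    return state, g, node_weights

# Toy usage: two subgraphs with 3 and 5 node embeddings of dimension 8.
rng = np.random.default_rng(0)
d = 8
subgraphs = [rng.normal(size=(3, d)), rng.normal(size=(5, d))]
query = rng.normal(size=d)
state, g, node_w = hierarchical_attention(subgraphs, query)
```

The subgraph-level weights `g` point at which subgraph (e.g. inventory vs. room connectivity) drove the decision, while `node_w` points at the individual facts within it.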
Zork1 is a text-based adventure game whose main goal is to collect all 19 treasures (rewards) scattered across the game world. Zork1 accepts three types of commands: directional commands such as "north"; verb-object commands consisting of a verb and a noun or noun phrase; and verb-object-prep-object commands consisting of a verb and noun phrase followed by a preposition and a second noun phrase. The game responds to each of the player's commands by describing what happens next. Zork1 was a commercial success and popularized the genre of Interactive Fiction.
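The three command shapes can be illustrated with a tiny classifier. This is purely illustrative (the game's actual parser is far richer); the word lists below are assumptions, not Zork1's real vocabulary.

```python
# Illustrative only: classify a command string into one of the three
# Zork1 command shapes described above. Word lists are small samples.
DIRECTIONS = {"north", "south", "east", "west", "up", "down",
              "northeast", "northwest", "southeast", "southwest"}
PREPOSITIONS = {"with", "in", "on", "at", "to", "under", "from"}

def command_type(cmd: str) -> str:
    words = cmd.lower().split()
    if len(words) == 1 and words[0] in DIRECTIONS:
        return "directional"
    if any(w in PREPOSITIONS for w in words[1:]):
        return "verb-object-prep-object"
    return "verb-object"

print(command_type("north"))                  # -> directional
print(command_type("open mailbox"))           # -> verb-object
print(command_type("unlock grate with key"))  # -> verb-object-prep-object
```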
- Stanford Question Answering Dataset (SQuAD) Dataset for Pretraining ALBERT-QA model
- JerichoQA Dataset for Fine-tuning ALBERT-QA model
- Python
- Flask & Gunicorn
- Stanford CoreNLP
- NetworkX
- Jericho Framework
- SpaCy
- Pattern.en
- Frotz
- FuzzyWuzzy
- NumPy
- Redis
- PyTorch
- Hugging Face Transformers (version 2.5.1)
- GoExplore Framework
- Deep Learning
- Natural Language Processing (NLP)
- Entity Extraction
- Relation Extraction
- Sub-word Tokenization
- Question Answering
- Parallel Computing using a T4 GPU
- Reinforcement Learning
```
cd qbert/extraction && gunicorn --workers 4 --bind 0.0.0.0:5000 wsgi:app
redis-server
```
- Open another terminal:
```
cd qbert && python train.py --training_type base --reward_type game_only --subKG_type QBert
```

```
nohup python train.py --training_type chained --reward_type game_and_IM --subKG_type QBert --batch_size 2 --seed 0 --preload_weights Q-BERT/qbert/logs/qbert.pt --eval_mode --graph_dropout 0 --mask_dropout 0 --dropout_ratio 0
```
- `--subKG_type`: which kind of subgraph to use. There are three choices: 'Full', 'SHA', 'QBert'.
  - 'Full': all 4 subgraphs are the full graph_state.
  - 'QBert':
    - __ 'is' __ (attributes of objects)
    - 'you' 'have' __
    - __ 'in' __
    - others (direction)
  - 'SHA':
    - room connectivity (history included)
    - what's in the current room
    - your inventory
    - "you"-related nodes removed (history included)
- `--eval_mode`: whether to turn off training and evaluate the pre-trained model (bool: True or False). Use `--preload_weights` at the same time.
- `--random_action`: whether to use random valid actions instead of QBERT actions (bool: True or False).
- Set graph_dropout to 0.5 and mask_dropout to 0.5 in `train.py`.
- The score should reach 5 within 10,000 steps.
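The 'QBert' subgraph split described above can be sketched as a simple partition of (subject, relation, object) triples. This is a plain-Python illustration of the four-way split; the real implementation lives in the training code and may differ in detail (the function name, bucket names, and direction list here are assumptions).

```python
# Illustrative partition of KG triples into the four 'QBert' subgraphs:
# object attributes, inventory, containment, and room connectivity.
DIRECTIONS = {"north", "south", "east", "west", "up", "down"}

def split_qbert_subgraphs(triples):
    subgraphs = {"attrs": [], "inventory": [], "in": [], "direction": []}
    for s, r, o in triples:
        if r == "is":                       # __ 'is' __  (attributes of objects)
            subgraphs["attrs"].append((s, r, o))
        elif s == "you" and r == "have":    # 'you' 'have' __  (inventory)
            subgraphs["inventory"].append((s, r, o))
        elif r == "in":                     # __ 'in' __  (containment)
            subgraphs["in"].append((s, r, o))
        elif r in DIRECTIONS:               # others (direction / connectivity)
            subgraphs["direction"].append((s, r, o))
    return subgraphs

# Toy knowledge-graph triples in the spirit of Zork1's opening scene.
triples = [("mailbox", "is", "closed"),
           ("you", "have", "leaflet"),
           ("leaflet", "in", "mailbox"),
           ("West of House", "north", "North of House")]
subs = split_qbert_subgraphs(triples)
```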
- We added the `_load_bindings` module from the Jericho framework to properly load the bindings for the Zork1 game's .z5 file within the environment setup module.
- We added a folder called Stanford CoreNLP (2018 version) which includes all the Java files needed to set up the Stanford CoreNLP server. This is used in the source code, especially for information extraction in knowledge graph building and construction. This fixed many path and environment-initialization issues.
- We also included a clone of the Z-machine games repository, which contains the Jericho framework game suite in which the zork1.z5 file is found. This is also something the existing source code does not specify properly.
- Kaspa Vivek
- Arikathota Yashwanth
- Eppili Mukesh