This code accompanies the paper Inherently Explainable Reinforcement Learning in Natural Language.
The focus of this project is to develop a Reinforcement Learning (RL) agent that is inherently capable of providing clear explanations for its decisions. This agent, named Hierarchically Explainable Reinforcement Learning (HEX-RL), can offer immediate local explanations by thinking out loud as it interacts with an environment, and can also conduct post-hoc analysis of entire trajectories to provide temporally extended explanations. The HEX-RL agent works in Interactive Fiction game environments, where agents interact with the world using natural language. These environments consist of puzzles with complex dependencies in which agents must execute a sequence of actions to achieve rewards. The agent's explainability is achieved through a knowledge graph state representation with a Hierarchical Graph Attention mechanism that focuses on only the specific facts in its internal graph representation that most influenced its action choices.
- Input: The input to the HEX-RL agent is a combination of Zork1 game state descriptions, the actions the agent performed previously, and a knowledge graph (KG) representing entities and their relationships.
- Model Architecture & Representation: The model's architecture features a knowledge graph state representation with a Hierarchical Graph Attention mechanism that focuses on the elements most influential in guiding the agent's action choices.
- Output: The output consists of action selections, immediate local explanations, and temporally extended explanations summarizing entire trajectories.
- HEX-RL Training: Training involves reinforcement learning to maximize task-related rewards, while ensuring explainability through the attention mechanism.
- Evaluation & Ablation Study: Testing assesses the agent’s performance across different game scenarios, with human evaluations used to gauge the understandability of explanations.
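To make the attention idea above concrete, here is a minimal NumPy sketch of two-level (hierarchical) graph attention: attend over the node embeddings inside each subgraph, then attend over the pooled subgraph summaries. This is an illustrative simplification, not the paper's exact architecture; all names, dimensions, and the dot-product scoring are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hierarchical_attention(subgraphs, query):
    """Two-level attention: first over the nodes inside each subgraph,
    then over the pooled subgraph summaries. The attention weights are
    what an explanation can read off (which facts mattered most)."""
    summaries, node_weights = [], []
    for nodes in subgraphs:                 # nodes: (n_i, d) embeddings
        w = softmax(nodes @ query)          # intra-subgraph attention
        node_weights.append(w)
        summaries.append(w @ nodes)         # weighted node pooling
    summaries = np.stack(summaries)         # (num_subgraphs, d)
    g = softmax(summaries @ query)          # subgraph-level attention
    state = g @ summaries                   # final state vector
    return state, g, node_weights

# Toy usage: two subgraphs with 3 and 5 node embeddings of dimension 8.
rng = np.random.default_rng(0)
d = 8
subgraphs = [rng.normal(size=(3, d)), rng.normal(size=(5, d))]
query = rng.normal(size=d)
state, g, node_w = hierarchical_attention(subgraphs, query)
```

The subgraph-level weights `g` point at which subgraph (e.g. inventory vs. room connectivity) drove the decision, while `node_w` points at the individual facts within it.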
Zork1 is a text-based adventure game whose main goal is to collect all 19 treasures (rewards) scattered across the game world. Zork1 accepts three types of commands: directional commands such as "north"; verb-object commands consisting of a verb and a noun or noun phrase; and verb-object-prep-object commands consisting of a verb and noun phrase followed by a preposition and a second noun phrase. The game responds to each of the player's commands by describing what happens next. Zork1 was a commercial success and popularized the genre of Interactive Fiction.
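The three command shapes can be illustrated with a tiny classifier. This is purely illustrative (the game's actual parser is far richer); the word lists below are assumptions, not Zork1's real vocabulary.

```python
# Illustrative only: classify a command string into one of the three
# Zork1 command shapes described above. Word lists are small samples.
DIRECTIONS = {"north", "south", "east", "west", "up", "down",
              "northeast", "northwest", "southeast", "southwest"}
PREPOSITIONS = {"with", "in", "on", "at", "to", "under", "from"}

def command_type(cmd: str) -> str:
    words = cmd.lower().split()
    if len(words) == 1 and words[0] in DIRECTIONS:
        return "directional"
    if any(w in PREPOSITIONS for w in words[1:]):
        return "verb-object-prep-object"
    return "verb-object"

print(command_type("north"))                  # -> directional
print(command_type("open mailbox"))           # -> verb-object
print(command_type("unlock grate with key"))  # -> verb-object-prep-object
```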
- Stanford Question Answering Dataset (SQuAD) Dataset for Pretraining ALBERT-QA model
- JerichoQA Dataset for Fine-tuning ALBERT-QA model
- Python
- Flask & Gunicorn
- Stanford CoreNLP
- NetworkX
- Jericho Framework
- SpaCy
- Pattern.en
- Frotz
- FuzzyWuzzy
- NumPy
- Redis
- PyTorch
- Hugging Face Transformers (version 2.5.1)
- GoExplore Framework
- Deep Learning
- Natural Language Processing (NLP)
- Entity Extraction
- Relation Extraction
- Sub-word Tokenization
- Question Answering
- Parallel Computing using a T4 GPU
- Reinforcement Learning
```
cd qbert/extraction && gunicorn --workers 4 --bind 0.0.0.0:5000 wsgi:app
redis-server
```
- Open another terminal:
```
cd qbert && python train.py --training_type base --reward_type game_only --subKG_type QBert
```

```
nohup python train.py --training_type chained --reward_type game_and_IM --subKG_type QBert --batch_size 2 --seed 0 --preload_weights Q-BERT/qbert/logs/qbert.pt --eval_mode --graph_dropout 0 --mask_dropout 0 --dropout_ratio 0
```
- `--subKG_type`: which kind of subgraph to use. There are three choices: 'Full', 'SHA', 'QBert'.
  - 'Full': all 4 subgraphs are the full graph_state.
  - 'QBert':
    - __ 'is' __ (attributes of objects)
    - 'you' 'have' __
    - __ 'in' __
    - others (direction)
  - 'SHA':
    - room connectivity (history included)
    - what's in the current room
    - your inventory
    - "you"-related nodes removed (history included)
- `--eval_mode`: whether to turn off training and evaluate the pre-trained model (bool: True or False). Use `--preload_weights` at the same time.
- `--random_action`: whether to use random valid actions instead of QBERT actions (bool: True or False).
- Set graph_dropout to 0.5 and mask_dropout to 0.5 in `train.py`.
- The score should reach 5 within 10,000 steps.
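The 'QBert' subgraph split described above can be sketched as a simple partition of (subject, relation, object) triples. This is a plain-Python illustration of the four-way split; the real implementation lives in the training code and may differ in detail (the function name, bucket names, and direction list here are assumptions).

```python
# Illustrative partition of KG triples into the four 'QBert' subgraphs:
# object attributes, inventory, containment, and room connectivity.
DIRECTIONS = {"north", "south", "east", "west", "up", "down"}

def split_qbert_subgraphs(triples):
    subgraphs = {"attrs": [], "inventory": [], "in": [], "direction": []}
    for s, r, o in triples:
        if r == "is":                       # __ 'is' __  (attributes of objects)
            subgraphs["attrs"].append((s, r, o))
        elif s == "you" and r == "have":    # 'you' 'have' __  (inventory)
            subgraphs["inventory"].append((s, r, o))
        elif r == "in":                     # __ 'in' __  (containment)
            subgraphs["in"].append((s, r, o))
        elif r in DIRECTIONS:               # others (direction / connectivity)
            subgraphs["direction"].append((s, r, o))
    return subgraphs

# Toy knowledge-graph triples in the spirit of Zork1's opening scene.
triples = [("mailbox", "is", "closed"),
           ("you", "have", "leaflet"),
           ("leaflet", "in", "mailbox"),
           ("West of House", "north", "North of House")]
subs = split_qbert_subgraphs(triples)
```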
- We added the `_load_bindings` module from the Jericho framework to properly load the bindings for the Zork1 game's .z5 file within the environment setup module.
- We added a folder called Stanford CoreNLP (2018 version) which includes all the Java files needed to set up the Stanford CoreNLP server. This is used in the source code, especially for information extraction in knowledge graph building and construction. This fixed many path and environment-initialization issues.
- We also included a clone of the Z-machine games repository, which contains the Jericho framework game suite in which the zork1.z5 file is found. This is also something the existing source code does not specify properly.
- Kaspa Vivek
- Arikathota Yashwanth
- Eppili Mukesh