Update README based on Richard's suggestions
shayakbanerjee committed Dec 21, 2017
1 parent 00017c4 commit b788539
Showing 1 changed file (README.md) with 52 additions and 35 deletions.
To view the state of the board at any given time (you'll get a console output):
```python
b.printBoard()
```
The co-ordinate system is shown below, and is the same for the master board, as well as any tile within it:
![ultimate tic tac toe image](https://github.com/shayakbanerjee/ultimate-ttt-rl/raw/master/figures/coordinate_system.png)

## Players
There are two implemented bots for playing the game:
1. `RandomUTTTPlayer` who makes moves at random
1. `RLUTTTPlayer` who makes moves based on a user-supplied learning algorithm

To play the game with one or a combination of these bots, use the `SingleGame` class, e.g. with two random players:
```python
from game import SingleGame
from ultimateplayer import RandomUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

player1, player2 = RandomUTTTPlayer(), RandomUTTTPlayer()
game = SingleGame(player1, player2, UTTTBoard, UTTTBoardDecision)
result = game.playAGame()
```
When using the RL player, it will need to be initialized with a learning algorithm of your choice. I've already provided two sample learning algorithms: `TableLearning` and `NNUltimateLearning`. For example, with `TableLearning`:
```python
from game import SingleGame
from learning import TableLearning
from ultimateplayer import RandomUTTTPlayer, RLUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

player1, player2 = RLUTTTPlayer(TableLearning(UTTTBoardDecision)), RandomUTTTPlayer()
game = SingleGame(player1, player2, UTTTBoard, UTTTBoardDecision)
result = game.playAGame()
```
During a game, `learnFromMove` is called on both players after every move; these calls are how the bots learn. The example shows a reinforcement learning player against a random player, but you can also play RL vs RL or Random vs Random. Switching the order of `player1` and `player2` assigns `O` to the RL player and `X` to the random player, since `player1` always plays `X`.
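For instance, a minimal RL-vs-RL sketch (assuming the same constructors as the example above, with each player getting its own learning model) might look like:
```python
from game import SingleGame
from learning import TableLearning
from ultimateplayer import RLUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

# Two RL players, each with its own lookup-table model
playerX = RLUTTTPlayer(TableLearning(UTTTBoardDecision))
playerO = RLUTTTPlayer(TableLearning(UTTTBoardDecision))
game = SingleGame(playerX, playerO, UTTTBoard, UTTTBoardDecision)
result = game.playAGame()
```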

## Learning Algorithm
The reinforcement learning (RL) player uses a learning algorithm to improve its chances of winning as it plays more games and learns about different positions. The learning algorithm is the key piece of the puzzle, and a generic template is provided for it:
```python
class GenericLearning(object):
    def getBoardStateValue(self, player, board, boardState):
        # Implement: return a numeric value for the given board state
        pass

    def learnFromMove(self, player, board, prevBoardState):
        # Implement: update the model from the move that produced prevBoardState
        pass
```
Any learning model must inherit from this class and implement the above methods. For examples, see `TableLearning` for a lookup-table-based solution and `NNUltimateLearning` for a neural-network-based solution.
Every *board state* is an 81-character string representing a row-wise raster scan of the entire 9x9 board. You can map this to numeric entries as necessary.
Here's an example state: `" X O XO OO X X X "`
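As one illustration (not part of the library), a board state string could be converted to numeric features like this; the mapping `{' ': 0, 'X': 1, 'O': -1}` is an arbitrary choice for the sketch:
```python
# Illustrative only: convert an 81-character board state string into numbers.
# The specific mapping below is an assumption, not something the library defines.
def encodeBoardState(boardState):
    mapping = {' ': 0, 'X': 1, 'O': -1}
    return [mapping.get(ch, 0) for ch in boardState]

features = encodeBoardState(' ' * 81)  # an empty board
```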

## Using your own learning algorithm
Simply implement your learning model, e.g. `MyLearningModel`, by inheriting from `GenericLearning`, then instantiate the provided reinforcement learning bot with an instance of this model:
```python
from ultimateboard import UTTTBoardDecision
from learning import GenericLearning
from ultimateplayer import RLUTTTPlayer
import random

class MyLearningModel(GenericLearning):
    def getBoardStateValue(self, player, board, boardState):
        # Your implementation here
        value = random.uniform(0, 1)  # As an example (and a very poor one)
        return value  # Must be a numeric value

    def learnFromMove(self, player, board, prevBoardState):
        # Your implementation here - learn some value for prevBoardState
        pass

learningModel = MyLearningModel(UTTTBoardDecision)
learningPlayer = RLUTTTPlayer(learningModel)
```
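The resulting `learningPlayer` can then be used exactly like the players in the earlier examples; a quick sketch pitting it against a random bot:
```python
from game import SingleGame
from ultimateplayer import RandomUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

game = SingleGame(learningPlayer, RandomUTTTPlayer(), UTTTBoard, UTTTBoardDecision)
result = game.playAGame()
```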

## Sequence of games
More often than not you will want to just play a sequence of games and observe the learning over time. Code samples for that have been provided and use the `GameSequence` class:
```python
from ultimateplayer import RLUTTTPlayer, RandomUTTTPlayer
from game import GameSequence
from ultimateboard import UTTTBoard, UTTTBoardDecision

learningPlayer = RLUTTTPlayer()
randomPlayer = RandomUTTTPlayer()
results = []
numberOfSetsOfGames = 40
for i in range(numberOfSetsOfGames):
    games = GameSequence(100, learningPlayer, randomPlayer, BoardClass=UTTTBoard, BoardDecisionClass=UTTTBoardDecision)
    results.append(games.playGamesAndGetWinPercent())
```
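To see the trend, a minimal plotting sketch (assuming `matplotlib` is installed and that each entry of `results` is the learning player's win percentage for that set; adapt the indexing if `playGamesAndGetWinPercent` returns something richer):
```python
import matplotlib.pyplot as plt

# Plot win percentage per set of 100 games
plt.plot(range(1, len(results) + 1), results)
plt.xlabel('Set of 100 games')
plt.ylabel('Win percentage')
plt.show()
```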

## Prerequisites
