-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathQlearn.lhs
119 lines (91 loc) · 4.87 KB
/
Qlearn.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
Qlearn.lhs
Cody Shepherd
> module Qlearn where
> import Learning
> import Board
> import Data.List
> import Control.Monad
> import System.Random
> import Data.Map (Map)
> import qualified Data.Map as Map
The key piece in this project that eluded my during my last attempt was
being able to "loop" a given number of times, repeating a computation and
accumulating and updated model.
In the definition of this algorithm we define an episode of 200 training
steps during which the Qmatrix model is updated and accumulated; at the end
of the episode the board is thrown out but the Qmatrix is kept for further
training.
The Kleisli composition operator is going to facilitate this for us.
First we need some constants in order to keep function parameters minimal.
Ultimately it would be nice to specify these at runtime from a terminal, but
that's a project for another day.
> n_steps = 200
> n_episodes = 5000
To facilitate the above function we need to be able to randomly generate a
starting board
dims: dimensions of board (excluding wall border)
p: % of cells with cans at start
> randBoard :: (Int, Int) -> Double -> IO Board
> randBoard dims p = do let n = floor $ fromIntegral (fst dims) * fromIntegral (snd dims) * p
> cs <- randCans dims n
> r <- randRob dims
> return (Board(dims, cs, r))
dims: dimensions of board (excluding wall border)
n: number of cans desired
> randCans :: (Int, Int) -> Int -> IO [Can]
> randCans dims 0 = return []
> randCans dims n = do i <- randomRIO(0, (fst dims)-1)
> j <- randomRIO(0, (snd dims)-1)
> let c = Can(i, j)
> cs <- randCans dims (n-1)
> return $ nub (c:cs)
dims: dimensions of board (excluding wall border)
> randRob :: (Int, Int) -> IO Rob
> randRob dims = do i <- randomRIO(0, (fst dims)-1)
> j <- randomRIO(0, (snd dims)-1)
> let r = Rob(i, j)
> return r
The episode function conducts a training episode. It uses the Kleisli
operator to pass the results of each training step from one step to
the next, and come out at the end with the final accumulated result.
dims: dimensions of board (excluding wall border)
p: % of cells with cans at start
n: number of steps in episode
eps: epsilon value
q: zero-initialized Qmatrix
i: episode number
r: The rewards list from the previous call to episode, to be extended
> episode :: (Int, Int) -> Double -> Int -> Double -> (Qmatrix, Int, [Double]) -> IO (Qmatrix, Int, [Double])
> episode dims p n eps (q, i, r) = do b <- randBoard dims p
> let eps' = if i `mod` 50 == 0 && eps > 0.1 then eps - 0.01 else eps
> (q', b') <- foldr1 (>=>) (replicate n (train eps')) (q,b)
> --print ("episode" ++ show i)
> let r' = sum $ map sum (Map.elems q')
> return (q', i+1, r':r)
doTraining conducts the entire training phase by folding the Kleisli
operator over episodes.
dims: dimensions of board (excluding wall border)
p: % of cells with cans at start
eps: initial epsilon value, to be annealed
> doTraining :: (Int, Int) -> Double -> Double -> IO (Qmatrix, [Double])
> doTraining dims p eps = do (qfinal, i, rs) <- foldr1 (>=>) (replicate n_episodes (episode dims p n_steps eps)) (newQTable,1, [0])
> return (qfinal, reverse rs)
With training conquered, our next task is to conduct testing, wherein the robot makes a
guess but does not update its Q matrix.
These functions work in the same way as the training functions.
dims: the board dimensions, as above
p: the percentage of cans on the board
n: the number of test episodes to run
eps: the epsilon value (remains constant during testing)
q: the trained Q Table
r: The rewards list from the last call to testEpisode, to be extended
> testEpisode :: (Int, Int) -> Double -> Int -> Double -> (Qmatrix, [Double]) -> IO (Qmatrix, [Double])
> testEpisode dims p n eps (q, r) = do b <- randBoard dims p
> (q', b', r') <- foldr1 (>=>) (replicate n (test eps)) (q,b,0.0)
> --print ("testEpisode" ++ show i)
> return (q', (r':r))
>
The parameters for this function are the same as above.
> doTesting :: (Int, Int) -> Double -> Double -> Qmatrix -> IO (Qmatrix, [Double])
> doTesting dims p eps q = do (q', r) <- foldr1 (>=>) (replicate n_episodes (testEpisode dims p n_steps eps)) (q, [0.0])
> return (q', reverse r)