Solving "Battleship" with POMDPs.jl/BasicPOMCP (legal action spaces that change over time) #336
Replies: 3 comments 15 replies
-
Going to make this a discussion - hope we can provide some help!
-
I haven't had time to think on this in detail; I might have some cycles later this week. But your larger question at the end reminds me vaguely of a use case I had looked into a couple of years ago.
-
I suspect that the main problem here is that "clicked" events are recorded in the global POMDP struct rather than in the state. A fairly common solution in such situations is to have the "clicked" events be part of the state. That is, the state should record which tiles have already been clicked, so this record is reset along with everything else whenever the state changes.
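A minimal sketch of that suggestion, assuming the 3x3 setup from the question below; `BSState` and its fields are hypothetical names, not code from this thread:

```julia
# Hypothetical state type: the click record travels with the state,
# so it is reset whenever the state itself is reset.
struct BSState
    ships::Vector{Int}   # 1 = ship tile, 0 = empty tile
    board::Vector{Int}   # -1 = untried, 0 = tried & missed, 1 = tried & hit
end
```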
-
Hi,
Fantastic job with this library. Looks really nice. I am trying to implement a POMDP problem that is pretty much a scaled-down version of the battleship problem in the original Silver & Veness POMCP paper (https://papers.nips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html).
Each state in the state space is a 3x3 grid (or an array of 9 numbers) of ones and zeros denoting the locations of the battleships, where 1 marks a ship ("hit") tile and 0 an empty ("miss") tile. There are 9 actions, one for each tile in the 3x3 grid, and two observations, "hit" or "miss."
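For concreteness, a hypothetical POMDPs.jl skeleton of this spec (all names are mine, and the state type already folds in the click record suggested in the reply above):

```julia
using POMDPs

# Hypothetical problem type; BSState is the ships-plus-click-record
# state sketched in the reply above.
struct MiniBattleship <: POMDP{BSState, Int, Symbol}
    p_ship::Float64   # assumed per-tile ship probability for fresh layouts
end

POMDPs.actions(m::MiniBattleship) = 1:9                 # one action per tile
POMDPs.observations(m::MiniBattleship) = (:hit, :miss)  # the two observations
POMDPs.discount(m::MiniBattleship) = 0.95               # assumed discount
```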
This should be a simple POMDP to implement, but the catch is that the agent cannot click on a tile twice until the state has changed (i.e., as long as the agent is in state s, each action can be chosen at most once). I've implemented this as a field of the POMDP struct called `board`, an array of -1 (tile not chosen by the agent yet), 0 (tile chosen, miss), and 1 (tile chosen, hit). I am using the `gen` function to implement the transition so that I can generate the next state, observation, and reward simultaneously. In the `gen` function, once the agent has hit all ship tiles, the board is cleared back to all -1's and the state changes. The code is pretty short and can be seen below. I tried using BasicPOMCP on this problem, but I am noticing that the board isn't getting cleared across state changes. This is probably because of the rather "hack-y" way I have implemented checking whether the agent has taken a specific action or not.
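For comparison, a sketch of a `gen` that adopts the state-carried board (reusing the hypothetical `BSState` and `MiniBattleship` above); because the click record lives in the state, it is cleared as part of the reset rather than needing a separate global wipe:

```julia
using POMDPs, Random

function POMDPs.gen(m::MiniBattleship, s::BSState, a::Int, rng::AbstractRNG)
    board = copy(s.board)                 # never mutate the incoming state
    hit = s.ships[a] == 1
    board[a] = hit ? 1 : 0                # record the click on this tile
    o = hit ? :hit : :miss
    r = hit ? 1.0 : -1.0                  # placeholder rewards
    if all(i -> s.ships[i] == 0 || board[i] == 1, 1:9)
        # Every ship tile has been hit: sample a fresh layout, clear the board.
        ships = [rand(rng) < m.p_ship ? 1 : 0 for _ in 1:9]  # placeholder layout sampler
        return (sp = BSState(ships, fill(-1, 9)), o = o, r = r)
    else
        return (sp = BSState(s.ships, board), o = o, r = r)
    end
end
```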
What would be the best way to implement this problem? I briefly considered making the entire board the observation, but I am not sure I can pass the current observation into the `gen` function — it only receives the state and action...
The larger question here is: what is the best way to implement constraints on the agent's action space based on the current history/observations?
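One hedged option, assuming the click record lives in the state as sketched above: POMDPs.jl's interface includes a state-dependent `actions(m, s)`, so legality becomes a pure function of the state. Whether a given solver such as BasicPOMCP consults the state- or belief-dependent form when it expands tree nodes is solver-specific, so check its documentation before relying on this:

```julia
# Only untried tiles are legal from state s (BSState as sketched above).
POMDPs.actions(m::MiniBattleship, s::BSState) = [i for i in 1:9 if s.board[i] == -1]
```

A fallback that works with any solver is to keep the action space fixed and give repeated clicks a large negative reward, so the planner learns to avoid them even when it cannot prune them outright.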
Thanks for the help in advance!