States with varying applicable actions #338

adubredu · 2021-03-12T16:31:05Z

What's the best way to describe a POMDP with POMDPs.jl where only a subset of the possible actions can be valid for each state?

For example, in a pick and place domain with possible actions = {pick, place, push}, once the robot is holding an object, the only valid subset of actions it can take in that state is {place}.

zsunberg · 2021-03-12T17:03:00Z

zsunberg
Mar 12, 2021
Maintainer

POMDPs.actions has an optional state or belief argument.

function POMDPs.actions(m::PickNPlace, s)
    if holding(s)
        return [:place]
    else
        return [:pick, :place, :push]
    end
end

In a POMDP rather than an MDP, the second argument will actually be a belief instead of a state, but the robot should know with certainty whether it is holding something, so you should be able to get the relevant information from the belief. Most solvers should work with this, but a few may not, so you may have to add a penalty in the reward function to penalize invalid actions. Let us know if you have any follow-up questions.
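
As a rough illustration of the reward-penalty fallback mentioned above, here is a minimal sketch in plain Julia. The names PickState, valid_actions, base_reward, penalized_reward, and INVALID_PENALTY, as well as all reward values, are hypothetical and not part of POMDPs.jl; in a real model the same check would presumably live inside your POMDPs.reward method.

```julia
# Hypothetical state type for the pick-and-place example (not part of POMDPs.jl).
struct PickState
    holding::Bool
end

# State-dependent action set, mirroring the POMDPs.actions(m, s) method above.
valid_actions(s::PickState) = s.holding ? [:place] : [:pick, :place, :push]

# Placeholder task reward; the values are made up for illustration.
base_reward(s::PickState, a::Symbol) = (a == :place && s.holding) ? 1.0 : 0.0

# Assumed penalty size; choose it so that invalid actions are never worth taking.
const INVALID_PENALTY = -100.0

# Penalize any action outside the valid set, for solvers that ignore the
# state-dependent action space and may try every action in every state.
function penalized_reward(s::PickState, a::Symbol)
    return a in valid_actions(s) ? base_reward(s, a) : INVALID_PENALTY
end
```

With this in place, a solver that tries :pick while the robot is already holding something receives the penalty rather than a normal reward, so it should learn to avoid that action.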

6 replies

DonalOCois Jan 19, 2023

Does anyone know whether this can also be used with the QuickMDP method?
For me, it can't infer the action, as shown below.
[By the way, sorry if I should have made a new post for this, but I saw that there are already two on this subject.]

[Edit: for anyone reading this later, the "actions" should just be a single value, not two, as it is used to calculate "vp" in this case. The array of two values is just related to my specific project and I forgot to change it back.]

mountaincar = QuickMDP(
    function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,
    function POMDPs.actions(m::MDP, s)
        v = s[2]
        if v >= 0.0
            return [0., -1.]
        else
            return [0., 1.]
        end
    end,
    initialstate = (-0.5, 0.0),
    discount = 0.95,
    isterminal = s -> s[1] > 0.5
)

Or, with a simpler approach, which still doesn't infer the action space:

mountaincar = QuickMDP(
    function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,

    actions = function (s)
        v = s[2]
        if v >= 0.0
            return [0., -1.]
        else
            return [0., 1.]
        end
    end,

    initialstate = (-0.5, 0.0),
    discount = 0.95,
    isterminal = s -> s[1] > 0.5
)

zsunberg Jan 20, 2023
Maintainer

It looks to me like your second snippet should do what you expect. Can you post the specific code that throws the error?

(perhaps you also want your action function to handle zero arguments by making s optional? i.e. actions = function (s=nothing))

DonalOCois Jan 20, 2023

Hi! Thanks for answering so fast.
So far I have only tried the exact snippet that I already posted, but there is no point in proceeding further, as it throws the error:
Unable to infer action type for a Quick(PO)MDP; using Any. This may have significant performance consequences. Use the actiontype keyword argument to specify a concrete action type.
I think you may be exactly right that the action function should handle zero arguments for s. I haven't come across the need for that before, so I'm not sure of the syntax here. Something like actions = function (s=nothing,(s)), I assume? This throws an error:
syntax: optional positional arguments must occur at end around (C:\Users\...)
Could you write out your suggested snippet explicitly if you get a chance?

zsunberg Jan 21, 2023
Maintainer

This should work:

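using QuickPOMDPs
using POMDPModelTools  # provides Deterministic (newer releases ship it in POMDPTools)

mountaincar = QuickMDP(
    # Generative model: returns the next state and the reward.
    function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,

    # State-dependent actions; the default value lets it be called with no arguments.
    actions = function (s=(0.0, 0.0))
        v = s[2]
        if v >= 0.0
            return [0., -1.]
        else
            return [0., 1.]
        end
    end,

    initialstate = Deterministic((-0.5, 0.0)),
    discount = 0.95,
    isterminal = s -> s[1] > 0.5
)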

Note that I have specified a default value for the s argument in the actions anonymous function.

Note also that initialstate needs to be a distribution. Can you let me know where you found an example where initialstate was not a distribution? We need to find all of that old incorrect documentation.
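
To illustrate the default-argument point in isolation, here is a minimal sketch in plain Julia with no packages needed. The name acts is hypothetical. I am assuming, without having checked the QuickPOMDPs source, that action-type inference relies on being able to call the function without a state.

```julia
# An anonymous function with a default argument can be called with or
# without the state.
acts = function (s=(0.0, 0.0))
    v = s[2]
    return v >= 0.0 ? [0.0, -1.0] : [0.0, 1.0]
end

default_actions = acts()               # zero-argument call uses the default state
backward_actions = acts((-0.5, -0.01)) # state-dependent call with negative velocity

# The element type of the zero-argument result is concrete, which is
# presumably what allows a concrete action type to be inferred.
action_type = eltype(acts())
```

If the default state is awkward to choose, the warning quoted earlier suggests passing the actiontype keyword argument explicitly instead.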

DonalOCois Jan 22, 2023

That's great, it can infer the action space now!
This was the example I had been using of the mountaincar problem with state dependent actions (slightly further down the page): https://juliapomdp.github.io/QuickPOMDPs.jl/stable/quick/

Also, for anyone who comes this way in the future, don't forget to add using POMDPModelTools, as this is required for the Deterministic distribution to work.
