State-dependent actions and gen function #382

Cbroomer · 2022-05-17T06:52:50Z

Hello,

I am new to this field and would like to ask you two questions that are very important to me right now:

Is it possible to use state-dependent actions with these solvers (https://github.com/JuliaPOMDP/TabularTDLearning.jl)? For example, with a function that, given a state, returns a vector of all the possible actions?

How does the generative model really work? I mean, if I use gen = function (s, a, rnd), does it take the initial state and then try all the possible actions of the action space according to their keys? I want to know whether actions are selected in order or chosen randomly.

I hope you can help me. Thank you so much.

Best regards.

Answered by lassepe

lassepe · 2022-05-17T07:43:17Z

Maintainer

Hi @clrescobar,

Unfortunately, TabularTDLearning does not currently support state-dependent actions since that would make the table "non square". That said, it would be possible to make that extension by using another data structure for the Q-values in TabularTDLearning. That, however, would require minor changes to the solver/code over there.
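To make the "another data structure" idea concrete, here is a minimal sketch in plain Julia. It is not how TabularTDLearning.jl is implemented; `valid_actions`, `qvalue`, `greedy_action`, and `q_update!` are hypothetical names for a toy problem with integer states and symbol actions. The Q-values live in a `Dict` keyed by `(state, action)` pairs instead of a square matrix, so each state can have its own action set.

```julia
# Hypothetical toy problem: states are Ints, actions are Symbols.
# valid_actions is an assumed helper, not part of TabularTDLearning.jl.
valid_actions(s::Int) = s == 1 ? [:right] : [:left, :right]

# Q-values keyed by (state, action), so the "table" need not be square.
Q = Dict{Tuple{Int,Symbol},Float64}()

# Look up a Q-value, treating unseen pairs as 0.0.
qvalue(Q, s, a) = get(Q, (s, a), 0.0)

# Greedy action restricted to the actions valid in s
# (argmax(f, itr) requires Julia 1.7 or newer).
greedy_action(Q, s) = argmax(a -> qvalue(Q, s, a), valid_actions(s))

# One Q-learning update; the max over next actions only considers valid ones.
function q_update!(Q, s, a, r, sp; alpha=0.1, gamma=0.95)
    best_next = maximum(ap -> qvalue(Q, sp, ap), valid_actions(sp))
    Q[(s, a)] = qvalue(Q, s, a) + alpha * (r + gamma * best_next - qvalue(Q, s, a))
    return Q
end

q_update!(Q, 2, :left, 1.0, 1)
println(greedy_action(Q, 2))
```

The trade-off is that dictionary lookups are slower than matrix indexing, which is presumably one reason the solver uses a dense table.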

The gen function implements the (PO)MDP transition (and observation) model in a generative representation. That is, rather than having to provide probability densities over states, observations, and rewards via T, Z, and R, you only have to implement gen as a function that samples from the joint distribution of the three. Therefore, when you call gen(s, a, rng), you get a random sample (; sp, o, r) of a new state sp, observation o, and reward r, conditioned on your previous state s and action a. If your problem is non-deterministic, repeated calls to this function may yield different (; sp, o, r) tuples. However, the distribution of those samples will be consistent with the POMDP model (by definition, as the gen function implicitly defines this distribution).
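As a rough illustration of that sampling behavior, here is a sketch that uses only Julia's standard library rather than POMDPs.jl. The transition rule (the action `:move` succeeds with probability 0.8) is made up for the example, and it returns only `sp` and `r`, as for an MDP without observations. Note that the action is an input: whoever calls the function (a policy, solver, or simulator) chooses `a`, and the function only samples an outcome for it.

```julia
using Random

# Hypothetical transition: :move advances the state with probability 0.8
# and otherwise leaves it unchanged. The reward is 1.0 on success.
function gen(s::Int, a::Symbol, rng::AbstractRNG)
    if a == :move && rand(rng) < 0.8
        return (sp = s + 1, r = 1.0)
    end
    return (sp = s, r = 0.0)
end

rng = MersenneTwister(1)

# The caller picks the action; gen never iterates over the action space.
samples = [gen(1, :move, rng) for _ in 1:10_000]

# Individual samples differ, but their empirical frequency follows the model.
frac_moved = count(x -> x.sp == 2, samples) / length(samples)
println(frac_moved)
```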

A quick side note: In practice you should never have to call gen yourself. Instead, you would use the @gen macro which combines generative and explicit models into a single sampling call.

23 replies

Cbroomer May 20, 2022
Author

There are terminal states and a dummy state, which is also terminal. So I concatenated the terminal and dummy states into terminal and then used termination:

terminal = vcat(terminal, spt)
termination(s::State) = s in terminal

.....
    states = SS, #State-space
    actions = AS, #Action-space
    initialstate = init_resupply, #Initial state
    discount = gamma, #Discount factor
    isterminal = termination) #Termination: Dummy or terminal state

zsunberg May 21, 2022
Maintainer

It's hard to know the exact source of the performance difference from just a small snippet of code, but it may be because the sp=s version of the gen function is "type stable" while the sp=spt version is not (see https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions). Additionally, if spt is a global variable, that might slow things down further: https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables. In general, skimming the "Performance Tips" section of the Julia manual is really helpful for understanding why code is faster or slower.
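A small sketch of the difference, under the assumption that spt is a non-const global of a user-defined `State` type (the struct and function names below are hypothetical stand-ins, not the original code). `Base.return_types` reports what the compiler can infer about a function's result:

```julia
struct State
    id::Int
end

spt_untyped = State(0)   # non-const global: the compiler cannot assume its type
const SPT = State(0)     # const global: its type is known to the compiler

# Both functions return either the current state or the dummy state.
next_untyped(s::State, allowed::Bool) = allowed ? s : spt_untyped
next_const(s::State, allowed::Bool) = allowed ? s : SPT

# Inferred return type for arguments of type (State, Bool).
inferred(f) = only(Base.return_types(f, (State, Bool)))

println(inferred(next_untyped))  # inference cannot narrow this to State
println(inferred(next_const))
```

Declaring the global `const`, or passing the dummy state in as an argument, is usually enough to make such a function type stable.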

Cbroomer May 21, 2022
Author

I mean faster in terms of the number of iterations. I created spt from the struct State, and s is also a State, so I guess this is because spt is a global variable.
Thank you so much for your help.

zsunberg May 21, 2022
Maintainer

> I mean faster in number of iterations,

Oh, that is more interesting! It makes sense though: when the agent tries a prohibited action with sp=spt, it has to start a completely new episode. With sp=s, it gets to keep going on the same episode. It's like having to start over completely in a video game when the character dies vs. having multiple lives: things are much easier with multiple lives.
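Here is a toy sketch of the two variants in plain Julia, with made-up states and a fixed action sequence instead of a learning agent. All names (`DUMMY`, `GOAL`, `step_stay`, `step_dummy`, `episode_length`) are hypothetical:

```julia
struct State
    id::Int
end

const DUMMY = State(0)   # hypothetical absorbing dummy state (the "spt" above)
const GOAL = State(3)
is_terminal(s::State) = s == DUMMY || s == GOAL

# Hypothetical rule: :bad is prohibited everywhere, :step advances by one.
# Variant A: a prohibited action leaves the agent where it is (sp = s).
step_stay(s::State, a::Symbol) = a == :bad ? s : State(s.id + 1)
# Variant B: a prohibited action sends the agent to the dummy state (sp = spt).
step_dummy(s::State, a::Symbol) = a == :bad ? DUMMY : State(s.id + 1)

# Apply the actions in order until a terminal state is reached;
# return the number of steps taken and the final state.
function episode_length(step, actions; start=State(1))
    s, n = start, 0
    for a in actions
        is_terminal(s) && break
        s = step(s, a)
        n += 1
    end
    return n, s
end

plan = [:step, :bad, :step]
println(episode_length(step_stay, plan))
println(episode_length(step_dummy, plan))
```

With sp=s the same episode still reaches the goal after the mistake, while with sp=spt the episode ends at the dummy state and the agent has to start over.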

Cbroomer May 21, 2022
Author

Perfect, thank you so much for all your help!
