Defining Exo-MDPs with External Data #564
I'm trying to define an Exo-MDP (Markov Decision Process with Exogenous Inputs). The Exo-MDP is defined by two assumptions: the state splits into an endogenous component and an exogenous component, and the exogenous component's transitions are independent of the agent's actions and the endogenous state.
To begin, I'm trying to implement a purely exogenous MDP (no endogenous state), and here is the part I'm stuck on. I want to train a policy using exogenous state transitions sampled from historical/realized data. In theory this should be possible, but I'm not sure whether POMDPs.jl is the right tool for it.

The practical concern is how to inject the external data. Should I use a Channel in the MDP object? Should I use global or outer-scoped variables, or closures? Are any of these bad (or recommended) patterns when working with solvers? Another option would be to somehow feed sampled trajectories to a solver directly. That would be the cleanest approach, but I don't think POMDPs.jl is built to work like this.

I guess my biggest question is whether POMDPs.jl is fundamentally not designed for the kind of problem I'm trying to define. Thanks.
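For concreteness, here is the kind of thing I have in mind for the "put the data in the MDP object" option. It is only a sketch: `ExoFromDataMDP`, the state layout (time step, exogenous value), the action set, and the reward are all made up for illustration.

```julia
using POMDPs
using POMDPTools: Deterministic

# Sketch: carry the realized exogenous samples inside the problem object.
# State = (time step, exogenous value); actions are placeholder Ints.
struct ExoFromDataMDP <: MDP{Tuple{Int,Float64}, Int}
    exo_samples::Matrix{Float64}  # rows = realized trajectories, columns = time steps
end

POMDPs.actions(::ExoFromDataMDP) = 0:2
POMDPs.discount(::ExoFromDataMDP) = 0.95
POMDPs.initialstate(m::ExoFromDataMDP) = Deterministic((1, m.exo_samples[1, 1]))
POMDPs.isterminal(m::ExoFromDataMDP, s) = s[1] >= size(m.exo_samples, 2)

# Generative step: re-draw which historical trajectory to follow at every step.
# Whether this resampling is a sound way to consume realized data is part of my question.
function POMDPs.gen(m::ExoFromDataMDP, s, a, rng)
    t, _ = s
    traj = rand(rng, 1:size(m.exo_samples, 1))
    exo_next = m.exo_samples[traj, t + 1]
    return (sp = (t + 1, exo_next), r = -abs(exo_next - a))  # placeholder reward
end
```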
Replies: 1 comment
For reference, I got a minimal example working with closures, heap-allocated data, and `@eval` (with MCTS). It is straightforward: you share data using closures for each part of the interface (e.g. `initialstate`, `transition`, ...) that requires exogenous state, using heap-allocated data for communication between methods if necessary. I'm not using any multithreading right now for simulations or solvers, so communicating this way should be fine.

The MDP I implemented was not actually a pure Exo-MDP, because some other external data was also needed during the reward calculation, which made it a little more complicated. If you just need to sample the initial state from external data (no other exogenous state requirement), using channels would probably be best. If anyone has ideas or comments, they're appreciated.
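To make the closure pattern concrete, here is a minimal sketch of the shape of what I did, not my actual code. It uses QuickPOMDPs so the interface functions are ordinary closures over the external data, with a `Ref` standing in for the heap-allocated scratch shared between them; `exo_samples`, the state layout, and the reward are placeholder assumptions.

```julia
using POMDPs, POMDPTools, QuickPOMDPs, MCTS
using Random

# Placeholder external data: rows = realized exogenous trajectories, columns = time steps.
exo_samples = randn(MersenneTwister(1), 100, 24)

# Heap-allocated scratch shared between the interface closures (fine single-threaded;
# would need rethinking with parallel simulation or a multithreaded solver).
current_traj = Ref(1)

mdp = QuickMDP(
    # Generative model: a closure over exo_samples and current_traj.
    function (s, a, rng)
        t, _ = s
        exo_next = exo_samples[current_traj[], t + 1]
        sp = (t + 1, exo_next)
        r = -abs(exo_next - a)   # placeholder reward using the exogenous state
        return (sp = sp, r = r)
    end,
    statetype = Tuple{Int, Float64},
    actiontype = Int,
    actions = 0:2,
    discount = 0.95,
    isterminal = s -> s[1] >= size(exo_samples, 2),
    # Initial-state closure: pick a historical trajectory and record it in the scratch Ref.
    initialstate = ImplicitDistribution() do rng
        current_traj[] = rand(rng, 1:size(exo_samples, 1))
        (1, exo_samples[current_traj[], 1])
    end,
)

planner = solve(MCTSSolver(n_iterations = 200, depth = 10), mdp)
s0 = rand(MersenneTwister(2), initialstate(mdp))
a = action(planner, s0)
```

The point of the pattern is that the external data never has to live inside the problem struct: the closures capture it from the enclosing scope, and the `Ref` is the minimal shared mutable state the different interface functions need to agree on.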