Hello! While running a POMCP problem, I'm sometimes getting particle deprivation at the end of a trial. It is the same problem as in #32 and #27, so it is a finite-horizon problem that has N states and observations, with N+1 possible actions (i.e. one action per state and the 'wait' action).
Since the real observations come from data, the observation model is constructed by predicting from these data with a decoder to obtain $p(o|s')$ (the action does not change the observation function as of now). The trial ends when any action other than 'wait' is taken or when the last time step of a trial is reached without any action. When this happens, a new trial starts, and the transition probability is uniform over all the possible states.
I noticed that I sometimes get particle deprivation right after the trial ends. After looking at the code, I saw that particle deprivation can happen if there is an observation that was never anticipated in the tree. I had a look and saw that all trials have a much higher count for particle reinvigoration at the last time step compared to the others (n_particles = 1000), for example:
TRIAL 20 (true state s_6-t_0)
--------------------
STEP 0 (6 steps remaining)
Current belief (based on 0s of data):
s_0-t_0 -> 0.08
s_1-t_0 -> 0.083
s_2-t_0 -> 0.079
s_3-t_0 -> 0.09
s_4-t_0 -> 0.087
s_5-t_0 -> 0.091
s_6-t_0 -> 0.083
s_7-t_0 -> 0.079
s_8-t_0 -> 0.078
s_9-t_0 -> 0.079
s_10-t_0 -> 0.091
s_11-t_0 -> 0.08
Action: a_wait
Reward: -1.0. Transition to s_6-t_1
Observation: o_6
Particle reinvigoration for 442 particles
STEP 1 (5 steps remaining)
Current belief (based on 0s of data):
s_0-t_1 -> 0.023
s_1-t_1 -> 0.039
s_2-t_1 -> 0.024
s_3-t_1 -> 0.027
s_4-t_1 -> 0.033
s_5-t_1 -> 0.024
s_6-t_1 -> 0.677
s_7-t_1 -> 0.033
s_8-t_1 -> 0.019
s_9-t_1 -> 0.052
s_10-t_1 -> 0.025
s_11-t_1 -> 0.024
Action: a_wait
Reward: -1.0. Transition to s_6-t_2
Observation: o_6
Particle reinvigoration for 670 particles
STEP 2 (4 steps remaining)
Current belief (based on 0s of data):
s_0-t_2 -> 0.004
s_3-t_2 -> 0.002
s_4-t_2 -> 0.005
s_6-t_2 -> 0.977
s_7-t_2 -> 0.003
s_9-t_2 -> 0.006
s_10-t_2 -> 0.003
Action: a_wait
Reward: -1.0. Transition to s_6-t_3
Observation: o_6
Particle reinvigoration for 472 particles
STEP 3 (3 steps remaining)
Current belief (based on 0s of data):
s_6-t_3 -> 1.0
Action: a_6
Reward: 10.0. Transition to s_10-t_0
Particle reinvigoration for 920 particles
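(As an aside, the trial-reset transition described at the top, which is what happens in the last step above when a_6 ends the trial and jumps to s_10-t_0, behaves roughly like the sketch below. The TDTransitionModel name and the .id / .t / .name attributes are assumptions that mirror the observation model shown further down, not the actual implementation.)

import pomdp_py

class TDTransitionModel(pomdp_py.TransitionModel):
    """Hypothetical sketch of the trial-end transition described above."""

    def __init__(self, n_states, n_steps):
        self.n_states = n_states
        self.n_steps = n_steps

    def probability(self, next_state, state, action):
        # A decision action (anything other than 'wait'), or running out of
        # time steps, ends the trial: the next state has t == 0 and is drawn
        # uniformly from all N states.
        trial_ends = action.name != 'a_wait' or state.t == self.n_steps - 1
        if trial_ends:
            return 1 / self.n_states if next_state.t == 0 else 0.0
        # Mid-trial 'wait': the underlying state is unchanged and the clock advances.
        if next_state.id == state.id and next_state.t == state.t + 1:
            return 1.0
        return 0.0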
This makes sense to me, as the code is sampling an observation at random instead of getting it from data when the trial is going to end. Given this, I can see why it usually gets an observation that has not been simulated much in the tree. I only get particle deprivation sometimes, so I guess it depends on how the simulation goes and which observations are sampled.
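To make that concrete, here is a simplified, self-contained sketch of a rejection-style particle update at the trial boundary. This is not pomdp_py's actual update code, and the numbers just mirror the trace above: since both the simulated observations in the tree and the "real" observation at the last step are drawn uniformly (the next_state.t == 0 branch of sample() below), only about 1/N of the particles agree with the observation that is actually received.

import numpy as np

rng = np.random.default_rng(0)
n_obs, n_particles = 12, 1000

# After the decision action, each simulated particle's observation is uniform
# over the 12 possible observations.
simulated_obs = rng.integers(n_obs, size=n_particles)

# The observation fed into the final belief update is also drawn uniformly
# instead of coming from data, so a particle survives only if its simulated
# observation happens to match it.
real_obs = rng.integers(n_obs)
n_survivors = int(np.sum(simulated_obs == real_obs))

print(n_survivors)  # roughly 1000 / 12 ≈ 83 survivors; the rest must be
                    # reinvigorated (cf. the 920 reinvigorated particles above),
                    # and if no particle at all matches the received observation
                    # the update degenerates into particle deprivation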
Since trials are independent (i.e., I am creating a new instance of the problem for every trial and resetting the belief), I wonder if it makes sense to update the belief when the trial ends at all. Not doing it would avoid this issue with particle deprivation, if I understood correctly. Even so, I am wondering whether this method of providing observations during the last time step is correct. To clarify, the observation model currently looks like this:
import numpy as np
from pomdp_py import ObservationModel


class TDObservationModel(ObservationModel):
    """Time-dependent extension of the ObservationModel class that takes into
    account the time step within each POMDP trial and allows the observation
    function to have different observation probabilities depending on the time
    step. This allows the time-dependent POMDP to leverage the fact that the
    more a trial advances, the more brain data from the subject is available.
    Thus, the probability p(o | s, a, d) (where d is the time step) should be
    less uncertain the longer the trial is. This also removes the constraint
    present in the basic model where the initial time step of each trial needs
    a sufficiently large brain data window to yield good classification
    (e.g. 0.5s), since restrictions on the previous observation function
    required all time steps to use data windows of the same length.

    Parameters
    ----------
    features: 3-D np.array, shape (n_steps, n_states, n_observations)
        Feature array for the observation matrix.

    Attributes
    ----------
    discretization: str, ['conf_matrix']
        Method used to define the observation model. Value 'conf_matrix' uses a
        3-D confusion matrix obtained from stacking the confusion matrices of
        the decoding algorithm using a given brain data window length.
    observation_matrix: 3-D np.array, shape (n_timesteps, n_states, n_observations)
        Matrix representing the observation model, where each element is the
        probability of obtaining the observation corresponding to the third
        dimension given that the agent is currently at the state corresponding
        to the second dimension and the current time step of the trial is that
        of the first dimension.
        Example: observation_matrix[3][2][5] = p(o=o_5 | s=s_2, d=3)
    """

    def __init__(self, features, discretization='conf_matrix'):
        self.discretization = discretization
        self.observation_matrix = self._make_obs_matrix(features)
        self.n_steps, self.n_states, self.n_obs = self.observation_matrix.shape

    def probability(self, observation, next_state, action):
        # Probability of obtaining a new observation given the next state and the time step
        if next_state.t == 0:  # If next_state.t is 0, either state was last step or action was a decision
            return 1 / self.n_states
        else:  # Wait action on other time steps
            obs_idx = observation.id
            state_idx = next_state.id
            state_step = next_state.t - 1  # observation_matrix[0] corresponds to next_state.t == 1
            return self.observation_matrix[state_step][state_idx][obs_idx]

    def sample(self, next_state, action):
        # Return a random observation according to the probabilities given by the confusion matrix
        if next_state.t == 0:  # If next_state.t is 0, either state was last step or action was a decision
            return np.random.choice(self.get_all_observations())
        else:  # Wait action on other time steps
            state_idx = next_state.id
            state_step = next_state.t - 1  # observation_matrix[0] corresponds to next_state.t == 1
            obs_p = self.observation_matrix[state_step][state_idx]
            return np.random.choice(self.get_all_observations(), p=obs_p)
I had a look and saw that all trials have a much higher count for particle reinvigoration at the last time step compared to others
Having high particle reinvigoration could be totally normal. It's a different issue from particle deprivation. If the number keeps growing, that suggests the true observation is somehow less predictable from the current particles. That might be a property of the problem itself.
Since trials are independent (i.e., I am creating a new instance of the problem for every trial and resetting the belief), I wonder if it makes sense to update the belief when the trial ends at all.
Since trials are independent, you only need to make sure the belief is correct at the start of a trial.
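In practice that can amount to something like the following sketch, assuming a hypothetical TDState(id, t) state class and the 1000-particle belief from the trace. This is just one way to rebuild a uniform particle belief with pomdp_py at the start of each trial, rather than relying on the last update of the previous trial.

import random
import pomdp_py

def reset_belief(agent, n_states, n_particles=1000):
    # Uniform prior over the N states at time step 0 of the new trial.
    particles = [TDState(id=random.randrange(n_states), t=0)
                 for _ in range(n_particles)]
    agent.set_belief(pomdp_py.Particles(particles), prior=True)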
If that were the case, what can I do to avoid this occasional particle deprivation? I can circumvent it by just avoiding the final update, but I am curious about potential solutions.