
Using the add_batch method from PyUniformReplayBuffer throws an IndexError: tuple index out of range. #198

Open
j0rd1smit opened this issue Aug 29, 2019 · 3 comments

@j0rd1smit

Using the add_batch method from PyUniformReplayBuffer throws an IndexError: tuple index out of range. When you do something similar with a TFEnv, all the trajectories are batched, but this is not the case with a PyEnv. I think that is why the observer lambda x: buffer.add_batch(batch_nested_array(x)) works and the observer buffer.add_batch doesn't. Below is some example code.

This doesn't work:

from tf_agents.drivers import py_driver
from tf_agents.environments import suite_gym
from tf_agents.policies import random_py_policy
from tf_agents.replay_buffers.py_uniform_replay_buffer import PyUniformReplayBuffer

env = suite_gym.load('CartPole-v0')
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [buffer.add_batch]
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)  # raises IndexError: tuple index out of range

This works:

from tf_agents.utils.nest_utils import batch_nested_array

env = suite_gym.load('CartPole-v0')
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [lambda x: buffer.add_batch(batch_nested_array(x))]  # batch the trajectory before adding
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)
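
For reference, a minimal sketch of what the batching step does, assuming nest_utils from tf_agents.utils: it prepends a batch dimension of size 1 to every array in the nest, which is the layout add_batch expects.

# Minimal sketch of the batching step, assuming tf_agents.utils.nest_utils.
import numpy as np
from tf_agents.utils import nest_utils

unbatched = np.zeros((4,), dtype=np.float32)        # e.g. a CartPole observation
batched = nest_utils.batch_nested_array(unbatched)  # adds a leading batch dimension
print(batched.shape)                                # (1, 4)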

I think the best way to solve this is to batch the trajectory in PyDriver to stay consistent with TFDriver. This would require only a single change in the .run() method of PyDriver.
Original:

traj = trajectory.from_transition(time_step, action_step, next_time_step)

Proposed replacement:

traj = trajectory.from_transition(time_step, action_step, next_time_step)  
traj = nest_utils.batch_nested_array(traj)
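
For context, a rough sketch of where this would sit inside PyDriver.run; the surrounding loop below is paraphrased, not copied from the actual source, and only the two traj lines reflect the proposed change.

# Rough, paraphrased sketch of the relevant part of PyDriver.run.
while num_steps < max_steps and num_episodes < max_episodes:
    action_step = policy.action(time_step, policy_state)
    next_time_step = env.step(action_step.action)

    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    traj = nest_utils.batch_nested_array(traj)  # proposed addition

    for observer in observers:
        observer(traj)

    time_step = next_time_step
    policy_state = action_step.state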
@sguada
Member

sguada commented Aug 29, 2019

To use add_batch the data should be batched, i.e. the environment should be batched. This should work:

from tf_agents.environments import batched_py_environment

env = batched_py_environment.BatchedPyEnvironment([suite_gym.load('CartPole-v0')])
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [buffer.add_batch]
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)
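
Presumably the reset/run calls from the first snippet then work unchanged (an assumption, not part of the original comment):

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)  # trajectories now carry a batch dimension, so add_batch succeeds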

@j0rd1smit
Author

j0rd1smit commented Aug 29, 2019

Yes, sure, this would work, but my issue is about the inconsistency in the API. For a TFEnv (batched by default) this is not needed, but it is needed for a PyEnv (not batched by default), which is quite confusing. Personally, it took me quite a while to understand what caused this issue.
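
For comparison, a sketch of the TF-side setup where add_batch can be passed directly as an observer; the module paths and constructor arguments here are my assumptions about the tf_agents API, not something stated in this issue.

# Sketch of the TF-side counterpart, where add_batch works as an observer directly.
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer

tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))  # batched, batch_size=1
policy = random_tf_policy.RandomTFPolicy(time_step_spec=tf_env.time_step_spec(),
                                         action_spec=tf_env.action_spec())
buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=policy.trajectory_spec, batch_size=tf_env.batch_size, max_length=1000)

# The trajectories emitted by the driver already carry a batch dimension,
# so add_batch can be used as an observer without any wrapping.
driver = dynamic_step_driver.DynamicStepDriver(
    tf_env, policy, observers=[buffer.add_batch], num_steps=1)
driver.run()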

@Jaffeur

Jaffeur commented Oct 15, 2020

I agree with @j0rd1smit. I stumbled upon the same error; it is very confusing to have different behaviours between the Py and TF versions of the class.
