
Using the add_batch method from PyUniformReplayBuffer throws an IndexError: tuple index out of range. #198

Open
j0rd1smit opened this issue Aug 29, 2019 · 3 comments

@j0rd1smit

Using the add_batch method from PyUniformReplayBuffer throws an IndexError: tuple index out of range. When you do something similar with a TFEnv, all the trajectories are batched, but this is not the case with a PyEnv. I think that is why the observer lambda x: buffer.add_batch(batch_nested_array(x)) works and the observer buffer.add_batch doesn't. Below is some example code.

This doesn't work:

from tf_agents.drivers import py_driver
from tf_agents.environments import suite_gym
from tf_agents.policies import random_py_policy
from tf_agents.replay_buffers.py_uniform_replay_buffer import PyUniformReplayBuffer

env = suite_gym.load('CartPole-v0')
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [buffer.add_batch]
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)  # raises IndexError: tuple index out of range

This works:

from tf_agents.utils.nest_utils import batch_nested_array

env = suite_gym.load('CartPole-v0')
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [lambda x: buffer.add_batch(batch_nested_array(x))]  # batch the trajectory before adding
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)
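
For reference, a minimal sketch of what the batching step does, assuming nest_utils from tf_agents.utils: it prepends a batch dimension of size 1 to every array in the nest, which is the layout add_batch expects.

# Minimal sketch of the batching step, assuming tf_agents.utils.nest_utils.
import numpy as np
from tf_agents.utils import nest_utils

unbatched = np.zeros((4,), dtype=np.float32)        # e.g. a CartPole observation
batched = nest_utils.batch_nested_array(unbatched)  # adds a leading batch dimension
print(batched.shape)                                # (1, 4)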

I think the best way to solve this is to batch the trajectory in PyDriver to stay consistent with TFDriver. This would require only a single change in the .run() method of PyDriver.
Original:

traj = trajectory.from_transition(time_step, action_step, next_time_step)

Proposed replacement:

traj = trajectory.from_transition(time_step, action_step, next_time_step)  
traj = nest_utils.batch_nested_array(traj)
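
For context, a rough sketch of where this would sit inside PyDriver.run; the surrounding loop below is paraphrased, not copied from the actual source, and only the two traj lines reflect the proposed change.

# Rough, paraphrased sketch of the relevant part of PyDriver.run.
while num_steps < max_steps and num_episodes < max_episodes:
    action_step = policy.action(time_step, policy_state)
    next_time_step = env.step(action_step.action)

    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    traj = nest_utils.batch_nested_array(traj)  # proposed addition

    for observer in observers:
        observer(traj)

    time_step = next_time_step
    policy_state = action_step.state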
@sguada
Member

sguada commented Aug 29, 2019

To use add_batch the data should be batched, i.e. the environment should be batched. This should work:

from tf_agents.environments import batched_py_environment

env = batched_py_environment.BatchedPyEnvironment([suite_gym.load('CartPole-v0')])
policy = random_py_policy.RandomPyPolicy(time_step_spec=env.time_step_spec(), action_spec=env.action_spec())
buffer = PyUniformReplayBuffer(policy.trajectory_spec, 1000)

observers = [buffer.add_batch]
driver = py_driver.PyDriver(env, policy, observers, max_steps=1, max_episodes=1)
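
Presumably the reset/run calls from the first snippet then work unchanged (an assumption, not part of the original comment):

initial_time_step = env.reset()
final_time_step, _ = driver.run(initial_time_step)  # trajectories now carry a batch dimension, so add_batch succeeds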

@j0rd1smit
Author

j0rd1smit commented Aug 29, 2019

Yes, sure, this would work, but my issue is about the inconsistency in the API. For a TFEnv (batched by default) this is not needed, but it is needed for a PyEnv (not batched by default), which is quite confusing. Personally, it took me quite a while to understand what caused this issue.
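
For comparison, a sketch of the TF-side setup where add_batch can be passed directly as an observer; the module paths and constructor arguments here are my assumptions about the tf_agents API, not something stated in this issue.

# Sketch of the TF-side counterpart, where add_batch works as an observer directly.
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer

tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))  # batched, batch_size=1
policy = random_tf_policy.RandomTFPolicy(time_step_spec=tf_env.time_step_spec(),
                                         action_spec=tf_env.action_spec())
buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=policy.trajectory_spec, batch_size=tf_env.batch_size, max_length=1000)

# The trajectories emitted by the driver already carry a batch dimension,
# so add_batch can be used as an observer without any wrapping.
driver = dynamic_step_driver.DynamicStepDriver(
    tf_env, policy, observers=[buffer.add_batch], num_steps=1)
driver.run()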

@Jaffeur

Jaffeur commented Oct 15, 2020

I agree with @j0rd1smit. I stumbled upon the same error; it is very confusing to have different behaviours between the Py and TF versions of the class.
