I'm playing with a contextual bandit that has only one step per episode. My goal is to save multiple `Trajectory` objects into a replay buffer and use them later. However, I'm not able to create a replay buffer that can be used to train the agent.
The agent spec is as below (from running `agent.collect_data_spec`):

My trajectory looks like below (it has been passed to `agent.train()` and runs successfully):
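Roughly, it is built like this (a sketch: the observation values and their context dimension of 2 are placeholders, and I'm assuming an empty `policy_info`):

```python
import tensorflow as tf
from tf_agents.trajectories import trajectory

# One complete bandit episode in a single step; every tensor carries
# the outer dims (batch_size=1, time=1).
my_traj = trajectory.Trajectory(
    step_type=tf.constant([[0]], dtype=tf.int32),               # StepType.FIRST
    observation=tf.constant([[[0.1, 0.2]]], dtype=tf.float32),  # placeholder context
    action=tf.constant([[1]], dtype=tf.int32),                  # scalar action -> shape (1, 1)
    policy_info=(),                                             # assumed empty
    next_step_type=tf.constant([[2]], dtype=tf.int32),          # StepType.LAST
    reward=tf.constant([[1.0]], dtype=tf.float32),
    discount=tf.constant([[0.0]], dtype=tf.float32),
)
```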
However, I run into problems after creating a replay buffer.
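Concretely, the buffer is created like this (a sketch; I'm assuming `TFUniformReplayBuffer`, with the `batch_size=1`, `max_length=1` mentioned at the end):

```python
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# One parallel environment (batch_size=1) and one step kept per
# environment (max_length=1) -- a single-slot buffer.
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=1,
    max_length=1,
)
```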
I'm not able to feed my Trajectory into the replay buffer: running `replay_buffer.add_batch(my_traj)` returns the following error:

```
{{function_node __wrapped__ResourceScatterUpdate_device_/job:localhost/replica:0/task:0/device:CPU:0}} Must have updates.shape = indices.shape + params.shape[1:] or updates.shape = [], got updates.shape [1,1], indices.shape [1], params.shape [1] [Op:ResourceScatterUpdate]
```
My action is a scalar (`action_spec = tensor_spec.BoundedTensorSpec(dtype=tf.int32, shape=(), minimum=0, maximum=narms-1)`). The action tensor in the `Trajectory` has shape `(1, 1)`, which is `(batch_size, time)`; since each action is a scalar, it has no additional dimensions. I suspect that is why `params.shape` is `[1]` instead of 2-D and the replay buffer throws the error. Has anyone successfully created a replay buffer for a contextual bandit before? Please advise. Thank you!

Another confusion: when a replay buffer is created with `batch_size=1` and `max_length=1` (as in the bandit case), its capacity is also 1. How can I then store multiple `Trajectory` objects in the replay buffer?
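Is something like the following the intended usage? This is only my guess from the `TFUniformReplayBuffer` docstring: that `add_batch` takes a single time step shaped `(batch_size,) + spec.shape`, and that capacity is `batch_size * max_length`. The `max_length=1000` and the `tf.squeeze` over the time axis are assumptions, not verified.

```python
import tensorflow as tf
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# Guess: size the buffer via max_length, since capacity = batch_size * max_length.
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=1,
    max_length=1000,  # room for 1000 one-step trajectories
)

# Guess: drop the time axis so every tensor is (batch_size,) + spec.shape,
# e.g. the scalar action becomes shape (1,) instead of (1, 1).
single_step = tf.nest.map_structure(lambda t: tf.squeeze(t, axis=1), my_traj)
replay_buffer.add_batch(single_step)

# Later, sample one-step trajectories back for agent.train():
dataset = replay_buffer.as_dataset(sample_batch_size=64, num_steps=1)
```

If that reading is right, it would also answer my capacity question, but I'd appreciate confirmation.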
Thank you!