
Dimension error in PPO train() with multiple actions #212

Open
basvanopheusden opened this issue Oct 3, 2019 · 3 comments

Comments

@basvanopheusden

I'm trying to train a PPO agent in an environment with a multi-dimensional action space, specifically a 1x5 vector (floats in [0, 10]), a 1x10 vector (floats in [0, 10]), and a boolean (integer with min=0, max=1). I'm able to create an agent, but during training I receive this error:

InvalidArgumentError: Dimension 2 in both shapes must be equal, but are 5 and 1. Shapes are [1,450,5,1] and [1,450,1,1].
From merging shape 1 with other shapes. for 'epoch_0/AddN_1' (op: 'AddN') with input shapes: [1,450,10,1], [1,450,5,1], [1,450,1,1].

I've traced the error to inside the tf_agent.train() call. The relevant code is here:

# Collect a few episodes using collect_policy and save to the replay buffer.
collect_episode(replay_buffer, train_env, tf_agent.collect_policy, collect_episodes_per_iteration)

# Use data from the buffer and update the agent's network.
trajectories = replay_buffer.gather_all()
train_loss = tf_agent.train(experience=trajectories)
replay_buffer.clear()
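(collect_episode isn't shown above; for context, a minimal sketch of a helper matching that call signature, modeled on the TF-Agents REINFORCE tutorial, might look like the following.)

from tf_agents.trajectories import trajectory

def collect_episode(replay_buffer, environment, policy, num_episodes):
  # Run `policy` in `environment` for `num_episodes` episodes and store
  # each transition in the replay buffer.
  episode_counter = 0
  environment.reset()
  while episode_counter < num_episodes:
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    replay_buffer.add_batch(traj)
    if traj.is_boundary():
      episode_counter += 1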
@oars
Contributor

oars commented Oct 4, 2019

Can you share the full specs from your environment? Also, please run the environment validation to make sure your environment consistently generates data that fits the specs:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/utils.py#L45

@basvanopheusden
Author

Yes, both the observation and action specs are multidimensional:

from tf_agents.environments import utils

py_env = load_environment(params)  # our own environment constructor
utils.validate_py_environment(py_env)
print('ObservationSpec:', py_env.observation_spec())
print('ActionSpec:', py_env.action_spec())

This code snippet yields the following output:

ObservationSpec: {'J': BoundedArraySpec(shape=(50,), dtype=dtype('float32'), name='observation', minimum=0.0, maximum=3.4028234663852886e+38), 'ftarget': ArraySpec(shape=(5,), dtype=dtype('float32'), name='observation'), 'F': ArraySpec(shape=(50,), dtype=dtype('float32'), name='observation')}
ActionSpec: [BoundedArraySpec(shape=(10, 1), dtype=dtype('float32'), name='action', minimum=0.0, maximum=10.0), BoundedArraySpec(shape=(5, 1), dtype=dtype('float32'), name='action', minimum=0.0, maximum=10.0), BoundedArraySpec(shape=(1, 1), dtype=dtype('int32'), name='action', minimum=0, maximum=1)]
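For reference, these action components could be declared like this with tf_agents specs (a sketch reconstructed from the printed output above, not code from the original environment):

import numpy as np
from tf_agents.specs import array_spec

# Three action components with different shapes, as in the printed ActionSpec.
action_spec = [
    array_spec.BoundedArraySpec(shape=(10, 1), dtype=np.float32,
                                minimum=0.0, maximum=10.0, name='action'),
    array_spec.BoundedArraySpec(shape=(5, 1), dtype=np.float32,
                                minimum=0.0, maximum=10.0, name='action'),
    array_spec.BoundedArraySpec(shape=(1, 1), dtype=np.int32,
                                minimum=0, maximum=1, name='action'),
]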

@basvanopheusden
Author

Some extra information: the error can be traced back to line 586 in ppo_agent.py:

total_kl_penalty_loss = tf.add_n(kl_penalty_losses)
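tf.add_n requires all of its inputs to have identical shapes, so summing per-action KL penalty terms fails when the action components differ in size. A minimal sketch reproducing the failure with dummy tensors of the shapes from the error message:

import tensorflow as tf

# The per-action losses keep their action shapes (10, 5 and 1 here),
# so tf.add_n cannot sum them and raises InvalidArgumentError.
kl_penalty_losses = [
    tf.zeros([1, 450, 10, 1]),
    tf.zeros([1, 450, 5, 1]),
    tf.zeros([1, 450, 1, 1]),
]
total_kl_penalty_loss = tf.add_n(kl_penalty_losses)  # InvalidArgumentError: shapes must be equal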

If I switch to the REINFORCE algorithm, my agent does run, but training is slow.
