
Dimension error in PPO train() with multiple actions #212

Open
basvanopheusden opened this issue Oct 3, 2019 · 3 comments

Comments

@basvanopheusden

I'm trying to train a PPO agent in an environment with a multi-dimensional action space, specifically a 1x5 vector (floats in [0, 10]), a 1x10 vector (floats in [0, 10]), and a boolean (integer with min=0, max=1). I'm able to create an agent, but during training I receive this error:

InvalidArgumentError: Dimension 2 in both shapes must be equal, but are 5 and 1. Shapes are [1,450,5,1] and [1,450,1,1].
From merging shape 1 with other shapes. for 'epoch_0/AddN_1' (op: 'AddN') with input shapes: [1,450,10,1], [1,450,5,1], [1,450,1,1].

I've traced the error to inside the tf_agent.train() call. The relevant code is here:

# Collect a few episodes using collect_policy and save to the replay buffer.
collect_episode(replay_buffer, train_env, tf_agent.collect_policy, collect_episodes_per_iteration)

# Use data from the buffer and update the agent's network.
trajectories = replay_buffer.gather_all()
train_loss = tf_agent.train(experience=trajectories)
replay_buffer.clear()
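(collect_episode isn't shown above; for context, a minimal sketch of a helper matching that call signature, modeled on the TF-Agents REINFORCE tutorial, might look like the following.)

from tf_agents.trajectories import trajectory

def collect_episode(replay_buffer, environment, policy, num_episodes):
  # Run `policy` in `environment` for `num_episodes` episodes and store
  # each transition in the replay buffer.
  episode_counter = 0
  environment.reset()
  while episode_counter < num_episodes:
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    replay_buffer.add_batch(traj)
    if traj.is_boundary():
      episode_counter += 1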
@oars
Contributor

oars commented Oct 4, 2019

Can you share the full specs from your environment? Also, please run the environment validation to make sure your environment consistently generates data that fits the specs:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/utils.py#L45

@basvanopheusden
Author

Yes, both the observation and action specs are multidimensional:

from tf_agents.environments import utils

py_env = load_environment(params)  # our own environment constructor
utils.validate_py_environment(py_env)
print('ObservationSpec:', py_env.observation_spec())
print('ActionSpec:', py_env.action_spec())

This code snippet yields the following output:

ObservationSpec: {'J': BoundedArraySpec(shape=(50,), dtype=dtype('float32'), name='observation', minimum=0.0, maximum=3.4028234663852886e+38), 'ftarget': ArraySpec(shape=(5,), dtype=dtype('float32'), name='observation'), 'F': ArraySpec(shape=(50,), dtype=dtype('float32'), name='observation')}
ActionSpec: [BoundedArraySpec(shape=(10, 1), dtype=dtype('float32'), name='action', minimum=0.0, maximum=10.0), BoundedArraySpec(shape=(5, 1), dtype=dtype('float32'), name='action', minimum=0.0, maximum=10.0), BoundedArraySpec(shape=(1, 1), dtype=dtype('int32'), name='action', minimum=0, maximum=1)]
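For reference, these action components could be declared like this with tf_agents specs (a sketch reconstructed from the printed output above, not code from the original environment):

import numpy as np
from tf_agents.specs import array_spec

# Three action components with different shapes, as in the printed ActionSpec.
action_spec = [
    array_spec.BoundedArraySpec(shape=(10, 1), dtype=np.float32,
                                minimum=0.0, maximum=10.0, name='action'),
    array_spec.BoundedArraySpec(shape=(5, 1), dtype=np.float32,
                                minimum=0.0, maximum=10.0, name='action'),
    array_spec.BoundedArraySpec(shape=(1, 1), dtype=np.int32,
                                minimum=0, maximum=1, name='action'),
]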

@basvanopheusden
Author

Some extra information: the error can be traced back to line 586 in ppo_agent.py:

total_kl_penalty_loss = tf.add_n(kl_penalty_losses)
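tf.add_n requires all of its inputs to have identical shapes, so summing per-action KL penalty terms fails when the action components differ in size. A minimal sketch reproducing the failure with dummy tensors of the shapes from the error message:

import tensorflow as tf

# The per-action losses keep their action shapes (10, 5 and 1 here),
# so tf.add_n cannot sum them and raises InvalidArgumentError.
kl_penalty_losses = [
    tf.zeros([1, 450, 10, 1]),
    tf.zeros([1, 450, 5, 1]),
    tf.zeros([1, 450, 1, 1]),
]
total_kl_penalty_loss = tf.add_n(kl_penalty_losses)  # InvalidArgumentError: shapes must be equal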

If I switch to the REINFORCE algorithm, my agent does run, but training is slow.
