
PPO policy with ActorDistributionNetwork and discrete action array #656

Open
cedavidyang opened this issue Sep 7, 2021 · 13 comments

@cedavidyang

I'm using PPOAgent and ActorDistributionNetwork with the following action_spec:

action_spec = array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action')

However, I received the following error when the agent was trying to initialize a PPOPolicy

ValueError: actor_network output spec does not match action spec

The issue arises when executing the following check in ppo_policy.py file (near line 112):

    distribution_utils.assert_specs_are_compatible(
        actor_output_spec, action_spec,
        'actor_network output spec does not match action spec')

The actor_output_spec of ActorDistributionNetwork has an event_shape of (), which is inconsistent with the action_spec. A similar issue has been reported in #548.

My workaround has been to comment out the spec-compatibility check in ppo_policy.py. After doing that, the code runs successfully and the agent is able to learn. But I'm not sure whether this is a bug or whether I'm missing something.

@summer-yue
Member

Thank you for reporting. It's a little hard to know exactly what's going on. Could you print out both the actor_output_spec and the action_spec so we can see why they don't match?
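For reference, one way to print both specs, sketched under the assumption that the actor network and tensor specs are built the same way as in the reproduction snippet later in this thread: the network's create_variables call returns the output spec that ppo_policy.py compares against the action spec.

# Sketch (untested): print the two specs that ppo_policy.py compares.
# Assumes actor_net, observation_tensor_spec and action_tensor_spec are
# built as in the reproduction snippet further down this thread.
actor_output_spec = actor_net.create_variables(observation_tensor_spec)
print(actor_output_spec)   # DistributionSpecV2 produced by the projection network
print(action_tensor_spec)  # BoundedTensorSpec(shape=(10,), dtype=tf.int32, ...)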

@cedavidyang
Author

Thanks for following up. My action_spec is

BoundedTensorSpec(
    shape=(10,),
    dtype=tf.int32,
    name='action',
    minimum=array(0, dtype=int32),
    maximum=array(3, dtype=int32))

And the actor_output_spec is

<DistributionSpecV2: event_shape=(), dtype=<dtype: 'int32'>,
parameters=<Params: type=<class 'tensorflow_probability.python.distributions.categorical.Categorical'>,
params={'logits': TensorSpec(shape=(10, 4), dtype=tf.float32, name=None)}>>

I believe the error occurs because, by default, the Categorical distribution in TensorFlow Probability has an empty event_shape=(), which is inconsistent with the shape=(10,) defined in action_spec.
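A small TensorFlow Probability sketch of that mismatch (the shapes follow from the logits spec printed above): a Categorical built from logits of shape (10, 4) treats the 10 as a batch dimension, so its event_shape is () while the action spec expects an event of shape (10,); wrapping the distribution in Independent is what would fold the 10 into the event shape.

import tensorflow as tf
import tensorflow_probability as tfp

logits = tf.zeros((10, 4))                  # one row of logits per action dimension
dist = tfp.distributions.Categorical(logits=logits)
print(dist.event_shape)                     # () -- what the spec check sees
print(dist.batch_shape)                     # (10,)

# Reinterpreting the batch dimension as part of the event gives event_shape=(10,),
# which is what a shape-(10,) action spec would expect:
joint = tfp.distributions.Independent(dist, reinterpreted_batch_ndims=1)
print(joint.event_shape)                    # (10,)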

@summer-yue
Member

Thanks for providing the additional information! I think you're right. I was able to reproduce your issue in a simple example in Colab. I'll follow up here with a more robust solution. Thanks for your patience.

import numpy as np
import tensorflow as tf
from tf_agents.specs import array_spec, tensor_spec
from tf_agents.trajectories import time_step as ts
from tf_agents.agents.ppo import ppo_actor_network, ppo_clip_agent
from tf_agents.networks import value_network
from tf_agents.networks import actor_distribution_network

action_spec = array_spec.BoundedArraySpec(shape=(10, ), dtype=np.int32, minimum=0, maximum=4, name='action')
observation_spec = array_spec.BoundedArraySpec(shape=(20,), dtype=np.int32, minimum=0, maximum=1, name='observation')

action_tensor_spec = tensor_spec.from_spec(action_spec)
observation_tensor_spec = tensor_spec.from_spec(observation_spec)
time_step_tensor_spec = ts.time_step_spec(observation_tensor_spec)

actor_net_builder = ppo_actor_network.PPOActorNetwork()
actor_net = actor_distribution_network.ActorDistributionNetwork(observation_tensor_spec, action_tensor_spec)

value_net = value_network.ValueNetwork(
    observation_tensor_spec,
    fc_layer_params=(20, 20),
    kernel_initializer=tf.keras.initializers.Orthogonal())

agent = ppo_clip_agent.PPOClipAgent(
    time_step_tensor_spec,
    action_tensor_spec,
    actor_net=actor_net,
    value_net=value_net)

@lonaeyeo

Hi, I received the same error when using PPOAgent. I tried your code, but I still received the error.
I tried that on two versions of tf-agents.
version 1:
tf-agents: 0.8.0
tensorflow: 2.5
python: 3.8

version 2:
tf-agents: 0.9.0
tensorflow: 2.6
python: 3.7

@sibyjackgrove

@summer-yue @lonaeyeo The reproduction code above results in the following error:

ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.int32, name=None)
vs.
BoundedTensorSpec(shape=(10,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(4, dtype=int32))

To make it work, we need to change action_spec to be a scalar:
action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.int32, minimum=0, maximum=4, name='action')

@TheGreatRambler

Any updates on this? I would prefer to be able to use a discrete action space rather than being stuck with a scalar, and I've heard that commenting out the check creates other problems.

@sibyjackgrove

@TheGreatRambler Perhaps I should have been clearer. You can use a discrete action space. What I meant was that the action has to be a single integer value, not a vector of integer values.

@TheGreatRambler

Oh sorry, I'm a bit new to the terminology. I need to be able to use a vector of integer values. I commented out the check and the agent appears to be running, but is it actually learning?
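One possible workaround, sketched below and not verified end to end: split the length-10 action into a tuple of scalar specs. ActorDistributionNetwork accepts a nest of output specs and should build one Categorical head per element, and each scalar spec then pairs with a Categorical whose empty event_shape matches shape=(), so the compatibility check should pass without editing ppo_policy.py. I haven't verified the full PPO training loop with a nested action spec, though, and the environment would need to accept a tuple action.

import numpy as np
from tf_agents.specs import array_spec, tensor_spec

# Hypothetical workaround: ten scalar specs instead of one shape-(10,) spec.
action_spec = tuple(
    array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=4, name='action_%d' % i)
    for i in range(10))
action_tensor_spec = tensor_spec.from_spec(action_spec)

# Pass action_tensor_spec to ActorDistributionNetwork and PPOClipAgent as before;
# the policy would then emit a tuple of ten scalar int32 actions per step.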

@TheGreatRambler

TheGreatRambler commented Jan 23, 2022

Here is the spec in question. The first 2 integers are joystick axes; the last 3 are booleans, where 0 to 32767 is false and anything above is true.

self._action_spec = array_spec.BoundedArraySpec(shape=(5,), dtype=np.int32, minimum=0, maximum=65535, name='action')

@sibyjackgrove

@TheGreatRambler I don't think the above spec will work if you have a mixture of booleans and integers. Joystick axis positions would be continuous actions. I think you will need two actor networks, one for the boolean values and one for the continuous joystick values.
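For what it's worth, here is a rough sketch of what such a split might look like, assuming the environment can accept a dictionary action and that the agent handles nested action specs (the key names here are made up). The joystick axes become a continuous spec and the three buttons become scalar discrete specs, so ActorDistributionNetwork would build a normal projection for the former and a categorical projection for each of the latter.

import numpy as np
from tf_agents.specs import array_spec

# Hypothetical split of the original shape-(5,) int32 spec into a dict:
# two continuous joystick axes plus three boolean buttons.
action_spec = {
    'joystick': array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.float32, minimum=0.0, maximum=65535.0, name='joystick'),
    'buttons': tuple(
        array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='button_%d' % i)
        for i in range(3)),
}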

@sibyjackgrove

@cedavidyang Were you able to figure out a fix for this error?

@sibyjackgrove

@summer-yue It seems the only way around this is to comment out the following check in ppo_policy.py (near line 112, as cited above):

distribution_utils.assert_specs_are_compatible(
    actor_output_spec, action_spec,
    'actor_network output spec does not match action spec')

@aosama

aosama commented Feb 6, 2024

I encountered the same issue and had to comment out the lines above.
