
PPO policy with ActorDistributionNetwork and discrete action array #656

Open
cedavidyang opened this issue Sep 7, 2021 · 13 comments

@cedavidyang

I'm using PPOAgent and ActorDistributionNetwork with the following action_spec:

action_spec = array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action')

However, I received the following error when the agent was trying to initialize a PPOPolicy

ValueError: actor_network output spec does not match action spec

The issue arises when executing the following check in ppo_policy.py file (near line 112):

    distribution_utils.assert_specs_are_compatible(
        actor_output_spec, action_spec,
        'actor_network output spec does not match action spec')

The actor_output_spec of ActorDistributionNetwork has an event_shape of (), which is inconsistent with the action_spec. A similar issue has been reported in #548.

My workaround has been to comment out the spec-compatibility check in ppo_policy.py. After doing that, the code runs successfully and the agent is able to learn. But I'm not sure whether this is a bug or whether I'm missing something.

@summer-yue
Member

Thank you for reporting. It's a little hard to know exactly what's going on. Could you print out both the actor_output_spec and the action_spec so we can see why they don't match?
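For reference, one way to print both specs, sketched under the assumption that the actor network and tensor specs are built the same way as in the reproduction snippet later in this thread: the network's create_variables call returns the output spec that ppo_policy.py compares against the action spec.

# Sketch (untested): print the two specs that ppo_policy.py compares.
# Assumes actor_net, observation_tensor_spec and action_tensor_spec are
# built as in the reproduction snippet further down this thread.
actor_output_spec = actor_net.create_variables(observation_tensor_spec)
print(actor_output_spec)   # DistributionSpecV2 produced by the projection network
print(action_tensor_spec)  # BoundedTensorSpec(shape=(10,), dtype=tf.int32, ...)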

@cedavidyang
Author

Thanks for following up. My action_spec is

BoundedTensorSpec(
    shape=(10,),
    dtype=tf.int32,
    name='action',
    minimum=array(0, dtype=int32),
    maximum=array(3, dtype=int32))

And the actor_output_spec is

<DistributionSpecV2: event_shape=(), dtype=<dtype: 'int32'>,
parameters=<Params: type=<class 'tensorflow_probability.python.distributions.categorical.Categorical'>,
params={'logits': TensorSpec(shape=(10, 4), dtype=tf.float32, name=None)}>>

I believe the error occurs because, by default, the Categorical distribution in TensorFlow Probability has an empty event_shape=(), which is inconsistent with the shape=(10,) defined in action_spec.
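A small TensorFlow Probability sketch of that mismatch (the shapes follow from the logits spec printed above): a Categorical built from logits of shape (10, 4) treats the 10 as a batch dimension, so its event_shape is () while the action spec expects an event of shape (10,); wrapping the distribution in Independent is what would fold the 10 into the event shape.

import tensorflow as tf
import tensorflow_probability as tfp

logits = tf.zeros((10, 4))                  # one row of logits per action dimension
dist = tfp.distributions.Categorical(logits=logits)
print(dist.event_shape)                     # () -- what the spec check sees
print(dist.batch_shape)                     # (10,)

# Reinterpreting the batch dimension as part of the event gives event_shape=(10,),
# which is what a shape-(10,) action spec would expect:
joint = tfp.distributions.Independent(dist, reinterpreted_batch_ndims=1)
print(joint.event_shape)                    # (10,)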

@summer-yue
Member

Thanks for providing the additional information! I think you're right. I was able to reproduce your issue in a simple example in Colab. I'll follow up here with a more robust solution. Thanks for your patience.

import numpy as np
import tensorflow as tf
from tf_agents.specs import array_spec, tensor_spec
from tf_agents.trajectories import time_step as ts
from tf_agents.agents.ppo import ppo_actor_network, ppo_clip_agent
from tf_agents.networks import value_network
from tf_agents.networks import actor_distribution_network

action_spec = array_spec.BoundedArraySpec(shape=(10, ), dtype=np.int32, minimum=0, maximum=4, name='action')
observation_spec = array_spec.BoundedArraySpec(shape=(20,), dtype=np.int32, minimum=0, maximum=1, name='observation')

action_tensor_spec = tensor_spec.from_spec(action_spec)
observation_tensor_spec = tensor_spec.from_spec(observation_spec)
time_step_tensor_spec = ts.time_step_spec(observation_tensor_spec)

actor_net_builder = ppo_actor_network.PPOActorNetwork()
actor_net = actor_distribution_network.ActorDistributionNetwork(observation_tensor_spec, action_tensor_spec)

value_net = value_network.ValueNetwork(
    observation_tensor_spec,
    fc_layer_params=(20, 20),
    kernel_initializer=tf.keras.initializers.Orthogonal())

agent = ppo_clip_agent.PPOClipAgent(
    time_step_tensor_spec,
    action_tensor_spec,
    actor_net=actor_net,
    value_net=value_net)

@lonaeyeo

Hi, I received the same error when using PPOAgent. I tried your code, but I still received the error.
I tried that on two versions of tf-agents.
version 1:
tf-agents: 0.8.0
tensorflow: 2.5
python: 3.8

version 2:
tf-agents: 0.9.0
tensorflow: 2.6
python: 3.7

@sibyjackgrove

@summer-yue @lonaeyeo The reproduction code above results in the following error:

ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.int32, name=None)
vs.
BoundedTensorSpec(shape=(10,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(4, dtype=int32))

To make it work, we need to change action_spec to be a scalar:
action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.int32, minimum=0, maximum=4, name='action')

@TheGreatRambler

Any updates on this? I would prefer to be able to use a discrete action space rather than being stuck with a scalar, and I've heard that commenting out the check creates other problems.

@sibyjackgrove

@TheGreatRambler Perhaps I should have been clearer. You can use a discrete action space. What I meant was that the action has to be a single integer value, not a vector of integer values.

@TheGreatRambler

Oh sorry, I'm a bit new to the terminology. I need to be able to use a vector of integer values. I commented out the check and the agent appears to be running, but is it actually learning?
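One possible workaround, sketched below and not verified end to end: split the length-10 action into a tuple of scalar specs. ActorDistributionNetwork accepts a nest of output specs and should build one Categorical head per element, and each scalar spec then pairs with a Categorical whose empty event_shape matches shape=(), so the compatibility check should pass without editing ppo_policy.py. I haven't verified the full PPO training loop with a nested action spec, though, and the environment would need to accept a tuple action.

import numpy as np
from tf_agents.specs import array_spec, tensor_spec

# Hypothetical workaround: ten scalar specs instead of one shape-(10,) spec.
action_spec = tuple(
    array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=4, name='action_%d' % i)
    for i in range(10))
action_tensor_spec = tensor_spec.from_spec(action_spec)

# Pass action_tensor_spec to ActorDistributionNetwork and PPOClipAgent as before;
# the policy would then emit a tuple of ten scalar int32 actions per step.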

@TheGreatRambler

TheGreatRambler commented Jan 23, 2022

Here is the spec in question. The first 2 integers are joystick axes; the last 3 are booleans, where 0 to 32767 is false and anything above is true.

self._action_spec = array_spec.BoundedArraySpec(shape=(5,), dtype=np.int32, minimum=0, maximum=65535, name='action')

@sibyjackgrove

@TheGreatRambler I don't think the above spec will work if you have a mixture of booleans and integers. Joystick axis positions would be continuous actions. I think you will need two actor networks, one for the boolean values and one for the continuous joystick values.
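For what it's worth, here is a rough sketch of what such a split might look like, assuming the environment can accept a dictionary action and that the agent handles nested action specs (the key names here are made up). The joystick axes become a continuous spec and the three buttons become scalar discrete specs, so ActorDistributionNetwork would build a normal projection for the former and a categorical projection for each of the latter.

import numpy as np
from tf_agents.specs import array_spec

# Hypothetical split of the original shape-(5,) int32 spec into a dict:
# two continuous joystick axes plus three boolean buttons.
action_spec = {
    'joystick': array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.float32, minimum=0.0, maximum=65535.0, name='joystick'),
    'buttons': tuple(
        array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='button_%d' % i)
        for i in range(3)),
}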

@sibyjackgrove

@cedavidyang Were you able to figure out a fix for this error?

@sibyjackgrove

@summer-yue It seems the only way around this is to comment out the following check in ppo_policy.py (near line 112, as cited above):

distribution_utils.assert_specs_are_compatible(
    actor_output_spec, action_spec,
    'actor_network output spec does not match action spec')

@aosama

aosama commented Feb 6, 2024

I encountered the same issue and had to comment out the lines above.
