
ValueError: actor_network output spec does not match action spec #548

Closed
Fabien-Couthouis opened this issue Feb 2, 2021 · 4 comments

Fabien-Couthouis commented Feb 2, 2021

Hello,
I am trying to train a PPO agent with the default actor_distribution_network but I get this error:

Traceback (most recent call last):
  File "run_shake_training_ppo.py", line 285, in <module>
    run_training()
  File "run_shake_training_ppo.py", line 83, in run_training
    agent = load_agent(train_env)
  File "run_shake_training_ppo.py", line 124, in load_agent
    agent = PPOClipAgent(
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tf_agents\agents\ppo\ppo_clip_agent.py", line 199, in __init__
    super(PPOClipAgent, self).__init__(
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tf_agents\agents\ppo\ppo_agent.py", line 346, in __init__
    ppo_policy.PPOPolicy(
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\gin\config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tf_agents\agents\ppo\ppo_policy.py", line 116, in __init__
    distribution_utils.assert_specs_are_compatible(
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tf_agents\distributions\utils.py", line 633, in assert_specs_are_compatible
    tf.nest.map_structure(compare_output_to_spec, event_spec, spec)
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tensorflow\python\util\nest.py", line 635, in map_structure
    structure[0], [func(*x) for x in entries],
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tensorflow\python\util\nest.py", line 635, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "C:\Users\username\Miniconda3\envs\tf-agents\lib\site-packages\tf_agents\distributions\utils.py", line 630, in compare_output_to_spec
    raise ValueError("{}:\n{}\nvs.\n{}".format(message_prefix, event_spec,
ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.int64, name=None)
vs.
BoundedTensorSpec(shape=(1,), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(126, dtype=int64))

In my Python env, I have: self._action_spec = BoundedArraySpec((1,), dtype=np.int64, name="action", minimum=0, maximum=126). Then I wrap my Python env in a TFPyEnvironment.

Note that training is working when I comment the following lines in tf_agents.agents.ppo.ppo_policy.py:

distribution_utils.assert_specs_are_compatible(
            actor_output_spec, action_spec,
            'actor_network output spec does not match action spec')

Does anyone have an idea to fix the error above?
Thanks!

Versions:

  • OS: Windows 10 version 2004 (build 19041.746)
  • Python 3.8.5
  • tf_agents: 0.7.1
  • tensorflow: 2.4.1
  • tensorflow-probability: 0.11.1
  • numpy: 1.20.0

egonina commented Feb 4, 2021

Hi, sorry you are having issues running PPO training. When you say "training is working", do you mean it trains to the expected value? Can you dig into the network/action spec mismatch a bit more to see why the output layer spec doesn't match the action_spec? It looks like the shapes are mismatched: () vs. (1,).
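To see why the assertion trips, it can help to mimic the comparison in plain Python. The `Spec` class below is a mock stand-in, not the real `TensorSpec` or tf_agents API; it only reproduces the shape comparison that the traceback shows failing, e.g. () vs (1,):

```python
# Simplified stand-in for tf_agents' assert_specs_are_compatible.
# "Spec" is a mock, not the real TensorSpec; only shapes are compared here.
from dataclasses import dataclass

@dataclass
class Spec:
    shape: tuple  # e.g. () for a scalar action, (1,) for a length-1 vector

def assert_specs_compatible(event_spec, action_spec, message_prefix):
    # tf_agents compares the distribution's event spec against the env's
    # action spec and raises ValueError on any mismatch.
    if event_spec.shape != action_spec.shape:
        raise ValueError(f"{message_prefix}:\n{event_spec}\nvs.\n{action_spec}")

# The failing case from the traceback: event shape () vs action spec (1,).
network_output = Spec(shape=())   # event shape of a scalar Categorical
env_action = Spec(shape=(1,))     # BoundedTensorSpec(shape=(1,), ...)

try:
    assert_specs_compatible(network_output, env_action,
                            "actor_network output spec does not match action spec")
except ValueError as e:
    print(e)
```

Running this prints the same kind of `()` vs `(1,)` message as the original traceback.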


Fabien-Couthouis commented Feb 5, 2021

Hi and thanks for helping!

First, my action specs defined in my env are: self._action_spec = BoundedArraySpec((1,), dtype=np.int64, name="action", minimum=0, maximum=126).

When I say "training is working", I mean training starts without error, and the logits output by the actor model has the correct shape (i.e. <tf.Tensor: shape=(1, 1, 127), dtype=float32>).
However, the event_shape of the same output action is TensorShape([]), and it seems this event_shape is compared to my action shape in distributions.utils.assert_specs_are_compatible, which produces the error:

nest_utils.assert_same_structure(
        event_spec,
        spec,
        message=("{}:\n{}\nvs.\n{}".format(message_prefix, event_spec, spec)))

event_spec: dtype:tf.int64 name:None shape:TensorShape([])

spec: dtype:tf.int64 maximum:array(126, dtype=int64) minimum:array(0, dtype=int64) name:'action' shape:TensorShape([1])

What is this event_spec, and why are we comparing the action shape to the model output shape, given that the model outputs logits that are converted into actions in the PPO policy? Is the event_shape the shape of the actions taken by the policy? If so, where is this event_spec computed?

I am pretty new to tf-agents so I am probably wrong somewhere.

Edit: It was my bad. I solved the issue by changing my action spec from self._action_spec = BoundedArraySpec(shape=(1,), dtype=np.int64, name="action", minimum=0, maximum=126) to self._action_spec = BoundedArraySpec(shape=(), dtype=np.int64, name="action", minimum=0, maximum=126), since each action is a single integer.
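For reference, a minimal sketch of the before/after shapes, using plain tuples instead of the real BoundedArraySpec so it runs without TensorFlow: a Categorical distribution over 127 logits samples a scalar, so its event shape is (), and the action spec must be scalar too.

```python
# Sketch of the fix: shapes only, plain tuples instead of BoundedArraySpec.
# A Categorical over 127 classes samples a single integer, so its event
# shape is (), not (1,).
num_actions = 127  # minimum=0, maximum=126 -> 127 discrete actions

broken_spec_shape = (1,)  # BoundedArraySpec(shape=(1,), ...) -> mismatch
fixed_spec_shape = ()     # BoundedArraySpec(shape=(), ...)   -> matches

categorical_event_shape = ()  # event shape of one Categorical sample

assert fixed_spec_shape == categorical_event_shape
assert broken_spec_shape != categorical_event_shape
print("scalar action spec matches the Categorical event shape")
```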

@cedavidyang

I'm having the same issue. In my case, each action is not a single integer; instead, it is represented by an array with two entries: action_spec = BoundedArraySpec(shape=(2,), dtype=np.int32, name="action", minimum=0, maximum=4). Any suggestion for this use case?
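I can't speak to exactly how tf_agents handles a (2,) spec (see the follow-up in #656), but the shape bookkeeping the assertion performs can be sketched without TF installed. Everything below uses plain tuples, not the real API: for the check to pass, the distribution's event shape must equal the action spec's shape, and one common workaround is splitting the action into a tuple of scalar sub-specs, each matched by its own scalar distribution.

```python
# Shape bookkeeping only; plain tuples, no tf_agents required.
# The assertion passes when the event shape equals the spec shape
# component-by-component.

action_spec_shape = (2,)  # BoundedArraySpec(shape=(2,), dtype=np.int32, ...)

# Option 1: the network's output distribution must have event shape (2,)
# to match the vector spec directly.
matching_event_shape = (2,)
assert matching_event_shape == action_spec_shape

# Option 2 (workaround sketched here): split the action into a tuple of
# scalar sub-specs, so each component compares against a scalar
# distribution with event shape ().
split_spec_shapes = ((), ())   # two scalar sub-actions
split_event_shapes = ((), ())  # one scalar Categorical each
assert split_spec_shapes == split_event_shapes
print("each scalar sub-spec matches a scalar event shape")
```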

@summer-yue

@cedavidyang I responded in #656; let's follow up there.
