PPO policy with ActorDistributionNetwork and discrete action array #656
Comments
Thank you for reporting. It's a little hard to know exactly what's going on. Could you help print out both your action spec and the distribution spec produced by the actor network?
Thanks for following up. My action spec is:

```
BoundedTensorSpec(
    shape=(10,),
    dtype=tf.int32,
    name='action',
    minimum=array(0, dtype=int32),
    maximum=array(3, dtype=int32))
```

And the distribution spec from the actor network is:

```
<DistributionSpecV2: event_shape=(), dtype=<dtype: 'int32'>,
  parameters=<Params: type=<class 'tensorflow_probability.python.distributions.categorical.Categorical'>,
  params={'logits': TensorSpec(shape=(10, 4), dtype=tf.float32, name=None)}>>
```

I believe the error is because, by default, the actor network's output distribution has an empty event shape, which is inconsistent with the vector-valued action spec.
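To make the mismatch concrete, here is a small sketch (not part of the original thread) that builds an `ActorDistributionNetwork` from a vector-valued discrete spec like the one above and inspects the distribution it returns; the observation spec is made up purely for illustration:

```python
import tensorflow as tf
from tf_agents.networks import actor_distribution_network
from tf_agents.specs import tensor_spec

# Hypothetical observation spec, only for illustration.
observation_spec = tf.TensorSpec(shape=(4,), dtype=tf.float32, name='observation')
# Vector-valued discrete action spec, matching the one reported above.
action_spec = tensor_spec.BoundedTensorSpec(
    shape=(10,), dtype=tf.int32, minimum=0, maximum=3, name='action')

actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec, action_spec, fc_layer_params=(20, 20))

obs = tf.zeros((1, 4), dtype=tf.float32)  # batch of one dummy observation
dist, _ = actor_net(obs, step_type=None)

# The Categorical has an empty event shape, while action_spec expects (10,);
# the extra dimension of size 10 ends up in the batch shape instead.
print(dist.event_shape)  # ()
print(dist.batch_shape)  # (1, 10)
```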
Thanks for providing the additional information! I think you're right. I was able to reproduce your issue in a simple example in Colab. I'll follow up here with a more robust solution. Thanks for your patience.

```python
import numpy as np
# The imports below were not shown in the original snippet and are added for
# completeness, following the tf-agents package layout.
import tensorflow as tf
from tf_agents.specs import array_spec
from tf_agents.specs import tensor_spec
from tf_agents.agents.ppo import ppo_actor_network
from tf_agents.agents.ppo import ppo_clip_agent
from tf_agents.networks import value_network

action_spec = array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action')
action_tensor_spec = tensor_spec.from_spec(action_spec)

actor_net_builder = ppo_actor_network.PPOActorNetwork()

# observation_tensor_spec is assumed to be defined earlier; it was not shown
# in the comment.
value_net = value_network.ValueNetwork(
    observation_tensor_spec,
    fc_layer_params=(20, 20),
    kernel_initializer=tf.keras.initializers.Orthogonal())

# The comment was truncated at this point; the PPOClipAgent constructor
# arguments were cut off.
agent = ppo_clip_agent.PPOClipAgent(
```
Hi, I received the same error when using PPOAgent. I tried your code, but I still received the error. Version 2:
@summer-yue @lonaeyeo The above code results in the following error:

To make it work, we need to change action_spec to be a scalar.
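As an illustration of the scalar form that passes the check, here is a minimal sketch (names mirror the snippet above; nothing here comes from the thread itself):

```python
import numpy as np
from tf_agents.specs import array_spec

# A scalar (shape=()) discrete action spec gets past the PPOPolicy spec check,
# unlike the vector form shape=(10,) used in the snippet above.
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=4, name='action')
```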
Any updates on this? I would prefer to be able to use a discrete action space rather than being stuck with a scalar, and I've heard commenting out the check creates other problems.
@TheGreatRambler Perhaps I should have been clearer. You can use a discrete action space. What I meant was that the action has to be a single integer value and not a vector of integer values.
Oh sorry, I'm a bit new to the terminology. I need to be able to use a vector of integer values. I commented out the check and the agent appears to be running, but is it actually learning?
The spec in question: the first two integers are joystick axes, and the last three are booleans, where 0 to 32767 is false and anything above is true.
@TheGreatRambler I don't think the above spec will work if you have a mixture of booleans and integers. Joystick axis positions would be continuous actions. I think you will need two actor networks: one for the boolean values and one for the continuous joystick values.
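Purely as an illustration of that split, the controller spec could be written as two separate entries, continuous joystick axes and boolean-style buttons (whether the PPO networks and policy accept a nested spec like this is a separate question):

```python
import numpy as np
from tf_agents.specs import array_spec

# Hypothetical split of the controller spec: continuous joystick axes plus
# boolean button actions, kept as separate entries.
action_spec = {
    'joystick': array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.float32, minimum=-1.0, maximum=1.0, name='joystick'),
    'buttons': array_spec.BoundedArraySpec(
        shape=(3,), dtype=np.int32, minimum=0, maximum=1, name='buttons'),
}
```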
@cedavidyang Were you able to figure out a fix for this error?
@summer-yue It seems the only way around this is to comment out the following from lines 11 to 13 in ppo_policy.py:
I encountered the same issue and had to comment out the lines above.
I'm using `PPOAgent` and `ActorDistributionNetwork` with the following `action_spec`:

However, I received the following error when the agent was trying to initialize a `PPOPolicy`:

The issue arises when executing the following check in the `ppo_policy.py` file (near line 112):

The `actor_output_spec` of `ActorDistributionNetwork` is an `event_spec` with shape `()`, which is inconsistent with `action_spec`. A similar issue has been reported in #548.

My workaround has been to comment out the specs-compatibility block in the `ppo_policy.py` file. After doing that, the code runs successfully and the agent is able to learn. But I'm not sure whether this is a bug, or whether I'm missing something.