Missing outer_dim in categorical_q_policy action #207

tbund · 2019-09-09T15:36:46Z

CategoricalQPolicy's action tensor is missing a dimension compared with other policies that use the same tensor specs.

For the following setup:

import tensorflow as tf
from tf_agents.trajectories import time_step
from tf_agents.specs.tensor_spec import TensorSpec, BoundedTensorSpec
from tf_agents.networks import categorical_q_network
from tf_agents.policies import categorical_q_policy

obs_spec = BoundedTensorSpec(shape=(7,), dtype=tf.float32, name='observation', minimum=0.0, maximum=1.0)
act_spec = BoundedTensorSpec(shape=(1,), dtype=tf.int32, name='action', minimum=0, maximum=6)
ts_spec = time_step.TimeStep(step_type=TensorSpec(shape=(), dtype=tf.int32, name='step_type'), 
                             reward=TensorSpec(shape=(), dtype=tf.float32, name='reward'), 
                             discount=TensorSpec(shape=(), dtype=tf.float32, name='discount'),
                             observation=obs_spec)

cq_net = categorical_q_network.CategoricalQNetwork(obs_spec, act_spec)
cq_pol = categorical_q_policy.CategoricalQPolicy(0,1,cq_net,act_spec)

ts = time_step.TimeStep(step_type=tf.constant([0]), reward=tf.constant([0.0]),
                        discount=tf.constant([1.0]), observation=tf.ones((1, 7)) * 0.5)

cq_pol.action(ts)

the resulting action tensor has shape (1,).

In contrast, for a random policy

from tf_agents.policies.random_tf_policy import RandomTFPolicy
r_policy = RandomTFPolicy(ts_spec, act_spec)
r_policy.action(ts)

and for a plain QPolicy

from tf_agents.networks import q_network
from tf_agents.policies import q_policy

q_net = q_network.QNetwork(obs_spec, act_spec)
q_pol = q_policy.QPolicy(ts_spec, act_spec, q_net)
q_pol.action(ts)

the resulting action tensor has shape (1, 1).
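The other two policies follow the usual convention that a batched action has shape `outer_dims + action_spec.shape`, which here is `(1,) + (1,) = (1, 1)`. One plausible reason the categorical policy returns `(1,)` instead (an assumption, not confirmed in this report) is that it samples from a categorical distribution whose logits have shape `(B, num_actions)`; such a sample drops the last axis and never picks up the spec's trailing unit dimension. The pure-Python sketch below models both rules; `expected_action_shape` and `categorical_sample_shape` are hypothetical helpers, not tf_agents API.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def expected_action_shape(outer_dims: Shape, spec_shape: Shape) -> Shape:
    """Assumed convention: batch (outer) dims followed by the action spec shape."""
    return tuple(outer_dims) + tuple(spec_shape)


def categorical_sample_shape(logits_shape: Shape) -> Shape:
    """Sampling a categorical over the last axis yields one integer per
    leading index, so the last (num_actions) axis disappears."""
    return tuple(logits_shape[:-1])


num_actions = 7        # maximum - minimum + 1 for act_spec in the report
outer, spec = (1,), (1,)

want = expected_action_shape(outer, spec)
got = categorical_sample_shape(outer + (num_actions,))
print(want, got)       # (1, 1) (1,)

# Inserting a unit axis before the action axis of the logits
# restores the spec's trailing dimension.
fixed = categorical_sample_shape(outer + (1, num_actions))
print(fixed == want)   # True
```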

This causes an error when running the CategoricalDqnAgent's collect policy, where the epsilon-greedy policy has to switch between the categorical Q-policy and a random policy:

from tf_agents.agents.categorical_dqn import categorical_dqn_agent
tf_agent = categorical_dqn_agent.CategoricalDqnAgent(
    ts_spec,
    act_spec,
    categorical_q_network=cq_net,
    optimizer=tf.compat.v1.train.AdamOptimizer())
tf_agent.initialize()

tf_agent.collect_policy.action(ts)

==> InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape. Input 0: [1] != input 2: [1,1] [Op:Select]
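As the `[Op:Select]` in the error suggests, the epsilon-greedy policy chooses, per batch element, between the greedy action and the random action with a Select op, and that op requires both branches to have the same shape. The pure-Python sketch below mimics that strict check with nested lists; `shape_of` and `select` are hypothetical helpers, not TensorFlow functions. In TF itself, one possible workaround (an assumption, not the upstream fix) would be to apply `tf.expand_dims(action, -1)` to the categorical policy's output so it becomes `(B, 1)`.

```python
from typing import Any, List, Tuple


def shape_of(t: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list; scalars have shape ()."""
    if isinstance(t, list):
        return (len(t),) + (shape_of(t[0]) if t else ())
    return ()


def select(cond: List[bool], greedy_action: List[Any],
           random_action: List[Any]) -> List[Any]:
    """Per-element choice between two actions. Like TF's Select op,
    it refuses branches whose shapes differ."""
    if shape_of(greedy_action) != shape_of(random_action):
        raise ValueError(
            "Inputs must have the same shape: "
            f"{shape_of(greedy_action)} != {shape_of(random_action)}")
    return [g if c else r for c, g, r in zip(cond, greedy_action, random_action)]


cond = [True]        # batch of 1: take the greedy action
greedy_cat = [3]     # CategoricalQPolicy-style output, shape (1,)
random_act = [[5]]   # RandomTFPolicy-style output, shape (1, 1)

try:
    select(cond, greedy_cat, random_act)
    error_message = ""
except ValueError as e:
    error_message = str(e)
print(error_message)  # Inputs must have the same shape: (1,) != (1, 1)

# Adding the missing trailing dimension makes the shapes agree.
greedy_fixed = [[a] for a in greedy_cat]
print(select(cond, greedy_fixed, random_act))  # [[3]]
```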


kbanoop assigned nealwu Sep 9, 2019

tbund changed the title ~~No outer_dim is added to categorical_q_policy action~~ Missing outer_dim in categorical_q_policy action Sep 9, 2019
