Epsilon greedy policy error when generating new action from Dict Action space. #276

Open
JaCoderX opened this issue Dec 25, 2019 · 3 comments
Labels: level:p1, type:bug (Something isn't working)

JaCoderX (Contributor) commented Dec 25, 2019

I'm trying to convert a custom gym project (called BTgym) to work as a tf-agents env. The original observation space and the action space are both gym.spaces.Dict, but for the moment I have simplified the observation space so that the env runs with the same code as the DQN tutorial example (as a proof of concept). The modified specs are as follows:

Observation Spec:
BoundedTensorSpec(shape=(6, 1, 5), dtype=tf.float32, name='observation/external', minimum=array(-100., dtype=float32), maximum=array(100., dtype=float32))
Action Spec:
OrderedDict([('default_asset', BoundedTensorSpec(shape=(), dtype=tf.int64, name='action/default_asset', minimum=array(0), maximum=array(3)))]) 
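For reference, the class below is a toy stand-in (names and shapes assumed, not the actual BTgym code) that produces specs of this shape once wrapped for tf-agents:

import gym
import numpy as np
from tf_agents.environments import gym_wrapper, tf_py_environment

# Toy stand-in (assumed) for the simplified env: Box observation, Dict action space.
class ToyEnv(gym.Env):
    observation_space = gym.spaces.Box(low=-100.0, high=100.0, shape=(6, 1, 5), dtype=np.float32)
    action_space = gym.spaces.Dict({
        'default_asset': gym.spaces.Discrete(4),  # becomes a BoundedTensorSpec with minimum=0, maximum=3
    })

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

train_env = tf_py_environment.TFPyEnvironment(gym_wrapper.GymWrapper(ToyEnv()))
print(train_env.observation_spec())
print(train_env.action_spec())  # OrderedDict with a single 'default_asset' entry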

The error occurs under the "Training the agent" section, when performing collect_step() (a sketch of that helper is included after the traceback):

Traceback (most recent call last):
  File "/home/jack/envTest.py", line 251, in <module>
    collect_step(train_env, agent.collect_policy, replay_buffer)
  File "/home/jack/envTest.py", line 209, in collect_step
    action_step = policy.action(time_step)
  File "/home/jack/tf_agents/policies/tf_policy.py", line 278, in action
    step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
  File "/home/jack/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/jack/tf_agents/policies/epsilon_greedy_policy.py", line 106, in _action
    random_action.action)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 3753, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9430, in select
    condition, x, y, name=name, ctx=_ctx)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9462, in select_eager_fallback
    _attr_T, _inputs_T = _execute.args_to_matching_eager([x, y], _ctx)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 257, in args_to_matching_eager
    t, dtype, preferred_dtype=default_dtype, ctx=ctx))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (DictWrapper({'default_asset': <tf.Tensor: id=132834, shape=(1,), dtype=int64, numpy=array([0])>})) with an unsupported type (<class 'tensorflow.python.training.tracking.data_structures._DictWrapper'>) to a Tensor.
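For context, collect_step is the helper from the DQN tutorial, roughly as below (a sketch; the trajectory import and the replay buffer's add_batch usage are assumed from the tutorial):

from tf_agents.trajectories import trajectory

def collect_step(environment, policy, buffer):
    # Query the policy for the current time step, step the env, and store the transition.
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)  # <-- this is the call that fails above
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    buffer.add_batch(traj)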

It seems that the epsilon greedy policy has some problem with the Dict action space when trying to generate an action via action_step = policy.action(time_step). As both the DQN and random agents seem to work fine for producing actions, the Dict action space appears not to be supported in the epsilon greedy policy.

Any idea on how to resolve this?

JaCoderX (Contributor, Author) commented Jan 7, 2020

I'm still not sure how to add support for a Dict action space.

The error occurs on the line that selects between the greedy and the random action (in epsilon_greedy_policy.py):
action = tf.compat.v1.where(cond, greedy_action.action, random_action.action)
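A minimal standalone reproduction of that failure mode (toy tensors, key name taken from the spec above): tf.compat.v1.where converts its x and y arguments to single tensors, so a dict of tensors fails during conversion.

import tensorflow as tf

cond = tf.constant([True])
# Stand-ins (assumed) for greedy_action.action and random_action.action:
greedy_act = {'default_asset': tf.constant([2], dtype=tf.int64)}
random_act = {'default_asset': tf.constant([0], dtype=tf.int64)}

try:
    tf.compat.v1.where(cond, greedy_act, random_act)
except (ValueError, TypeError) as e:
    # e.g. "Attempt to convert a value ... with an unsupported type ... to a Tensor."
    print(e)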

ebrevdo (Contributor) commented Jan 22, 2020

That line needs to be rewritten as:

action = tf.nest.map_structure(lambda g, r: tf.compat.v1.where(cond, g, r), greedy_action.action, random_action.action)
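A quick standalone check of that pattern (toy tensors, key name taken from the spec above), applying the selection leaf-by-leaf over the nested action:

import tensorflow as tf

cond = tf.constant([False])  # False -> take the random action for this batch element
# Stand-ins (assumed) for greedy_action.action and random_action.action:
greedy_act = {'default_asset': tf.constant([2], dtype=tf.int64)}
random_act = {'default_asset': tf.constant([0], dtype=tf.int64)}

action = tf.nest.map_structure(
    lambda g, r: tf.compat.v1.where(cond, g, r),
    greedy_act, random_act)

print(action)  # {'default_asset': <tf.Tensor ... numpy=array([0])>}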

Report back and let us know if this works. We can patch it on our end.

ebrevdo self-assigned this Jan 22, 2020
JaCoderX (Contributor, Author) commented Feb 1, 2020

@ebrevdo, I tested it on my end and it works well. Thank you :)
