Epsilon greedy policy error when generating new action from Dict Action space. #276

Open
JaCoderX opened this issue Dec 25, 2019 · 3 comments
Labels: level:p1, type:bug (Something isn't working)

JaCoderX (Contributor) commented Dec 25, 2019

I'm trying to convert a custom gym project (called BTgym) to work as a tf-agents env. The original observation space and the action space are both gym.spaces.Dict, but for the moment I have simplified the observation space so that the env runs with the same code as the DQN tutorial example (as a proof of concept). The modified specs are as follows:

Observation Spec:
BoundedTensorSpec(shape=(6, 1, 5), dtype=tf.float32, name='observation/external', minimum=array(-100., dtype=float32), maximum=array(100., dtype=float32))
Action Spec:
OrderedDict([('default_asset', BoundedTensorSpec(shape=(), dtype=tf.int64, name='action/default_asset', minimum=array(0), maximum=array(3)))]) 
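For reference, the class below is a toy stand-in (names and shapes assumed, not the actual BTgym code) that produces specs of this shape once wrapped for tf-agents:

import gym
import numpy as np
from tf_agents.environments import gym_wrapper, tf_py_environment

# Toy stand-in (assumed) for the simplified env: Box observation, Dict action space.
class ToyEnv(gym.Env):
    observation_space = gym.spaces.Box(low=-100.0, high=100.0, shape=(6, 1, 5), dtype=np.float32)
    action_space = gym.spaces.Dict({
        'default_asset': gym.spaces.Discrete(4),  # becomes a BoundedTensorSpec with minimum=0, maximum=3
    })

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

train_env = tf_py_environment.TFPyEnvironment(gym_wrapper.GymWrapper(ToyEnv()))
print(train_env.observation_spec())
print(train_env.action_spec())  # OrderedDict with a single 'default_asset' entry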

The error occurs under the "Training the agent" section, when performing collect_step() (a sketch of that helper is included after the traceback):

Traceback (most recent call last):
  File "/home/jack/envTest.py", line 251, in <module>
    collect_step(train_env, agent.collect_policy, replay_buffer)
  File "/home/jack/envTest.py", line 209, in collect_step
    action_step = policy.action(time_step)
  File "/home/jack/tf_agents/policies/tf_policy.py", line 278, in action
    step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
  File "/home/jack/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/jack/tf_agents/policies/epsilon_greedy_policy.py", line 106, in _action
    random_action.action)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 3753, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9430, in select
    condition, x, y, name=name, ctx=_ctx)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9462, in select_eager_fallback
    _attr_T, _inputs_T = _execute.args_to_matching_eager([x, y], _ctx)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 257, in args_to_matching_eager
    t, dtype, preferred_dtype=default_dtype, ctx=ctx))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (DictWrapper({'default_asset': <tf.Tensor: id=132834, shape=(1,), dtype=int64, numpy=array([0])>})) with an unsupported type (<class 'tensorflow.python.training.tracking.data_structures._DictWrapper'>) to a Tensor.
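For context, collect_step is the helper from the DQN tutorial, roughly as below (a sketch; the trajectory import and the replay buffer's add_batch usage are assumed from the tutorial):

from tf_agents.trajectories import trajectory

def collect_step(environment, policy, buffer):
    # Query the policy for the current time step, step the env, and store the transition.
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)  # <-- this is the call that fails above
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    buffer.add_batch(traj)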

It seems that the epsilon greedy policy has some problem with the Dict action space when trying to generate an action via action_step = policy.action(time_step). As both the DQN and random agents seem to work fine for producing actions, the Dict action space appears not to be supported in the epsilon greedy policy.

Any idea on how to resolve this?

JaCoderX (Contributor, Author) commented Jan 7, 2020

I'm still not sure how to add support for a Dict action space.

The error occurs on the line that selects between the greedy and the random action (in epsilon_greedy_policy.py):
action = tf.compat.v1.where(cond, greedy_action.action, random_action.action)
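A minimal standalone reproduction of that failure mode (toy tensors, key name taken from the spec above): tf.compat.v1.where converts its x and y arguments to single tensors, so a dict of tensors fails during conversion.

import tensorflow as tf

cond = tf.constant([True])
# Stand-ins (assumed) for greedy_action.action and random_action.action:
greedy_act = {'default_asset': tf.constant([2], dtype=tf.int64)}
random_act = {'default_asset': tf.constant([0], dtype=tf.int64)}

try:
    tf.compat.v1.where(cond, greedy_act, random_act)
except (ValueError, TypeError) as e:
    # e.g. "Attempt to convert a value ... with an unsupported type ... to a Tensor."
    print(e)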

ebrevdo (Contributor) commented Jan 22, 2020

That line needs to be rewritten as:

action = tf.nest.map_structure(lambda g, r: tf.compat.v1.where(cond, g, r), greedy_action.action, random_action.action)
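A quick standalone check of that pattern (toy tensors, key name taken from the spec above), applying the selection leaf-by-leaf over the nested action:

import tensorflow as tf

cond = tf.constant([False])  # False -> take the random action for this batch element
# Stand-ins (assumed) for greedy_action.action and random_action.action:
greedy_act = {'default_asset': tf.constant([2], dtype=tf.int64)}
random_act = {'default_asset': tf.constant([0], dtype=tf.int64)}

action = tf.nest.map_structure(
    lambda g, r: tf.compat.v1.where(cond, g, r),
    greedy_act, random_act)

print(action)  # {'default_asset': <tf.Tensor ... numpy=array([0])>}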

Report back and let us know if this works. We can patch it on our end.

ebrevdo self-assigned this Jan 22, 2020
JaCoderX (Contributor, Author) commented Feb 1, 2020

@ebrevdo, I tested it on my end and it works well. Thank you :)
