I'm trying to convert a custom gym project (called BTgym) to work as a TF-Agents environment.
The original observation space and action space are both gym.spaces.Dict.
For the moment I have simplified the observation space so I can run the environment with the same code as the DQN tutorial example (as a proof of concept). The modified spaces are as follows:
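(The exact space definitions are omitted here; purely as an illustration of the setup, with made-up shapes and sizes, they look something like this:)

    # Illustrative sketch only -- the real shapes and bounds in my env differ.
    # The key point is that the action space is a gym.spaces.Dict; the
    # 'default_asset' key is the one visible in the traceback below.
    from gym import spaces
    import numpy as np

    observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                   shape=(30, 4), dtype=np.float32)    # simplified (assumed shape)
    action_space = spaces.Dict({'default_asset': spaces.Discrete(4)})  # number of actions is assumed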
The error occurs under the Training the agent section, when performing collect_step(). My collect_step mirrors the data-collection helper from the DQN tutorial; roughly the following sketch (not my exact code, and the surrounding setup is assumed):
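    from tf_agents.trajectories import trajectory

    def collect_step(environment, policy, buffer):
      # Ask the policy for an action at the current time step -- this is the
      # call that fails (policy.action) -- then step the env and store the
      # resulting transition in the replay buffer.
      time_step = environment.current_time_step()
      action_step = policy.action(time_step)
      next_time_step = environment.step(action_step.action)
      traj = trajectory.from_transition(time_step, action_step, next_time_step)
      buffer.add_batch(traj)

The traceback: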
Traceback (most recent call last):
File "/home/jack/envTest.py", line 251, in <module>
collect_step(train_env, agent.collect_policy, replay_buffer)
File "/home/jack/envTest.py", line 209, in collect_step
action_step = policy.action(time_step)
File "/home/jack/tf_agents/policies/tf_policy.py", line 278, in action
step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
File "/home/jack/tf_agents/utils/common.py", line 131, in with_check_resource_vars
return fn(*fn_args, **fn_kwargs)
File "/home/jack/tf_agents/policies/epsilon_greedy_policy.py", line 106, in _action
random_action.action)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 3753, in where
return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9430, in select
condition, x, y, name=name, ctx=_ctx)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9462, in select_eager_fallback
_attr_T, _inputs_T = _execute.args_to_matching_eager([x, y], _ctx)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 257, in args_to_matching_eager
t, dtype, preferred_dtype=default_dtype, ctx=ctx))
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
allow_broadcast=True)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (DictWrapper({'default_asset': <tf.Tensor: id=132834, shape=(1,), dtype=int64, numpy=array([0])>})) with an unsupported type (<class 'tensorflow.python.training.tracking.data_structures._DictWrapper'>) to a Tensor.
It seems the epsilon-greedy policy has a problem with the Dict action space when generating an action via action_step = policy.action(time_step).
Since both the DQN agent's own (greedy) policy and a random policy seem to work fine for producing actions, the Dict action space appears not to be supported by the epsilon-greedy policy specifically.
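For reference, this is roughly how I checked that the greedy and random policies each work on their own (a sketch; variable names follow the DQN tutorial and are assumptions here):

    from tf_agents.policies import random_tf_policy

    # Both of these produce an action dict without raising; only the
    # epsilon-greedy collect policy fails when it tries to combine them.
    random_policy = random_tf_policy.RandomTFPolicy(train_env.time_step_spec(),
                                                    train_env.action_spec())
    time_step = train_env.reset()
    print(random_policy.action(time_step).action)   # random policy: OK
    print(agent.policy.action(time_step).action)    # DQN greedy policy: OK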
Any idea how to resolve this?
I'm still not sure how to add support for a Dict action space.
The error occurs on the line of epsilon_greedy_policy.py that selects between the greedy action (produced by the wrapped greedy policy) and the random action: action = tf.compat.v1.where(cond, greedy_action.action, random_action.action)
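One idea I'm considering (a sketch of a possible patch, not a confirmed fix): apply the selection to each leaf of the action structure with tf.nest.map_structure, so dict-valued actions are handled element-wise instead of passing the whole dict to tf.where:

    # Sketch of a possible change inside EpsilonGreedyPolicy._action.
    # tf.nest.map_structure applies the where() selection to each tensor leaf
    # of the (possibly nested/dict-valued) action, rather than to the dict itself.
    action = tf.nest.map_structure(
        lambda greedy, random: tf.compat.v1.where(cond, greedy, random),
        greedy_action.action, random_action.action)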