Error using parameter train_step_counter according to colab example #121

ideenfix · 2019-05-25T06:56:35Z

I'm using TF-Agents (nightly, 0.2.0dev2019430) on Win10 with TF 2.0 (GPU, 2.0.0a0).

If you run the following snippet, which is based on the colab example:

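```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent

# train_env, net, optimizer and params are defined earlier in the script.
# tf.Variable(0) infers dtype int32 from the Python integer.
train_step_counter = tf.Variable(0)

tf_agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=net,
    optimizer=optimizer,
    epsilon_greedy=params["epsilon_final"],
    gamma=params['gamma'],
    td_errors_loss_fn=dqn_agent.element_wise_squared_loss,
    train_step_counter=train_step_counter
)
```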

After calling DqnAgent.train, the following error is thrown:

Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1741, in
main()
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 165, in
train_loss = tf_agent.train(experience)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\tf_agent.py", line 177, in train
loss_info = self._train_fn(experience=experience, weights=weights)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 256, in _train
weights=weights)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 353, in loss
name='loss', data=loss, step=self.train_step_counter)
File "D:\pyenv\tf2\lib\site-packages\tensorboard\plugins\scalar\summary_v2.py", line 65, in scalar
metadata=summary_metadata)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 632, in write
_should_record_summaries_v2(), record, _nothing, name="summary_cond")
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\smart_cond.py", line 54, in smart_cond
return true_fn()
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 627, in record
name=scope)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 793, in write_summary
writer, step, tensor, tag, summary_metadata, name=name, ctx=_ctx)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 824, in write_summary_eager_fallback
step = _ops.convert_to_tensor(step, _dtypes.int64)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1050, in convert_to_tensor
return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1108, in convert_to_tensor_v2
as_ref=False)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1186, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1420, in _dense_var_to_tensor
return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref) # pylint: disable=protected-access
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1371, in _dense_var_to_tensor
"of type {!r}".format(dtype.name, self.dtype.name))
ValueError: Incompatible type conversion requested to type 'int64' for variable of type 'int32'

If you change the initialization of this parameter to

train_step_counter = tf.Variable(0, dtype=tf.int64)

then the error no longer occurs.
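A minimal sketch of what seems to be going on, assuming eager TF 2.x: `tf.Variable(0)` takes its dtype from the Python integer (int32), and converting such a variable to an int64 tensor is rejected rather than cast. The helper name `can_convert_to_int64` is only for illustration and is not part of TensorFlow or TF-Agents.

```python
import tensorflow as tf

# Without an explicit dtype, the variable is created as int32.
default_counter = tf.Variable(0)
# With an explicit dtype, the variable matches what the summary op asks for.
int64_counter = tf.Variable(0, dtype=tf.int64)


def can_convert_to_int64(variable):
    """Return True if the variable converts to an int64 tensor without error."""
    try:
        tf.convert_to_tensor(variable, dtype=tf.int64)
        return True
    except (ValueError, TypeError):
        # Variables are not cast implicitly; a dtype mismatch raises instead.
        return False


print(default_counter.dtype.name, can_convert_to_int64(default_counter))
print(int64_counter.dtype.name, can_convert_to_int64(int64_counter))
```

The summary writer performs the same kind of conversion on `step` (see `convert_to_tensor(step, _dtypes.int64)` in the traceback), which would explain why only the int64 counter works once summaries are actually recorded.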

01_dqn_basic_tf_selfrunning.txt

egonina · 2019-05-25T20:56:43Z

The global step needs to be of type tf.int64. Could you point to the colab example you're referring to, so we can fix it if that's an issue there?

ideenfix · 2019-05-26T03:56:06Z

https://github.com/tensorflow/agents/blob/master/tf_agents/colabs/1_dqn_tutorial.ipynb

egonina · 2019-05-28T15:47:35Z

Hm, I'm unable to reproduce this by running the colab you linked. The train step runs fine, and the type of the train_step_counter variable is tf.int32. Are you running this colab directly, or are you modifying anything in your code?

ideenfix · 2019-06-03T20:36:44Z

To start with, I use the colab example as a reference for my own RL agent.

I checked the colab example on my notebook after installing the jupyter package. The colab example runs after disabling some statements that do not work on Win10 (e.g. importing pyvirtualdisplay, or display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()).

Then I checked my own script and got new errors such as:

D:\pyenv\py36tf2\Scripts\python.exe D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py
Python:3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]
Tensorflow: 1.14.1-dev20190603
TF-Agent:0.2.0
Traceback (most recent call last):
File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 62, in
writer = tf.summary.create_file_writer(log_dir)
File "D:\pyenv\py36tf2\lib\site-packages\tensorflow\python\util\deprecation_wrapper.py", line 104, in getattr
attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.summary' has no attribute 'create_file_writer'

After checking my virtual environment, I noticed that tf-nightly had been installed as a result of running the colab example. Here is an excerpt of the pip list output:
tb-nightly 1.14.0a20190602
tensorflow-estimator-2.0-preview 1.14.0.dev2019060300
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
tf-agents-nightly 0.2.0.dev20190528
tf-estimator-nightly 1.14.0.dev2019052901
tf-nightly 1.14.1.dev20190603
tf-nightly-gpu-2.0-preview 2.0.0.dev20190602
tfp-nightly 0.8.0.dev20190603

This took precedence over the tf-nightly-gpu-2.0-preview package and was the reason for the AttributeError above, since my script is based on TF 2.0.
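A quick way to spot this kind of clash is to list every installed distribution that provides the `tensorflow` module. The following is only a sketch using the standard library (Python 3.8+); the list of distribution names is my own assumption and may be incomplete, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumed (possibly incomplete) list of distributions that all install
# the same top-level `tensorflow` package.
TF_DISTRIBUTIONS = (
    "tensorflow",
    "tensorflow-gpu",
    "tf-nightly",
    "tf-nightly-gpu",
    "tf-nightly-2.0-preview",
    "tf-nightly-gpu-2.0-preview",
)


def installed_tf_distributions(names=TF_DISTRIBUTIONS):
    """Map each installed distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
    return found


def has_conflict(found):
    """More than one distribution means they overwrite each other's files."""
    return len(found) > 1


found = installed_tf_distributions()
print(found)
print("conflict" if has_conflict(found) else "no conflict detected")
```

With the pip list excerpt above, both tf-nightly and tf-nightly-gpu-2.0-preview would be reported, which matches the TF 1.14 API being picked up instead of TF 2.0.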

Attached is the standalone version of my own script:
01_dqn_basic_tf_standalone.txt

If you run this script in a pure TF2 environment, e.g.
Python:3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
Tensorflow: 2.0.0-alpha0

then the script runs until the first training step, where
train_loss = tf_agent.train(experience)
throws the error mentioned above. I could only fix it by changing the initialization to
train_step_counter = tf.Variable(0, dtype=tf.int64)

PS: Sorry for the late reply; last week I was on vacation in the Mediterranean.

bartmaciszewski · 2020-02-12T05:06:56Z

I came across the same error when trying to write summaries to TensorBoard.
The fix proposed by @ideenfix, changing the dtype of the step counter, resolved the issue.

train_step_counter = tf.Variable(0, dtype=tf.int64)

Thanks!
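For reference, a small sketch of writing scalar summaries with an int64 step counter, assuming eager TF 2.x. The temporary log directory and the loss values are made up for illustration; in a real script the counter would be the one passed to the agent.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical log directory for this sketch.
log_dir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(log_dir)

# The step counter is created as int64, the dtype the summary op converts to.
train_step_counter = tf.Variable(0, dtype=tf.int64)

with writer.as_default():
    for loss in (1.0, 0.5, 0.25):  # made-up loss values
        tf.summary.scalar("loss", loss, step=train_step_counter)
        train_step_counter.assign_add(1)
writer.flush()

# The event file(s) written for TensorBoard.
event_files = os.listdir(log_dir)
print(event_files)
```

Pointing TensorBoard at `log_dir` should then show the `loss` scalar over three steps.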

RachithP mentioned this issue Jun 23, 2021

[PPO] Trajectory action out of range #564

Open
