Error using parameter train_step_counter according to colab example #121

Open
ideenfix opened this issue May 25, 2019 · 5 comments

@ideenfix

I'm using TF-Agents (nightly, 0.2.0dev2019430) on Win10 with TF 2.0 (GPU, 2.0.0a0).

If you run the following snippet, written according to the Colab example:

```python
train_step_counter = tf.Variable(0)

tf_agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=net,
    optimizer=optimizer,
    epsilon_greedy=params["epsilon_final"],
    gamma=params['gamma'],
    td_errors_loss_fn=dqn_agent.element_wise_squared_loss,
    train_step_counter=train_step_counter
)
```

then calling DqnAgent.train throws the following error:

Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1741, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2019.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 165, in <module>
train_loss = tf_agent.train(experience)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\tf_agent.py", line 177, in train
loss_info = self._train_fn(experience=experience, weights=weights)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 256, in _train
weights=weights)
File "D:\pyenv\tf2\lib\site-packages\tf_agents\agents\dqn\dqn_agent.py", line 353, in loss
name='loss', data=loss, step=self.train_step_counter)
File "D:\pyenv\tf2\lib\site-packages\tensorboard\plugins\scalar\summary_v2.py", line 65, in scalar
metadata=summary_metadata)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 632, in write
_should_record_summaries_v2(), record, _nothing, name="summary_cond")
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\smart_cond.py", line 54, in smart_cond
return true_fn()
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 627, in record
name=scope)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 793, in write_summary
writer, step, tensor, tag, summary_metadata, name=name, ctx=_ctx)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 824, in write_summary_eager_fallback
step = _ops.convert_to_tensor(step, _dtypes.int64)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1050, in convert_to_tensor
return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1108, in convert_to_tensor_v2
as_ref=False)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 1186, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1420, in _dense_var_to_tensor
return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref) # pylint: disable=protected-access
File "D:\pyenv\tf2\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1371, in _dense_var_to_tensor
"of type {!r}".format(dtype.name, self.dtype.name))
ValueError: Incompatible type conversion requested to type 'int64' for variable of type 'int32'

If you change the initialization of this parameter to

train_step_counter = tf.Variable(0, dtype=tf.int64)

the error no longer occurs.

01_dqn_basic_tf_selfrunning.txt
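
The last TensorFlow frame before the failure is `convert_to_tensor(step, _dtypes.int64)`, so the mismatch can be checked without building an agent. What follows is only a minimal sketch, assuming TensorFlow 2.x in eager mode; `converts_to_int64` is a hypothetical helper name used for illustration and is not part of TF-Agents.

```python
import tensorflow as tf


def converts_to_int64(variable):
    """Return True if `variable` can be converted to an int64 tensor.

    Mirrors the `convert_to_tensor(step, int64)` call seen in the
    traceback. Hypothetical helper, for illustration only.
    """
    try:
        tf.convert_to_tensor(variable, dtype=tf.int64)
        return True
    except (TypeError, ValueError):
        # The conversion does not cast a variable to another dtype.
        return False


# Without an explicit dtype, a Python int initial value gives an int32 variable.
default_counter = tf.Variable(0)
# The initialization proposed in this issue.
int64_counter = tf.Variable(0, dtype=tf.int64)

print(default_counter.dtype.name, converts_to_int64(default_counter))
print(int64_counter.dtype.name, converts_to_int64(int64_counter))
```

The check only covers the dtype conversion itself; whether a given TF-Agents or TensorFlow version reaches that conversion during `train` is not something this sketch shows.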

@egonina
Contributor

egonina commented May 25, 2019

The global step needs to be of type tf.int64. Could you point to the Colab example you're referring to, so we can fix it if it has this issue?

@ideenfix
Author

@egonina
Contributor

egonina commented May 28, 2019

Hm, I'm unable to reproduce this by running the Colab you linked. The train step runs fine, and the type of the train_step_counter variable is tf.int32. Are you running this Colab directly, or are you modifying anything in your code?

@ideenfix
Author

ideenfix commented Jun 3, 2019

To begin with, I used the Colab example as a reference for my own RL agent.

I ran the Colab example on my own notebook after installing the jupyter package. The example runs once a few statements that do not work on Win10 are disabled (e.g. the pyvirtualdisplay import and display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()).

Then I checked my own script and got a new error:

D:\pyenv\py36tf2\Scripts\python.exe D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py
Python:3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]
Tensorflow: 1.14.1-dev20190603
TF-Agent:0.2.0
Traceback (most recent call last):
File "D:/git/Deep-Reinforcement-Learning-Hands-On/Chapter07/01_dqn_basic_tf.py", line 62, in <module>
writer = tf.summary.create_file_writer(log_dir)
File "D:\pyenv\py36tf2\lib\site-packages\tensorflow\python\util\deprecation_wrapper.py", line 104, in __getattr__
attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.summary' has no attribute 'create_file_writer'

After checking my virtual environment, I noticed that running the Colab example had installed tf-nightly. Here is an excerpt of the pip list output:
tb-nightly 1.14.0a20190602
tensorflow-estimator-2.0-preview 1.14.0.dev2019060300
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
tf-agents-nightly 0.2.0.dev20190528
tf-estimator-nightly 1.14.0.dev2019052901
tf-nightly 1.14.1.dev20190603
tf-nightly-gpu-2.0-preview 2.0.0.dev20190602
tfp-nightly 0.8.0.dev20190603

This took precedence over the tf-nightly-gpu-2.0-preview package and caused the AttributeError above, because my script is based on TF 2.0.
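
A package conflict like this one can be spotted without reading the whole pip list, since several distributions install into the same `tensorflow` import package. The following is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`. The helper names are hypothetical, and `tensorflow` and `tensorflow-gpu` are added here as assumptions alongside the distribution names from the excerpt.

```python
from importlib import metadata

# Distributions that all provide the `tensorflow` import package.
# "tf-nightly" and "tf-nightly-gpu-2.0-preview" come from the pip list
# excerpt; "tensorflow" and "tensorflow-gpu" are added as assumptions.
TENSORFLOW_PACKAGES = (
    "tensorflow",
    "tensorflow-gpu",
    "tf-nightly",
    "tf-nightly-gpu-2.0-preview",
)


def installed_versions(package_names):
    """Return {name: version} for those distributions that are installed."""
    found = {}
    for name in package_names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Not installed in this environment.
    return found


def has_conflict(found):
    """More than one installed distribution means they can shadow each other."""
    return len(found) > 1


found = installed_versions(TENSORFLOW_PACKAGES)
for name, version in sorted(found.items()):
    print(f"{name}=={version}")
if has_conflict(found):
    print("Warning: multiple TensorFlow distributions are installed.")
```

Run inside the virtual environment in question, this prints one line per installed distribution and warns if more than one is present.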

Attached is the standalone version of my own script:
01_dqn_basic_tf_standalone.txt
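
Since the script relies on TF 2.0 APIs such as `tf.summary.create_file_writer`, a version guard at the top could make an environment mix-up obvious before any other error appears. This is only a sketch: `major_version` and `require_tf2` are hypothetical helpers, and the version strings are the ones reported in this thread.

```python
import re


def major_version(version_string):
    """Extract the leading major number, e.g. 2 from '2.0.0-alpha0'."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1))


def require_tf2(version_string):
    """Raise RuntimeError unless the version string reports major version 2 or higher."""
    if major_version(version_string) < 2:
        raise RuntimeError(
            f"This script expects TensorFlow 2.x, found {version_string}"
        )


# In a real script this would be called as require_tf2(tf.__version__).
for reported in ("1.14.1-dev20190603", "2.0.0-alpha0"):
    print(reported, "-> major version", major_version(reported))
```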

If you run this script in a pure TF2 environment, e.g.
Python:3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
Tensorflow: 2.0.0-alpha0

then the script runs until the first training step, where
train_loss = tf_agent.train(experience)
throws the error mentioned above, which I could only fix by changing the initialization to
train_step_counter = tf.Variable(0, dtype=tf.int64)

PS: Sorry, last week I was on vacation at the Mediterranean Sea.

@bartmaciszewski

I came across the same error when trying to write summaries to TensorBoard.
The fix proposed by @ideenfix to change the step counter fixed the issue.

train_step_counter = tf.Variable(0, dtype=tf.int64)

Thanks!
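
For reference, here is a minimal sketch of writing scalar summaries with an int64 step counter, assuming TensorFlow 2.x in eager mode. The temporary log directory and the loss values are placeholders and are not taken from the scripts in this thread.

```python
import os
import tempfile

import tensorflow as tf

# Placeholder log directory; a real script would use its own log_dir.
log_dir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(log_dir)

# Step counter created with the int64 dtype used in the fix.
train_step_counter = tf.Variable(0, dtype=tf.int64)

with writer.as_default():
    for loss in (0.9, 0.5, 0.2):  # Placeholder loss values.
        tf.summary.scalar("loss", loss, step=train_step_counter)
        train_step_counter.assign_add(1)

writer.flush()
print("Event files:", os.listdir(log_dir))
```

Pointing TensorBoard at the log directory should then show a single `loss` curve with three points.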
