I wrote a new environment (navigation on a deterministic map):
(1) I run "python train.py --config xxxx" and get config.json and policy.th.
(2) I run "python test.py --config xxxx" and get results.npz.
But the rewards in results.npz are still very low.
What should I do to use policy.th to fast-adapt to a new task?
You should use --policy policy.th in test.py to use your trained policy.
It's surprising that you didn't get an error when running test.py without --policy, since this is a required parameter.
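For reference, the full test invocation would then look like this (keeping the "xxxx" placeholder from above; only --config and --policy are taken from this thread):

```
python test.py --config xxxx --policy policy.th
```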
I get it. I ran test.py with --policy policy.th, but the valid_return rewards are equal to or even lower than the train_return rewards.
Maybe our environment is not suitable. Thanks.
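A quick way to check the gap between pre- and post-adaptation returns is to load results.npz directly. This is a minimal sketch; the key names train_returns and valid_returns are assumptions based on the fields mentioned above, so check data.files for the actual names:

```python
import numpy as np

data = np.load('results.npz')
print(data.files)  # list the array names actually stored in the file

# Assumed key names (train_returns / valid_returns); adjust to
# whatever data.files reports for your version of test.py.
train_returns = data['train_returns']
valid_returns = data['valid_returns']

# Returns before adaptation (train) vs. after adaptation (valid).
# Successful fast adaptation should show valid above train on average.
print('mean train return:', train_returns.mean())
print('mean valid return:', valid_returns.mean())
```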