It seems that it is not possible to define a decaying (variable) epsilon during training for dqn_agent.DqnAgent. I have some direct and indirect evidence of this:
1. I tried to define a custom decay() function and to link epsilon_greedy to this function when instantiating the agent. Printing the epsilon_greedy value from time to time during training, I can see that it remains at the starting value of 1 regardless of decay(), as if the epsilon_greedy parameter were not callable.
2. Therefore, I forced the epsilon_greedy value to change externally during the training process with agent._epsilon_greedy = decay(step). Printing agent._epsilon_greedy, I can see that the parameter really is changing, but the behavior is again as if the policy were completely random, keeping the initial value of 1.
3. As a cross-check, I reused the setup from point 1, with the decay function starting at 0.011 and ending at a constant epsilon equal to 0.01. In this case, training behaves the same as when I set a constant epsilon = 0.01, i.e. it appears to keep the initial value of 0.011.
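To make the points above concrete, here is a minimal sketch of the kind of decay function and callable they describe. It assumes a linear schedule, and the names `StepCounter`, `counter`, and `epsilon_fn`, as well as the default values, are hypothetical and chosen only for illustration. The agent construction itself is omitted, so this shows only the schedule, not how TF-Agents consumes it.

```python
def decay(step, start=1.0, end=0.01, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps."""
    # Clamp progress to [0, 1] so epsilon stays at `end` after the schedule ends.
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return start + fraction * (end - start)


class StepCounter:
    """Hypothetical stand-in for whatever tracks the current training step."""

    def __init__(self):
        self.value = 0


counter = StepCounter()


def epsilon_fn():
    """Zero-argument callable: re-reads the counter on every call.

    Point 1 corresponds to passing something like this as `epsilon_greedy`
    at agent instantiation (assumption: not shown here).
    """
    return decay(counter.value)


for step in (0, 5_000, 10_000, 20_000):
    counter.value = step
    # Point 2 corresponds to assigning decay(step) to agent._epsilon_greedy here.
    print(step, epsilon_fn())
```

The callable form matters because a plain float such as `decay(0)` is evaluated once and never changes, whereas `epsilon_fn` yields a new value each time it is called.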
My conclusion is that the randomness of the policy cannot be changed or updated during training, either by changing the hyperparameter _epsilon_greedy or by linking the hyperparameter to a custom decay function. Is this true? If so, how can a decaying epsilon be defined?
fede72bari changed the title from "epsilon_greedy not modifiable during training" to "dqn_agent.DqnAgent: epsilon_greedy not modifiable during training" on Mar 13, 2023