Hi @abstractpaper, glad to hear that you're interested in action masking. In order to reproduce this, are you able to provide a more minimal example where the policy still generates illegal actions? The ideal example would be in the form of a unit test, something similar to https://github.com/tensorflow/agents/blob/master/tf_agents/policies/q_policy_test.py#L202.
One thing worth pointing out is that in the mask, 1s represent valid actions and 0s represent invalid actions. Your mask of all zeros in the graph case is actually invalid, since none of the actions are legal, so the policy will end up just choosing one randomly.
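To make the convention concrete: a masked policy typically forces the Q-values of illegal actions down to the smallest representable value before taking the argmax, so a mask of all zeros leaves no legal action to prefer. A minimal NumPy sketch (illustrative only, not the TF-Agents implementation):

```python
import numpy as np

def mask_q_values(q_values, mask):
    # Mask convention: 1 = legal action, 0 = illegal action.
    # Push illegal actions' Q-values to the smallest representable float
    # so argmax can never select them.
    neg_inf = np.finfo(q_values.dtype).min
    return np.where(mask.astype(bool), q_values, neg_inf)

q = np.array([2.0, 5.0, 1.0, 9.0], dtype=np.float32)
mask = np.array([1, 1, 0, 0])
print(mask_q_values(q, mask).argmax())  # action 3 has the highest Q but is illegal -> 1
```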
In order to ensure you get eager execution, I believe you should enable eager execution as early as possible in your program (for example at the beginning of your main function). It should be enabled by default in TF 2.0 though.
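On the graph-then-eager observation above: `agent.train` is commonly wrapped in a `tf.function` (e.g. via `tf_agents.utils.common.function`), and the Python body of a `tf.function` runs only while it is being traced into a graph — typically on the first call — after which the cached graph is reused. A minimal sketch of that trace-once behavior:

```python
import tensorflow as tf

trace_count = {"n": 0}

@tf.function
def step(x):
    # This Python body executes only while tf.function traces the graph
    # (on the first call, or on a retrace for a new input signature);
    # afterwards the cached graph runs without re-executing it.
    trace_count["n"] += 1
    return x + 1

step(tf.constant(1))
step(tf.constant(2))  # same input signature -> reuses the traced graph
print(trace_count["n"])  # traced exactly once
```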
I have been trying to use `observation_and_action_constraint_splitter` to mask illegal actions. This is my function:

There are two main issues I can't wrap my head around:
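(For reference, the general shape of such a splitter — a hypothetical sketch, not the original function, assuming the environment emits a dict observation with the board state under `"state"` and a 0/1 legality mask under `"mask"` — is:)

```python
import tensorflow as tf

def illegal_moves(observation):
    # Hypothetical splitter: TF-Agents expects a callable that takes the
    # full observation and returns (network_input, action_mask), where
    # the mask uses 1 for legal actions and 0 for illegal ones.
    return observation["state"], tf.cast(observation["mask"], tf.int32)
```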
**Eager Execution vs Graph Execution**

You can see in my function that I'm differentiating between the two. The reason is that I'm getting the following output when I train my agent:

`agent.train()` is executing in graph mode for the first iteration and then executes eagerly for the subsequent iterations. Why is that? Should I process that batch of tensors, and if so, what session do I evaluate it with?

**Effectiveness to Policy**
The `illegal_moves` function is being used for both the agent definition:

and for policies:

Digging deep into the DQN agent code, I saw that `agent.policy` is already applying the `illegal_moves` I'm passing to the agent: here.

After training the agent and saving the policy, I ran a few tests and found that the policy is still picking up illegal moves. This raises the question: is `illegal_moves` actually being applied to `agent.policy`?

**Bonus question: @tf.function**
I have tried decorating `illegal_moves` with `@tf.function`, which for some reason made it ineffective during training; i.e., illegal moves were still being picked up. Any explanation for this?

`train.py` is available in this gist. Thanks!