How to customize Actor policy in Actor-Learner setup #792

JLenssen · 2022-10-31T21:49:38Z

Hi,
I am following https://github.com/tensorflow/agents/tree/master/tf_agents/experimental/distributed/examples/sac to implement an Actor-Learner environment for DDQN (similar to Ape-X, page 6). How can I adapt the actor-policies that get their policy variables from the reverb variable container? I would like to use a different epsilon-greedy policy for each actor.

Actors are differentiated by FLAGS.task in the example. I tried to pass an Epsilon-Greedy Polciy to actor.Actor.policy

  collect_eps_policy = EpsilonGreedyPolicy(collect_policy, epsilon=1/FLAGS.task)
  env_step_metric = py_metrics.EnvironmentSteps()
  collect_actor = actor.Actor(
      collect_env,
      collect_eps_policy,
      train_step,
      steps_per_run=configs["collectors"]["num_steps_per_collect"],
      metrics=actor.collect_metrics(configs["collectors"]["num_steps_per_collect"]),
      summary_dir=summary_dir,
      observers=[rb_observer, env_step_metric])

but that resulted in the following error:

  File "/x/lib/python3.10/site-packages/tf_agents/policies/greedy_policy.py", line 58, in __init__
    emit_log_probability=policy.emit_log_probability,
  File "/x/python3.10/site-packages/tf_agents/policies/py_tf_eager_policy.py", line 246, in __getattr__
    return getattr(self._policy, name)
  AttributeError: '_UserObject' object has no attribute 'emit_log_probability'
  In call to configurable 'GreedyPolicy' (<class 'tf_agents.policies.greedy_policy.GreedyPolicy'>)
  In call to configurable 'EpsilonGreedyPolicy' (<class 'tf_agents.policies.epsilon_greedy_policy.EpsilonGreedyPolicy'>)
  In call to configurable 'collect' (<function collect at 0x2b33a2607490>)

Is there a simple way to implement what I want to do?

The text was updated successfully, but these errors were encountered:

coreyleveen · 2022-11-29T16:07:49Z

The DdqnAgent initializer, as with the DqnAgent, accepts an epsilon_greedy argument. You can see in the code here that it uses an EpsilonGreedyPolicy for its collect policy, provided no Boltzmann temperature is provided:

agent = ddqn_agent.DdqnAgent(
...
epsilon_greedy=0.1)

agent.collect_policy # => EpsilonGreedyPolicy

Does your issue persist after making this change?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to customize Actor policy in Actor-Learner setup #792

How to customize Actor policy in Actor-Learner setup #792

JLenssen commented Oct 31, 2022 •

edited

Loading

coreyleveen commented Nov 29, 2022

How to customize Actor policy in Actor-Learner setup #792

How to customize Actor policy in Actor-Learner setup #792

Comments

JLenssen commented Oct 31, 2022 • edited Loading

coreyleveen commented Nov 29, 2022

JLenssen commented Oct 31, 2022 •

edited

Loading