Add Retrace and a QNetwork abstraction #615

HenriDeh · 2022-04-05T09:48:55Z

Following yesterday's discussion in #613, I'm opening this draft PR to show how I went about implementing the Retrace algorithm. More precisely, I wanted to implement Retrace as a plug-in that an algorithm can optionally use (I mainly have #604 in mind), but that level of abstraction did not exist yet. So here's my attempt. This may not be what you have in mind @findmyway, but it can be a useful discussion even if this is never merged.

The main idea is that Retrace is just another way, alongside TD(n), to compute update targets for a QNetwork. So I created a QNetwork abstraction that can be updated given a batch of states, actions and targets: update!(qnetwork::AbstractQNetwork, states, actions, targets).
The main goal here was to introduce the function q_targets, which is called by update!.
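
To make the shape of this abstraction concrete, here is a minimal sketch of an update! that only sees states, actions and precomputed targets. It is not the code from this PR: TabularQ, its fields and the step size lr are hypothetical stand-ins for a real network, and q_targets is left out on purpose.

```julia
# Hypothetical sketch: none of these definitions are taken from the PR.
abstract type AbstractQNetwork end

# A tabular Q-function standing in for a neural network.
struct TabularQ <: AbstractQNetwork
    table::Matrix{Float64}   # n_actions × n_states
    lr::Float64              # step size (assumed)
end

TabularQ(n_actions::Int, n_states::Int; lr = 0.5) =
    TabularQ(zeros(n_actions, n_states), lr)

# Q-values of the given (state, action) pairs.
(q::TabularQ)(states, actions) = [q.table[a, s] for (s, a) in zip(states, actions)]

# Move Q(s, a) toward the supplied targets. How the targets were computed
# (TD(n), Retrace, ...) does not matter at this level.
function update!(q::TabularQ, states, actions, targets)
    for (s, a, y) in zip(states, actions, targets)
        q.table[a, s] += q.lr * (y - q.table[a, s])
    end
    return q
end

q = TabularQ(2, 3)
update!(q, [1, 3], [2, 1], [1.0, -2.0])
```

Because update! takes the targets as an argument, the same method works whichever rule produced them.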

Now, why did I do this? Because I could then implement Retrace in a way that is reusable by any algorithm that uses the QNetwork abstraction. RetraceTrajectory is an extension of a Trajectory; giving it its own type makes it possible to overload q_targets. So an algorithm that uses a RetraceTrajectory will automatically use the Retrace target to update its AbstractQNetwork.
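
The dispatch idea can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the Trajectory and RetraceTrajectory structs, their fields, and the q_targets(trajectory, γ) signature are all hypothetical, and the Retrace target is written in its usual backward-recursive form, truncated at the end of the stored trajectory.

```julia
# Hypothetical types and signatures, for illustration only.

# One stored trajectory of length T (all vectors have length T).
struct Trajectory
    rewards::Vector{Float64}  # r_t
    q::Vector{Float64}        # Q(x_t, a_t) under the current network
    v_next::Vector{Float64}   # E_{a ~ π} Q(x_{t+1}, a); 0 after a terminal step
end

# Same data plus what Retrace needs on top of it.
struct RetraceTrajectory
    traj::Trajectory
    ratios::Vector{Float64}   # π(a_t | x_t) / μ(a_t | x_t)
    λ::Float64
end

# Default: one-step TD targets.
q_targets(t::Trajectory, γ) = t.rewards .+ γ .* t.v_next

# Retrace targets via the backward recursion
#   y_t = r_t + γ v_{t+1} + γ c_{t+1} (y_{t+1} - Q(x_{t+1}, a_{t+1})),
# with truncated importance weights c_s = λ min(1, ratio_s).
# The last step bootstraps with a plain one-step target.
function q_targets(rt::RetraceTrajectory, γ)
    t = rt.traj
    T = length(t.rewards)
    y = similar(t.rewards)
    y[T] = t.rewards[T] + γ * t.v_next[T]
    for i in T-1:-1:1
        c = rt.λ * min(1.0, rt.ratios[i+1])
        y[i] = t.rewards[i] + γ * t.v_next[i] + γ * c * (y[i+1] - t.q[i+1])
    end
    return y
end

# The calling code is identical; only the trajectory type changes.
traj = Trajectory([1.0, 1.0, 1.0], zeros(3), zeros(3))
td_targets = q_targets(traj, 0.5)
retrace_targets = q_targets(RetraceTrajectory(traj, ones(3), 1.0), 0.5)
```

With λ = 0 the Retrace method falls back to the one-step targets, which is a quick sanity check for the recursion.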

I'll leave this PR as a draft and only use it locally for now. When we move to 0.11, I'll adapt Retrace to work with it.

harwiltz · 2022-04-05T12:54:33Z

I like where this is going. Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates (not necessarily in this PR, just in general)? Also, can you please add a reference to the Retrace paper in a comment somewhere? I believe I read this paper a while ago, but it would be nice to have a reference for convenience.

HenriDeh · 2022-04-05T13:01:13Z

> Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates?

Yes, I think that choice would make sense. Now that I think of it, it would make even more sense to define two QNetworkWithTarget variants, one that uses Polyak averaging and one that copies the weights every k updates, since both approaches are common.
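
A rough sketch of what those two variants could look like, assuming a single parametric wrapper rather than two separate types. Everything here is hypothetical (TargetUpdate, Polyak, PeriodicCopy, sync_target!), and the parameters are plain vectors instead of a real network.

```julia
# Hypothetical names; plain vectors stand in for network parameters.
abstract type AbstractQNetwork end
abstract type TargetUpdate end

struct Polyak <: TargetUpdate
    τ::Float64            # interpolation rate, 0 < τ ≤ 1
end

mutable struct PeriodicCopy <: TargetUpdate
    every::Int            # copy the weights every `every` updates
    count::Int            # number of updates seen so far
end
PeriodicCopy(every::Int) = PeriodicCopy(every, 0)

struct QNetworkWithTarget{U<:TargetUpdate} <: AbstractQNetwork
    online::Vector{Float64}
    target::Vector{Float64}
    rule::U
end

# The target network starts as a copy of the online one.
QNetworkWithTarget(params::Vector{Float64}, rule::TargetUpdate) =
    QNetworkWithTarget(params, copy(params), rule)

# Soft update: target ← τ * online + (1 - τ) * target.
function sync_target!(q::QNetworkWithTarget{Polyak})
    τ = q.rule.τ
    q.target .= τ .* q.online .+ (1 - τ) .* q.target
    return q
end

# Hard update: copy the online weights once every `every` calls.
function sync_target!(q::QNetworkWithTarget{PeriodicCopy})
    q.rule.count += 1
    if q.rule.count % q.rule.every == 0
        copyto!(q.target, q.online)
    end
    return q
end

soft = QNetworkWithTarget([1.0, 2.0], Polyak(0.5))
soft.online .= [3.0, 4.0]
sync_target!(soft)          # target moves halfway toward online

hard = QNetworkWithTarget([1.0, 2.0], PeriodicCopy(2))
hard.online .= [3.0, 4.0]
sync_target!(hard)          # first call: target unchanged
sync_target!(hard)          # second call: target copied from online
```

Dispatching on the rule type keeps the algorithm code unaware of which update scheme is in use. Two distinct concrete types would work equally well.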

For the ref, you can find the paper here.
V-trace and IMPALA (https://arxiv.org/abs/1802.01561) would also be nice additions in the future. Both are closely tied to distributed RL, which I think is about to undergo major changes in this package.

findmyway · 2022-04-05T14:33:47Z

Thanks! This is very helpful!

HenriDeh added 4 commits April 5, 2022 11:13

91c8e8c add a caller to NStepBatchSampler
603459b Add a QNetwork abstraction
8600ed3 add a Retrace trajectory
23bcf66 comments

HenriDeh closed this by deleting the head repository Mar 15, 2023

HenriDeh mentioned this pull request Aug 11, 2023: Missing features in RLCore #961 (open, 5 tasks)
