From 8655430cf00f59ed07dd490be5463d8ac55ecad0 Mon Sep 17 00:00:00 2001
From: Jeremiah <4462211+jeremiahpslewis@users.noreply.github.com>
Date: Mon, 23 Oct 2023 10:53:00 -0500
Subject: [PATCH] Apply suggestions from code review

---
 .../src/algorithms/offline_rl/CQL_SAC.jl    | 4 ++--
 .../src/algorithms/offline_rl/offline_rl.jl | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl b/src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl
index 72d284ca2..a77e42fb7 100644
--- a/src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl
+++ b/src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl
@@ -14,11 +14,11 @@ export CQLSACPolicy
     )
 
 Implements the Conservative Q-Learning algorithm [1] in its continuous variant on top of the SAC algorithm [2]. `CQLSACPolicy` wraps a classic `SACPolicy` whose networks will be trained normally, except for the additional conservative loss.
 
-    CQLSACPolicy contains the additional hyperparameters that are specific to this method. α_cql is the lagrange penalty for the conservative_loss, it will be automatically tuned if ` α_cql_autotune = true`. `cons_weight` is a scaling parameter
+    `CQLSACPolicy` contains the additional hyperparameters that are specific to this method. `α_cql` is the Lagrange penalty for the `conservative_loss`; it will be automatically tuned if `α_cql_autotune = true`. `cons_weight` is a scaling parameter
     which may be necessary to decrease if the scale of the Q-values is large. `τ_cql` is the threshold of the lagrange conservative penalty. See SACPolicy for all the other hyperparameters related to SAC.
 
-    If desired, you can provide an `Experiment(agent, env, stop_condition, hook)` to finetune_experiment to finish the training with a finetuning run. `agent` should be a normal `Agent` with policy being `sac`, an environment to finetune on.
+    If desired, you can provide an `Experiment(agent, env, stop_condition, hook)` to `finetune_experiment` to finish the training with a finetuning run. `agent` should be a normal `Agent` whose policy is the wrapped `sac`, and `env` an environment to finetune on.
     See the example in ReinforcementLearningExperiments.jl for an example on the Pendulum task.
 
     As this is an offline algorithm, it must be wrapped in an `OfflineAgent` which will not update the trajectory as the training progresses. However, it _will_ interact with the supplied environment, which may be useful to record the progress.
diff --git a/src/ReinforcementLearningZoo/src/algorithms/offline_rl/offline_rl.jl b/src/ReinforcementLearningZoo/src/algorithms/offline_rl/offline_rl.jl
index 7ccc7d823..d3ba98a59 100644
--- a/src/ReinforcementLearningZoo/src/algorithms/offline_rl/offline_rl.jl
+++ b/src/ReinforcementLearningZoo/src/algorithms/offline_rl/offline_rl.jl
@@ -8,4 +8,4 @@ include("PLAS.jl")
 include("ope/ope.jl")
 include("common.jl")
 =#
-include("CQL_SAC.jl")
\ No newline at end of file
+include("CQL_SAC.jl")
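
For readers of this patch without the surrounding file, the sketch below illustrates the "conservative loss" the docstring refers to: pushing down Q-values of sampled candidate actions (via a log-sum-exp) relative to the Q-values of dataset actions, scaled by `cons_weight`. This is a minimal, self-contained illustration of the idea only, not the implementation in CQL_SAC.jl; the names `conservative_penalty`, `q_sampled`, and `q_data` are hypothetical.

    # Minimal sketch of a CQL-style conservative penalty (illustrative only).
    # q_sampled: Q-values of candidate actions per state (candidates × batch),
    #            analogous to sampling `action_sample_size` actions per state.
    # q_data:    Q-values of the actions actually stored in the offline dataset (one per state).
    logsumexp(x; dims) = begin
        m = maximum(x; dims = dims)
        m .+ log.(sum(exp.(x .- m); dims = dims))
    end

    function conservative_penalty(q_sampled::AbstractMatrix, q_data::AbstractVector; cons_weight = 1f0)
        gap = vec(logsumexp(q_sampled; dims = 1)) .- q_data  # per-state conservative gap
        return cons_weight * sum(gap) / length(gap)          # mean over the batch
    end

    # Toy usage: 10 candidate actions for each of 4 states.
    q_sampled = randn(Float32, 10, 4)
    q_data    = randn(Float32, 4)
    conservative_penalty(q_sampled, q_data)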