Apply suggestions from code review
jeremiahpslewis authored Oct 23, 2023
1 parent e47ccb4 commit 8655430
Showing 2 changed files with 3 additions and 3 deletions.
@@ -14,11 +14,11 @@ export CQLSACPolicy
 )
 Implements the Conservative Q-Learning algorithm [1] in its continuous variant on top of the SAC algorithm [2]. `CQLSACPolicy` wraps a classic `SACPolicy` whose networks will be trained normally, except for the additional conservative loss.
-CQLSACPolicy contains the additional hyperparameters that are specific to this method. α_cql is the lagrange penalty for the conservative_loss, it will be automatically tuned if ` α_cql_autotune = true`. `cons_weight` is a scaling parameter
+`CQLSACPolicy` contains the additional hyperparameters that are specific to this method. α_cql is the lagrange penalty for the conservative_loss, it will be automatically tuned if ` α_cql_autotune = true`. `cons_weight` is a scaling parameter
 which may be necessary to decrease if the scale of the Q-values is large. `τ_cql` is the threshold of the lagrange conservative penalty.
 See SACPolicy for all the other hyperparameters related to SAC.
-If desired, you can provide an `Experiment(agent, env, stop_condition, hook)` to finetune_experiment to finish the training with a finetuning run. `agent` should be a normal `Agent` with policy being `sac`, an environment to finetune on.
+If desired, you can provide an `Experiment(agent, env, stop_condition, hook)` to `finetune_experiment` to finish the training with a finetuning run. `agent` should be a normal `Agent` with policy being `sac`, an environment to finetune on.
 See the example in ReinforcementLearningExperiments.jl for an example on the Pendulum task.
 As this is an offline algorithm, it must be wrapped in an `OfflineAgent` which will not update the trajectory as the training progresses. However, it _will_ interact with the supplied environment, which may be useful to record the progress.
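
For orientation, here is a minimal sketch of the usage the docstring above describes: a `SACPolicy` wrapped in a `CQLSACPolicy`, which is then run through an `OfflineAgent`. The keyword names other than `α_cql`, `α_cql_autotune`, `cons_weight`, and `τ_cql`, the `OfflineAgent` arguments, and the imports are assumptions for illustration, not the package's confirmed API.

```julia
# Hypothetical sketch only; names not listed in the docstring above are assumed.
using ReinforcementLearning          # assumed import; CQLSACPolicy is part of the RL zoo

sac = SACPolicy(
    # ... the usual SAC actor/critic networks and hyperparameters (elided) ...
)

cql = CQLSACPolicy(
    sac = sac,                 # the wrapped SACPolicy, trained as usual plus the conservative loss
    α_cql = 1.0,               # Lagrange penalty on the conservative loss
    α_cql_autotune = true,     # tune α_cql automatically
    cons_weight = 1.0,         # decrease if the scale of the Q-values is large
    τ_cql = 5.0,               # threshold of the Lagrange conservative penalty
)

# Offline training: the OfflineAgent does not push new transitions into the trajectory,
# but it still steps the supplied environment, which is useful for recording progress.
agent = OfflineAgent(
    policy = cql,
    trajectory = offline_dataset,   # a pre-recorded dataset; the name is illustrative
)
```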
@@ -8,4 +8,4 @@ include("PLAS.jl")
 include("ope/ope.jl")
 include("common.jl")
 =#
-include("CQL_SAC.jl")
+include("CQL_SAC.jl")
