Update src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl
jeremiahpslewis authored Oct 23, 2023
1 parent 4c88a28 commit e47ccb4
Showing 1 changed file with 1 addition and 1 deletion.
@@ -13,7 +13,7 @@ export CQLSACPolicy
     finetune_experiment::E = nothing # Provide a second experiment to run at PostExperimentStage to finetune the SAC policy, typically with an agent that uses the SAC policy. Leave as nothing if no finetuning is desired.
 )
-Implements the Conservative Q-Learning algorithm [1] in its continuous variant on top of the SAC algorithm [2]. CQLSACPolicy wraps a classic SACPolicy whose networks will be trained normally, except for the additional conservative loss.
+Implements the Conservative Q-Learning algorithm [1] in its continuous variant on top of the SAC algorithm [2]. `CQLSACPolicy` wraps a classic `SACPolicy` whose networks will be trained normally, except for the additional conservative loss.
 CQLSACPolicy holds the additional hyperparameters specific to this method: `α_cql` is the Lagrange penalty for the conservative loss and is tuned automatically if `α_cql_autotune = true`; `cons_weight` is a scaling parameter that may need to be decreased if the scale of the Q-values is large; `τ_cql` is the threshold of the Lagrange conservative penalty.
 See `SACPolicy` for all the other hyperparameters related to SAC.
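The hyperparameters named in the docstring above are set as keyword arguments when constructing the policy. A minimal configuration sketch follows: the field names `α_cql`, `α_cql_autotune`, `cons_weight`, `τ_cql`, and `finetune_experiment` come from the docstring, while the `sac` field name and all shown values are illustrative assumptions, not the library's actual defaults.

```julia
using ReinforcementLearningZoo

# Sketch only: `sac` stands in for a fully constructed SACPolicy
# (networks, optimisers, etc.); its construction is omitted here,
# and the field name `sac` is an assumption.
policy = CQLSACPolicy(
    sac = sac,                      # the wrapped SACPolicy, trained as usual
    α_cql = 1.0f0,                  # Lagrange penalty for the conservative loss (illustrative)
    α_cql_autotune = true,          # tune α_cql automatically
    cons_weight = 1.0f0,            # decrease if the Q-value scale is large (illustrative)
    τ_cql = 5.0f0,                  # threshold of the Lagrange conservative penalty (illustrative)
    finetune_experiment = nothing,  # no finetuning experiment after offline training
)
```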
