Add WarmupCosineAnnealingScheduler to NeuralLAM and add --steps
keyword to training script
#7 · +120 −0
This PR implements two features:

A `--steps` argument has been added to the training script, allowing control over the number of training steps before termination. This is implemented using `Trainer(..., max_steps=args.steps, ...)` in pytorch_lightning. If both `max_epochs` and `max_steps` are specified on the `Trainer`, training stops as soon as either limit is reached. By default `--steps` is -1, which means that without specifying this keyword training will never terminate due to `max_steps`.
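For reference, here is a minimal sketch of how such a flag can be wired into a pytorch_lightning `Trainer`. The argument name comes from this PR; the epoch limit and surrounding setup are illustrative assumptions, not the actual NeuralLAM code:

```python
from argparse import ArgumentParser

import pytorch_lightning as pl

parser = ArgumentParser()
parser.add_argument(
    "--steps",
    type=int,
    default=-1,
    help="Maximum number of training steps (-1 = no step-based limit)",
)
args = parser.parse_args()

trainer = pl.Trainer(
    max_epochs=200,        # illustrative value; training stops when either limit is hit
    max_steps=args.steps,  # -1 (the default) disables termination via max_steps
)
```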
A learning rate scheduler has been added to NeuralLAM, following the specifications for GraphCast-style learning rates given at the developer meeting at DMI on the 4th of March. The defaults are chosen such that, when using the default learning rate for Adam (`1e-3`), the linear warmup ranges from epsilon to `1e-3`, followed by cosine annealing from `1e-3` to `1e-6`. Otherwise, the minimum and maximum learning rates range from `min_factor * initial_lr` to `max_factor * initial_lr`.
The signature and defaults of the scheduler are as follows:
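The exact signature is not reproduced here, so the stub below is only a hypothetical sketch of what the interface might look like; the parameter names and defaults are assumptions inferred from the description in this PR:

```python
class WarmupCosineAnnealingScheduler:
    """Hypothetical interface sketch, not the actual code from this PR."""

    def __init__(
        self,
        optimizer,                 # torch.optim.Optimizer, e.g. Adam with lr=1e-3
        warmup_steps: int,         # length of the linear warmup phase
        annealing_steps: int,      # length of the cosine annealing phase
        min_factor: float = 1e-3,  # lowest LR = min_factor * initial_lr (1e-6 with Adam default)
        max_factor: float = 1.0,   # peak LR = max_factor * initial_lr (1e-3 with Adam default)
    ):
        ...
```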
The scheduler has three phases:

**Warmup phase**
In this phase the learning rate is warmed up by increasing it linearly from `min_factor * initial_learning_rate` to `max_factor * initial_learning_rate` (the initial learning rate is the one defined in the optimizer).

**Annealing phase**
In this phase the learning rate is annealed back down to `min_factor * initial_learning_rate` by following half a cosine cycle (cosine annealing).

**Fine-tuning phase**
In this phase the annealed learning rate is used until training terminates.
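As an illustration of the three phases (not the PR's implementation), the same shape can be expressed as a multiplier for `torch.optim.lr_scheduler.LambdaLR`:

```python
import math

import torch

def warmup_cosine_lambda(warmup_steps, annealing_steps, min_factor=1e-3, max_factor=1.0):
    """Return a LambdaLR multiplier implementing warmup -> cosine annealing -> constant."""

    def factor(step):
        if step < warmup_steps:
            # Warmup: linear increase from min_factor to max_factor
            return min_factor + (max_factor - min_factor) * step / max(1, warmup_steps)
        if step < warmup_steps + annealing_steps:
            # Annealing: half a cosine cycle from max_factor back down to min_factor
            progress = (step - warmup_steps) / max(1, annealing_steps)
            return min_factor + 0.5 * (max_factor - min_factor) * (1 + math.cos(math.pi * progress))
        # Fine-tuning: stay at the annealed (minimum) learning rate
        return min_factor

    return factor

model = torch.nn.Linear(4, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=warmup_cosine_lambda(warmup_steps=20, annealing_steps=100)
)
```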
**Learning schedule**
Using the default learning rate for the Adam optimizer (`1e-3`), 20 warmup steps, 100 annealing steps and 150 steps in total produces a schedule that looks as follows:

*(learning-rate schedule plot)*
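As a rough numeric check of that shape (using the `LambdaLR` sketch above rather than the PR's scheduler), stepping for 150 steps starts near `1e-6`, peaks at `1e-3` after the 20 warmup steps, and settles back at `1e-6` once annealing finishes:

```python
# Assumes `optimizer` and `scheduler` from the sketch above.
lrs = []
for _ in range(150):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # normally preceded by a forward/backward pass
    scheduler.step()

print(f"start {lrs[0]:.1e}, peak {max(lrs):.1e}, end {lrs[-1]:.1e}")
# expected roughly: start 1.0e-06, peak 1.0e-03, end 1.0e-06
```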
**Type of change**

**Checklist before requesting a review**
- The branch has been updated against the latest `main` (use `git pull` with the `--rebase` option if possible).

**Checklist for reviewers**
Each PR comes with its own improvements and flaws. The reviewer should check the following:

**Author checklist after completed review**
- A line describing this change has been added to the changelog, reflecting type of change (add section where missing):

**Checklist for assignee**