
Keeping all checkpoints #177

Closed

fratajcz opened this issue Jan 22, 2021 · 2 comments

Comments

@fratajcz

Hi!

I wondered if it was possible to keep all the checkpoints of a run, not just the best one.

This would be useful because, to my knowledge, it is currently not possible to run multiple evaluation jobs during training.
With all checkpoints kept, I could run training with the evaluation job I want to use for early stopping and then later evaluate other metrics over the course of the whole training run from the saved checkpoints.

Let me know if there is a flag or something that has this effect.

Thanks again!

@samuelbroscheit
Member

Correct, currently we don't support multiple evaluations during training (see #102).

But as you said, you can keep all the checkpoints with the following option:

  checkpoint:
    # In addition to the checkpoint of the last epoch (which is transient),
    # create an additional checkpoint every this many epochs. Disable additional
    # checkpoints with 0.
    every: 5

    # Keep this many most recent additional checkpoints.
    keep: 3

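If you then want to compute additional metrics later from the kept checkpoints, a minimal sketch (following the model-loading example in the LibKGE README; the checkpoint filename below is just an illustrative placeholder for a file in your experiment folder) could look like this:

  # Load a kept checkpoint and score queries with the model as it was at that epoch.
  import torch
  from kge.model import KgeModel
  from kge.util.io import load_checkpoint

  checkpoint = load_checkpoint("checkpoint_00005.pt")  # placeholder path
  model = KgeModel.create_from(checkpoint)

  # Score all objects for a couple of (subject, relation, ?) queries;
  # other metrics can be computed from such scores in the same way.
  s = torch.tensor([0, 2])               # subject indexes
  p = torch.tensor([0, 1])               # relation indexes
  scores = model.score_sp(s, p)          # scores of all objects for (s, p, ?)
  best_o = torch.argmax(scores, dim=-1)  # index of the highest-scoring object

Running this once per kept checkpoint gives you the metric's trajectory over training.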

@fratajcz
Author

Thanks, I'll give it a go!
