How to set save_steps? I set it to 15,000, and my run has more than 34,000 steps. After training only 1.28 epochs, it had already generated 57 GB of checkpoints and filled up my hard disk. #6584

Closed
1 task done
yuanllong opened this issue Jan 9, 2025 · 3 comments
Labels
solved This problem has been already solved

Comments

@yuanllong

Reminder

  • I have read the README and searched the existing issues.

System Info

output

output_dir: /root/autodl-tmp
logging_steps: 10
save_steps: 15000
plot_loss: true
overwrite_output_dir: true

How should I set save_steps? I set it to 15,000, and my run has more than 34,000 steps. After training only 1.28 epochs, it had already generated 57 GB of checkpoints and filled up my hard disk.


@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 9, 2025
@hiyouga
Owner

hiyouga commented Jan 9, 2025

You can increase save_steps to 999999 or set save_strategy: no
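
For readers applying this suggestion, here is a minimal sketch of how the output section from the original post might look with either option. The first four keys and both options come from this thread. The quoting of "no" and the commented-out `save_total_limit` line are assumptions added for illustration: `save_total_limit` is a Hugging Face `TrainingArguments` field, and whether it is passed through unchanged in your version should be checked before relying on it.

```yaml
output_dir: /root/autodl-tmp
logging_steps: 10
plot_loss: true
overwrite_output_dir: true

# Option 1: use an interval larger than the total number of training steps
# (more than 34,000 in this run), so no intermediate checkpoint is triggered.
save_steps: 999999

# Option 2 (instead of option 1): disable interval-based checkpointing.
# Quoting is a precaution, since YAML 1.1 parsers read a bare `no` as a
# boolean false; the reply above writes it unquoted.
# save_strategy: "no"

# Assumption, not from this thread: if intermediate checkpoints are still
# wanted, this Hugging Face TrainingArguments field caps how many are kept.
# save_total_limit: 1
```

Only one of the two options is needed. Keeping `save_steps` at a smaller value together with a checkpoint cap is a different trade-off: it preserves the ability to resume an interrupted run at the cost of some disk space.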

@hiyouga hiyouga closed this as completed Jan 9, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 9, 2025
@yuanllong
Author

You can increase save_steps to 999999 or set save_strategy: no

Oh, OK, thank you for your reply. What is the saving behavior with save_steps=999999 or save_strategy: no? Is only the best model saved, or just the last model at the end of training?

@hiyouga
Owner

hiyouga commented Jan 9, 2025

The model is saved at the end of training.
