Training goes for more steps than expected in the first epoch. #805
abhisirka2001 started this conversation in General
Details:
- Dataset size: 48,014 samples
- Batch size: 2
- Gradient accumulation steps: 1
- Hardware: RTX 4090

Expected steps per epoch: 48,014 / 2 = 24,007, but training runs for more steps than that in the first epoch. Also, the loss curve starts high (above 8).
[Snapshot of the YAML config file]
Any ideas on what could be causing this or how to debug?
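One way to debug this is to compare the observed step count against the expected one computed from the effective batch size. A minimal sketch (the function name and parameters here are illustrative, not from any particular framework): gradient accumulation and the number of devices both multiply the effective batch size, so if either is silently greater than 1 in the config, the per-epoch step count reported by the trainer will differ from a naive `dataset_size / batch_size`.

```python
import math

def steps_per_epoch(dataset_size: int, batch_size: int,
                    grad_accum: int = 1, world_size: int = 1,
                    drop_last: bool = False) -> int:
    """Expected optimizer steps per epoch.

    Effective batch = per-device batch * grad accumulation * number of devices.
    """
    effective = batch_size * grad_accum * world_size
    if drop_last:
        return dataset_size // effective
    return math.ceil(dataset_size / effective)

# With the numbers above: 48,014 samples, batch size 2,
# no gradient accumulation, a single RTX 4090
print(steps_per_epoch(48_014, 2))  # 24007
```

If the trainer reports noticeably more steps than this, it is worth checking whether the dataloader re-samples or packs the dataset (e.g. sequence packing or an oversampling setting in the YAML), since either can change the number of batches per epoch.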