Training goes for more steps than expected in the first epoch. #805
abhisirka2001 started this conversation in General
Details:
- Dataset size: 48,014 samples
- Batch size: 2
- Gradient accumulation steps: 1
- Hardware: RTX 4090

Expected steps per epoch: 48,014 / 2 = 24,007, but training runs for more steps than that in the first epoch. Also, the loss curve starts high (above 8).
[Snapshot of the YAML config file]
Any ideas on what could be causing this or how to debug?
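One way to debug this is to compare the observed step count against the expected one computed from the effective batch size. A minimal sketch (the function name and parameters here are illustrative, not from any particular framework): gradient accumulation and the number of devices both multiply the effective batch size, so if either is silently greater than 1 in the config, the per-epoch step count reported by the trainer will differ from a naive `dataset_size / batch_size`.

```python
import math

def steps_per_epoch(dataset_size: int, batch_size: int,
                    grad_accum: int = 1, world_size: int = 1,
                    drop_last: bool = False) -> int:
    """Expected optimizer steps per epoch.

    Effective batch = per-device batch * grad accumulation * number of devices.
    """
    effective = batch_size * grad_accum * world_size
    if drop_last:
        return dataset_size // effective
    return math.ceil(dataset_size / effective)

# With the numbers above: 48,014 samples, batch size 2,
# no gradient accumulation, a single RTX 4090
print(steps_per_epoch(48_014, 2))  # 24007
```

If the trainer reports noticeably more steps than this, it is worth checking whether the dataloader re-samples or packs the dataset (e.g. sequence packing or an oversampling setting in the YAML), since either can change the number of batches per epoch.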