[tt-train] Add kahan summation in AdamW #15518

rfurko-tt · 2024-11-27T22:40:05Z

Problem description

bfloat16 has a wide dynamic range but is less precise than float32. Some parameters are not updated because their gradients, after multiplication by the learning rate, are too small in magnitude to change the stored value. One example is gamma in LayerNorm.
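To make the lost update concrete, here is a minimal scalar sketch that emulates bfloat16 rounding in plain C++ and repeatedly applies a small step to a weight. The helpers `to_bf16` and `naive_bf16_update` are hypothetical names chosen for illustration and are not taken from tt-train; the step size used in the note below is an assumed stand-in for learning rate times gradient.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float32 to the nearest bfloat16 value
// (round-to-nearest-even) and return it widened back to float32.
// NaN and infinity handling is omitted for brevity.
inline float to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round to nearest, ties to even
    bits &= 0xFFFF0000u;                    // keep only the upper 16 bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Hypothetical helper: apply `steps` identical updates to a weight that is
// stored in bfloat16, rounding the result after every addition.
inline float naive_bf16_update(float weight, float delta, int steps) {
    float w = to_bf16(weight);
    const float d = to_bf16(delta);
    for (int i = 0; i < steps; ++i) {
        w = to_bf16(w + d);
    }
    return w;
}
```

Near 1.0, adjacent bfloat16 values are 2^-7 (about 0.0078) apart, so `naive_bf16_update(1.0f, 1e-4f, 1000)` rounds back to 1.0 on every step and returns 1.0, although the exact sum is 1.1.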

What's changed

Add a flag that enables Kahan summation when updating the weights.
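As a rough illustration of the technique, the sketch below applies Kahan (compensated) summation to a single scalar weight with every intermediate rounded to bfloat16. It does not reproduce the code in adamw.cpp: `to_bf16`, `KahanWeight`, `kahan_bf16_step`, and `kahan_bf16_update` are hypothetical names, and the AdamW moment estimates and weight decay are left out so that only the accumulation step is shown.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float32 to the nearest bfloat16 value
// (round-to-nearest-even), returned as float32. NaN handling is omitted.
inline float to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round to nearest, ties to even
    bits &= 0xFFFF0000u;                    // keep only the upper 16 bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// A weight plus the running compensation term that records the part of
// earlier updates lost to rounding. Both are assumed to be stored in bfloat16.
struct KahanWeight {
    float value;
    float compensation;
};

// One compensated update: weight += delta, carrying the rounding error forward.
inline void kahan_bf16_step(KahanWeight& w, float delta) {
    // Fold the previously lost part back into this update.
    const float y = to_bf16(to_bf16(delta) - w.compensation);
    // Tentative new weight; low-order bits of y may be rounded away here.
    const float t = to_bf16(w.value + y);
    // (t - value) is what was actually added; subtracting y leaves the error.
    w.compensation = to_bf16(to_bf16(t - w.value) - y);
    w.value = t;
}

// Apply `steps` identical compensated updates and return the final weight.
inline float kahan_bf16_update(float weight, float delta, int steps) {
    KahanWeight w{to_bf16(weight), 0.0f};
    for (int i = 0; i < steps; ++i) {
        kahan_bf16_step(w, delta);
    }
    return w.value;
}
```

With the same assumed inputs as a plain bfloat16 accumulation that would stay stuck at 1.0, `kahan_bf16_update(1.0f, 1e-4f, 1000)` lands near the exact sum of 1.1, to within bfloat16 resolution. The cost is one extra buffer per parameter to hold the compensation term.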

Checklist

Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12128966379
New/Existing tests provide coverage for changes

tt-train/sources/ttml/optimizers/adamw.cpp

Add kahan summation in adamw

d27b288

rfurko-tt requested a review from dmakoviichuk-tt as a code owner November 27, 2024 22:40

dmakoviichuk-tt reviewed Nov 27, 2024

Review comment on tt-train/sources/ttml/optimizers/adamw.cpp (outdated, resolved)

rfurko-tt added 2 commits December 2, 2024 18:10

Merge remote-tracking branch 'origin/main' into rfurko/kahan_adamw

bf4b21a

Change exception message to use default AdamW in case of kahan summation

8920bee

dmakoviichuk-tt approved these changes Dec 2, 2024

rfurko-tt merged commit 070a226 into main Dec 2, 2024
9 checks passed

rfurko-tt deleted the rfurko/kahan_adamw branch December 2, 2024 22:43
