ensure we cast to the right dtype in attention #571
Conversation
Changes to Whisper code look good to me.
Overall PR also looks good to me, though I haven't been hitting this issue in my training runs so don't have a reproduction case.
It’s a new regression from the TPU splash attention kernel. Still need to benchmark it at full scale.
Sure, I'll test it as soon as I can (OOO atm).
A few data points with a Llama2-7B run on V5lite:
- I tested with this branch,
- I tried adding back the line that skips the cast to `x.dtype` when `config.upcast_attn = False` (here), but it doesn't help. It still goes OOM: https://wandb.ai/stanford-mercury/markweb/runs/eo41llama40o42560505
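
For context, a rough JAX sketch of the cast being discussed. This is an illustrative stand-in, not the repo's actual attention code; `upcast_attn` mirrors the config flag named above, and `attention_scores` is a hypothetical helper.

```python
import math

import jax
import jax.numpy as jnp


def attention_scores(q, k, v, upcast_attn: bool = False):
    """Illustrative scaled dot-product attention with explicit dtype handling."""
    orig_dtype = q.dtype  # e.g. bfloat16 under mixed-precision training

    if upcast_attn:
        # optionally compute the scores / softmax in float32 for numerical stability
        q, k = q.astype(jnp.float32), k.astype(jnp.float32)

    scores = jnp.einsum("...qd,...kd->...qk", q, k) / math.sqrt(q.shape[-1])
    weights = jax.nn.softmax(scores, axis=-1)

    # cast the attention weights back to the input dtype before the value
    # matmul, so downstream code sees the dtype it expects
    weights = weights.astype(orig_dtype)
    return jnp.einsum("...qk,...kd->...qd", weights, v)
```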
Training a Mistral 7B (seq len 2048, batch size 2048, parallelism 16), I can confirm
@Ivan-Zhou set `model.flash_attention_block_size` to null/None and see how it goes?
then try 512
@dlwh You are right. With a reduced FA block size, the OOM error is resolved. I can train Llama 7B with a 1024 FA block size, batch size up to 2048, and 4k seq length: https://wandb.ai/stanford-mercury/markweb/runs/eo44039204924kd030d20480507
awesome. can you try with it just set to null/unset? this is a new low-level kernel built into JAX and the default block size is 512
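
For anyone following along, the override under discussion would look roughly like this in a levanter-style YAML training config (the field name is taken from the thread; the exact file layout is an assumption):

```yaml
model:
  # null / unset lets the kernel fall back to its default block size
  flash_attention_block_size: null
  # or pin it explicitly, e.g.:
  # flash_attention_block_size: 512
```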
@dlwh I re-launched training jobs with the FA block size at null (pink) and 512 (blue). They have identical throughput, both lower than the 1024 run (green).
I think this fixes #569
@versae: can you give this branch (fix_attn_dtype) a try? I'm traveling and don't have time right now
cc @Helw150 since it tweaks your code.