flash attention #1

Srini-98 · 2023-08-26T01:46:06Z

llama_patch.py enables flash attention during fine-tuning. The patch has to be invoked from the training script so that it replaces the model's attention mechanism. When fine-tuning with QLoRA, the function 'upcast_layer_for_flash_attention' is also needed to convert the layer weights. This makes fine-tuning roughly 1.5x faster.
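
The diff itself is not reproduced here, so what follows is only a minimal sketch of the two ideas the description mentions. It assumes the patch works the common way: by monkey-patching, meaning it swaps the attention class's forward method at runtime. It also assumes the QLoRA step casts selected layers to a 16-bit dtype, since FlashAttention kernels only accept fp16/bf16 inputs. Every name in the block is hypothetical and not taken from llama_patch.py: `Attention`, `flash_forward`, `replace_attention_forward`, `Module`, and `upcast_for_flash`. The block also uses plain strings in place of real tensors and dtypes so that it runs without torch.

```python
from typing import Callable, Dict


class Attention:
    """Hypothetical stand-in for a model's attention module."""

    def forward(self, x: str) -> str:
        return f"eager_attention({x})"


def flash_forward(self: Attention, x: str) -> str:
    """Hypothetical replacement; a real one would call a fused flash-attention kernel."""
    return f"flash_attention({x})"


def replace_attention_forward(cls: type, new_forward: Callable) -> Callable:
    """Monkey-patch cls.forward with new_forward; return the original so it can be restored."""
    original = cls.forward
    cls.forward = new_forward
    return original


class Module:
    """Hypothetical stand-in exposing a torch-like .to(dtype) method."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "Module":
        self.dtype = dtype
        return self


def upcast_for_flash(modules: Dict[str, Module], dtype: str) -> None:
    """Cast norm, embedding and lm_head layers to a 16-bit dtype (e.g. "bfloat16").

    Selecting layers by name is an assumption made for this sketch;
    other layers (e.g. quantized weights) are left untouched.
    """
    for name, module in modules.items():
        if "norm" in name or "embed_tokens" in name or "lm_head" in name:
            module.to(dtype)


if __name__ == "__main__":
    # Patch once, before the model runs any forward passes.
    original = replace_attention_forward(Attention, flash_forward)
    print(Attention().forward("q"))  # flash_attention(q)
    replace_attention_forward(Attention, original)  # undo the patch

    layers = {
        "model.layers.0.input_layernorm": Module("float32"),
        "model.layers.0.self_attn": Module("uint8"),
    }
    upcast_for_flash(layers, "bfloat16")
    print({name: m.dtype for name, m in layers.items()})
```

Patching the class rather than individual instances means that every attention layer built from that class picks up the new forward. The patch can be undone by passing the returned original back to `replace_attention_forward`.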

Commit: flash attention (bae8c5a)