FA2 broken for Cohere2 if Optional `mask` is not passed in forward #35547
Comments
The Cohere2 flash attention 2 code is the original one from the author, as you can see here. The Cohere2 model is one of the few models that codes its own attention. Also, we are refactoring the attention in #35235; please let us know if you face any issues with other models!
Not entirely sure why you chose this particular commit as a version, but this does not seem to be an issue on `main`.
@Cyrilvallez My mistake. Our CI was force-checking out a commit post-4.47.1, but not the latest main, since the Cohere2 code was merged right after the 4.47.1 release. So it looks like Cohere2 is the only model that still has the broken FA2 implementation. @alexrs-cohere Please check.
Ha, indeed the issue persists for Cohere2! Thanks, I'll open a PR!
Thanks for reporting this @Qubitium! @Cyrilvallez let me know when the PR is ready and if you need any support from me!
System Info
transformers==4.48.0.dev0 (from git+https://github.com/huggingface/transformers.git@5615a393691c81e00251e420c73e4d04c6fe22e5)
Who can help?
@ArthurZucker @Cyrilvallez @SunMarc
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Check our CI test failures:
Gemma
https://github.com/ModelCloud/GPTQModel/actions/runs/12651906072/job/35253942521#step:12:1164
Cohere2
https://github.com/ModelCloud/GPTQModel/actions/runs/12651906072/job/35253938235#step:12:922
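Outside of CI, a minimal reproduction along these lines should hit the same code path (a sketch; the checkpoint id and prompt are assumptions, but any Cohere2-architecture checkpoint with FA2 on GPU should do):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint: any Cohere2-architecture model should trigger the bug.
model_id = "CohereForAI/c4ai-command-r7b-12-2024"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# Passing input_ids only (no attention_mask) exercises the mask=None path
# that leaves seq_len unset in the FA2 branch.
out = model(input_ids=inputs.input_ids)
```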
We enabled FA2 by default in GPTQModel for inference of GPTQ-quantized models, and our CI tests are failing for multiple models. This looks like a regression in the `fa2` attention code where `seq_len` is never set if `mask is None`. FA2 forward requires `seq_len`:

transformers/src/transformers/models/cohere2/modeling_cohere2.py, lines 235 to 270 in fc74e39
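From the linked snippet, the failing control flow reduces to roughly the following (a simplified sketch of the pattern, not a verbatim copy of modeling_cohere2.py; variable names are assumptions):

```python
# Simplified sketch of the broken pattern (assumed, not verbatim source):
# seq_len is assigned only inside the mask branch, so when attention_mask
# is None it is referenced before assignment further down.
if attention_mask is not None:
    seq_len = attention_mask.shape[-1]
    key_states = key_states[:, :, :seq_len, :]
    value_states = value_states[:, :, :seq_len, :]

# With attention_mask=None, seq_len was never bound -> UnboundLocalError
attn_output = _flash_attention_forward(
    query_states,
    key_states,
    value_states,
    attention_mask,
    seq_len,  # crashes here when no mask was passed
    dropout=0.0,
    is_causal=True,
)
```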
@SunMarc I don't think this is related to quantization. @ArthurZucker The FA2 code above is broken if `mask` is not passed or is `None`, since `seq_len` will never be set. The `mask` param is explicitly declared as `Optional`.

Expected behavior
Work and not crash.
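One way to make this path safe would be to give `seq_len` an unconditional default (a sketch of a possible fix under the same assumed names, not necessarily what the actual PR does):

```python
# Possible fix sketch (assumption, not necessarily the merged PR):
# always bind seq_len, and only truncate the KV states when a mask exists.
seq_len = hidden_states.shape[1]
if attention_mask is not None:
    seq_len = attention_mask.shape[-1]
    key_states = key_states[:, :, :seq_len, :]
    value_states = value_states[:, :, :seq_len, :]
```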