Make modeling compatible with Nanotron + few optims #23
Conversation
matching logits without using cache
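A minimal sketch of such a logit-matching check, assuming a reference model and a converted checkpoint at placeholder paths (not the exact script used in this PR); both models are run with `use_cache=False` and their logits compared:

```python
# Sketch: compare logits of a converted model against a reference, cache disabled.
# Both paths below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ref_path = "path/to/reference-model"        # placeholder
converted_path = "path/to/converted-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(ref_path)
ref_model = AutoModelForCausalLM.from_pretrained(ref_path).eval()
new_model = AutoModelForCausalLM.from_pretrained(converted_path).eval()

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
with torch.no_grad():
    ref_logits = ref_model(**inputs, use_cache=False).logits
    new_logits = new_model(**inputs, use_cache=False).logits

# Raises if the converted model's logits drift from the reference.
torch.testing.assert_close(ref_logits, new_logits, rtol=1e-4, atol=1e-4)
```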
[Feature] Converting brrr's starcoder to transformer's starcoder checkpoint format
Models converted from fast-llm use the branch
Thanks for the great work. I am not sure the following assertion is correct. When I tried to train the model and fed it an attention mask where only the first few tokens are masked (e.g., …):

transformers/src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py, lines 439 to 441 at commit 1507798
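For illustration, a minimal sketch of the scenario described above (an attention mask where only the first few tokens of a sequence are masked), assuming a GPTBigCode checkpoint; the checkpoint name below is a placeholder, not from this PR:

```python
# Sketch: feed a GPTBigCode model a mask that zeroes out the first few positions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigcode/gpt_bigcode-santacoder"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer(["def hello():"], return_tensors="pt")
# Mask out the first two positions, as in the mask described above.
attention_mask = inputs["attention_mask"].clone()
attention_mask[:, :2] = 0

with torch.no_grad():
    out = model(input_ids=inputs["input_ids"], attention_mask=attention_mask)
print(out.logits.shape)
```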
Closed in favor of #28
1. Convert the nanotron checkpoint to the transformers format.
2. Run inference (a minimal sketch follows below).

Please make sure `flash-attn>=2.4.2` is installed.
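A minimal inference sketch, assuming the converted checkpoint lives at a placeholder local path and that `flash-attn>=2.4.2` is installed so the `flash_attention_2` implementation can be selected:

```python
# Sketch: run generation with a converted checkpoint using flash attention 2.
# The checkpoint path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/converted-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```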
cc @loubnabnl @xrsrke