fix backward error in stack llama2 DPO example if checkpointing is used #1056
Conversation
@pacman100 @younesbelkada please help review. I did not use load_in_4bit=True in AutoPeftModelForCausalLM.from_pretrained.
I checked the code: is_gradient_checkpointing is set in https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L1619, inside trainer.train(), i.e. after AutoPeftModelForCausalLM.from_pretrained() has already returned.
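For context, a minimal sketch of the failing setup, assuming an illustrative checkpoint path and keeping only the arguments relevant here:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import TrainingArguments

# Load the policy model from an SFT adapter checkpoint WITHOUT
# load_in_4bit=True. Since the model is not k-bit quantized, PEFT's
# prepare_model_for_kbit_training() is never called, so nothing invokes
# enable_input_require_grads() on the base model.
model = AutoPeftModelForCausalLM.from_pretrained(
    "./sft/final_checkpoint",  # illustrative path, not from the script
    torch_dtype=torch.float16,
)

# gradient_checkpointing=True is the script's default. trainer.train() will
# call gradient_checkpointing_enable(), and with all base weights frozen and
# no grads required on the embedding outputs, backward() fails as described
# in the bug.
training_args = TrainingArguments(
    output_dir="./dpo_out",  # illustrative path
    gradient_checkpointing=True,
)
```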
The change in this PR is problematic because it means that PEFT will stop working with non-transformers models. It would also call
Maybe it's something that needs to be fixed in transformers then, WDYT @younesbelkada?
Signed-off-by: Wang, Yi A <[email protected]>
Force-pushed from f8fab8e to f584b57.
Hi @BenjaminBossan, thanks for the review. How about a change like this? The PR is updated.
Hi @sywangyi |
I just set gradient_checkpointing=True, and the error shown is exactly what I describe in the bug report. Also, the default value of gradient_checkpointing in this script is True; see https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama_2/scripts/dpo_llama2.py#L41. Someone else has run into the issue as well; see what's described in huggingface/trl#906: "I found that gradient checkpointing doesn't work, so I disabled it. I guess it's caused by PEFT, for gradient checkpointing works well with deepspeed version of SFT (in my private repo)."
@sywangyi, I see, thanks!
The error disappears if we call model.enable_input_require_grads() before feeding the model to DPOTrainer. The error is simply due to the missing enable_input_require_grads() call; it doesn't matter whether that call happens in trainer.train() or somewhere else.
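In code, the workaround looks like this; training_args, train_dataset, and tokenizer are assumed to exist as in the example script, and the DPOTrainer arguments are abbreviated:

```python
from trl import DPOTrainer

# Workaround: hook the input embeddings so their output requires grad. With
# all base parameters frozen by PEFT, this is what gives gradient
# checkpointing a grad_fn to backpropagate through.
model.enable_input_require_grads()

dpo_trainer = DPOTrainer(
    model,                        # policy model prepared above
    ref_model=None,               # reference model handling left to TRL
    args=training_args,           # assumed to exist
    beta=0.1,
    train_dataset=train_dataset,  # assumed to exist
    tokenizer=tokenizer,          # assumed to exist
)
dpo_trainer.train()               # no longer raises the backward error
```

Calling the method once before constructing the trainer does for a full-precision model what prepare_model_for_kbit_training would have done for a quantized one.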
I wonder if we should change the line from

    - if (loaded_in_kbit or is_gptq_quantized) and use_gradient_checkpointing:
    + if use_gradient_checkpointing:

What is the reason to only enable that for quantized models? cc @BenjaminBossan @pacman100
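For reference, the branch in question looks roughly like this (a paraphrase of PEFT's prepare_model_for_kbit_training at the time; other preparation steps such as dtype casting are omitted):

```python
def prepare_model_for_kbit_training(model, use_gradient_checkpointing=True):
    # Flags roughly as PEFT derives them from transformers model attributes.
    loaded_in_kbit = getattr(model, "is_loaded_in_8bit", False) or getattr(
        model, "is_loaded_in_4bit", False
    )
    is_gptq_quantized = getattr(model, "quantization_method", None) == "gptq"

    # The condition under discussion: input grads are only enabled when the
    # model is quantized AND gradient checkpointing is requested.
    if (loaded_in_kbit or is_gptq_quantized) and use_gradient_checkpointing:
        if hasattr(model, "enable_input_require_grads"):
            model.enable_input_require_grads()
        else:
            # Fallback for models without the helper: hook the input
            # embeddings so their output requires grad.
            def make_inputs_require_grad(module, input, output):
                output.requires_grad_(True)

            model.get_input_embeddings().register_forward_hook(
                make_inputs_require_grad
            )

        model.gradient_checkpointing_enable()

    return model
```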
Good question, I don't know :) This seems to be the commit that added the logic: The commit message is:
but AFAICT this message does not correspond to the actual code change. Should we move enabling of gradient checkpointing out of this function, since it's not directly related to quantization? Or maybe just leave it completely up to the user whether to enable it or not? |
Agreed! Not performing
Good point (I assume you mean backwards compatibility breaking). What I don't quite get is why we tie enabling gradients to whether gradient checkpointing is being used. We could also allow gradients on the input embedding without checkpointing, right? In that case, it would be up to the user to enable those gradients? Maybe, at the end of the day, it's also a matter of documenting this properly. |
Hi @younesbelkada, prepare_model_for_kbit_training is not called in this case (I am not using load_in_8bit or load_in_4bit). Also, I see logic in transformers like https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1877-L1882. That's why I changed the logic in the PR this way: other applications (e.g. LoRA fine-tuning) may hit the same issue. In what scenario is this logic in transformers used?
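The transformers logic referenced above is essentially the body of PreTrainedModel.enable_input_require_grads. Paraphrased, it is just a forward hook on the input embeddings:

```python
# Paraphrase of the linked modeling_utils.py lines: make the embedding
# output require grad even though the embedding weights themselves stay
# frozen, which is exactly the frozen-base-model + gradient-checkpointing
# situation PEFT creates.
def enable_input_require_grads(self):
    def make_inputs_require_grads(module, input, output):
        output.requires_grad_(True)

    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(
        make_inputs_require_grads
    )
```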
Hi @sywangyi |
The PR in TRL fixes my issue. @younesbelkada
Great! Thanks very much @sywangyi!