What is the difference between these two saved checkpoints in the sft_llama2 example? #868

Closed
Emerald01 opened this issue Oct 13, 2023 · 3 comments

@Emerald01

I am trying to understand this
https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama_2/scripts/sft_llama2.py#L206C1-L206C1

trainer.model.save_pretrained(output_dir) already seems to save the base + LoRA model to "final_checkpoint".
Then what does model = model.merge_and_unload() do here, and why is the model saved again to "final_merged_checkpoint"?

trainer.save_model(script_args.output_dir)

output_dir = os.path.join(script_args.output_dir, "final_checkpoint")
trainer.model.save_pretrained(output_dir)

# Free memory for merging weights
del base_model
torch.cuda.empty_cache()

model = AutoPeftModelForCausalLM.from_pretrained(output_dir, device_map="auto", torch_dtype=torch.bfloat16)
model = model.merge_and_unload()

output_merged_dir = os.path.join(script_args.output_dir, "final_merged_checkpoint")
model.save_pretrained(output_merged_dir, safe_serialization=True)
@younesbelkada (Contributor)

Hi @Emerald01

trainer.model.save_pretrained(output_dir) should save the adapter weights only. In the code snippet you shared, we first load the base model together with the adapters from the adapter weights path, perform the merge, and save the merged model in output_merged_dir.
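
For illustration, here is a minimal sketch of how the two checkpoints can be loaded afterwards (the results/... paths are hypothetical):

from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM
import torch

# "final_checkpoint" contains only adapter_config.json plus the LoRA weights;
# AutoPeftModelForCausalLM reads base_model_name_or_path from that config,
# loads the base model, and attaches the adapters on top of it.
peft_model = AutoPeftModelForCausalLM.from_pretrained(
    "results/final_checkpoint", torch_dtype=torch.bfloat16
)

# "final_merged_checkpoint" is a plain transformers checkpoint with the LoRA
# deltas already folded into the base weights, so it loads without peft.
merged_model = AutoModelForCausalLM.from_pretrained(
    "results/final_merged_checkpoint", torch_dtype=torch.bfloat16
)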

@Emerald01 (Author)

Yes, I think it only saves the adapter weights. However, when calling model = AutoPeftModelForCausalLM.from_pretrained(output_dir), it actually finds the base model path and is able to merge the adapter into the base at that moment, since the base model path is known (e.g. llama2_7b).
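
For reference, a minimal sketch (with a hypothetical output path) of where that base model path is recorded:

from peft import PeftConfig

# The adapter-only checkpoint stores the base model location in adapter_config.json,
# which is how AutoPeftModelForCausalLM knows which base model to attach the adapter to.
cfg = PeftConfig.from_pretrained("results/final_checkpoint")
print(cfg.base_model_name_or_path)  # e.g. "meta-llama/Llama-2-7b-hf"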

My question is: why bother saving the entire base + adapter weights to "final_merged_checkpoint" on disk then? Going to the DPO stage, I see that the path passed to model = AutoPeftModelForCausalLM.from_pretrained() is actually the adapter-only one, i.e. "final_checkpoint", so I do not see any use case for "final_merged_checkpoint".

@younesbelkada (Contributor)

My question is: why bother saving the entire base + adapter weights to "final_merged_checkpoint" on disk then?

This is useful for deployment: it is preferable to deploy a merged model in order to avoid the overhead that LoRA introduces during inference. There is no point in keeping the LoRA weights separate if you know that you are going to use the model for inference only. Let me know if that answers the question, and feel free to re-open the issue.
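
A minimal sketch of that deployment path (paths and model id are hypothetical; the script saves only the model, so the tokenizer is loaded from the base model here):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# The merged checkpoint is a standard transformers model: no peft dependency,
# and no extra LoRA matmuls are executed at inference time.
model = AutoModelForCausalLM.from_pretrained(
    "results/final_merged_checkpoint",  # hypothetical path
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # hypothetical base model id

inputs = tokenizer("Question: What does merge_and_unload do?\n\nAnswer:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))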
