I am trying to understand this part of sft_llama2.py:
https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama_2/scripts/sft_llama2.py#L206C1-L206C1
trainer.model.save_pretrained(output_dir) seems to already save the base + LoRA model to "final_checkpoint".
Then what is model = model.merge_and_unload() doing here, and why save the model again to "final_merged_checkpoint"?
# (trainer, base_model, and script_args are defined earlier in sft_llama2.py)
import os
import torch
from peft import AutoPeftModelForCausalLM

# Save the final adapter-only checkpoint
trainer.save_model(script_args.output_dir)
output_dir = os.path.join(script_args.output_dir, "final_checkpoint")
trainer.model.save_pretrained(output_dir)

# Free memory for merging weights
del base_model
torch.cuda.empty_cache()

# Reload adapter + base model, merge, and save a standalone merged model
model = AutoPeftModelForCausalLM.from_pretrained(output_dir, device_map="auto", torch_dtype=torch.bfloat16)
model = model.merge_and_unload()
output_merged_dir = os.path.join(script_args.output_dir, "final_merged_checkpoint")
model.save_pretrained(output_merged_dir, safe_serialization=True)
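For reference, merge_and_unload folds each low-rank LoRA update back into the matching base weight, leaving a plain model with no adapter modules. A minimal conceptual sketch for a single linear layer (the dimensions, rank, and alpha below are made up, and real PEFT also handles scaling variants, dtype, and module bookkeeping):

import torch

# Conceptual sketch only: one linear layer's weight with a LoRA update
d, r, alpha = 4096, 16, 32           # hypothetical dimensions and LoRA hyperparameters
W = torch.randn(d, d)                # frozen base weight
A = torch.randn(r, d)                # LoRA down-projection
B = torch.zeros(d, r)                # LoRA up-projection
# Merging bakes the update into W, so inference needs no extra LoRA matmuls
W_merged = W + (alpha / r) * (B @ A)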
trainer.model.save_pretrained(output_dir) should save the adapter weights only. In the code snippet you shared, we first load the adapters + base model given the adapter weights path, then perform the merging and save the merged model in output_merged_dir.
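One way to confirm this is to list the contents of final_checkpoint: assuming a typical PEFT save, it should contain only small adapter files (e.g. adapter_config.json and an adapter weights file), not multi-GB base weights. The path below is a placeholder:

import os

output_dir = "sft/final_checkpoint"  # placeholder path
for name in sorted(os.listdir(output_dir)):
    size_mb = os.path.getsize(os.path.join(output_dir, name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")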
Yes, I think it only saves the adapter weights. However, when calling model = AutoPeftModelForCausalLM.from_pretrained(output_dir), it finds the base model path recorded alongside the adapter and is able to merge the adapter into the base at that moment, since the base model path (e.g. llama2_7b) is known.
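That recorded path is what makes this work: PEFT stores the base model identifier in adapter_config.json, which AutoPeftModelForCausalLM reads to fetch the base weights. A minimal sketch, assuming the placeholder path below:

import json
import os

output_dir = "sft/final_checkpoint"  # placeholder path
with open(os.path.join(output_dir, "adapter_config.json")) as f:
    adapter_config = json.load(f)
# e.g. "meta-llama/Llama-2-7b-hf"
print(adapter_config["base_model_name_or_path"])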
My question is: why bother saving the entire base + adapter weights to "final_merged_checkpoint" on disk then? Moving on to the DPO stage, I see the loading path for model = AutoPeftModelForCausalLM.from_pretrained() is actually the adapter-only one, i.e. "final_checkpoint", so I do not see any use case for "final_merged_checkpoint".
My question is: why bother saving the entire base + adapter weights to "final_merged_checkpoint" on disk then?
This is useful for deployment: it is preferable to deploy a merged model to avoid the overhead LoRA introduces during inference. There is no point keeping LoRA weights if you know you are going to use the model for inference only. Let me know if that answers the question, and feel free to re-open the issue.
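Concretely, the merged checkpoint is a plain transformers model, so at deployment time it can be loaded without peft and runs without the extra LoRA matmuls. A minimal sketch, assuming the placeholder path below:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "sft/final_merged_checkpoint"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(
    merged_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
# Assumes a tokenizer was also saved to merged_dir; otherwise load it
# from the base model id instead.
tokenizer = AutoTokenizer.from_pretrained(merged_dir)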