
SFTTrainer explicitly skips prepare_model_for_kbit_training if using PEFT + FSDP/Deepspeed3 whereas DPOTrainer calls this #2537

Open
alexdauenhauer opened this issue Jan 2, 2025 · 1 comment
Labels
🚀 deepspeed Related to deepspeed ⚡ PEFT Related to PEFT 🏋 SFT Related to SFT

Comments

@alexdauenhauer

System Info

  • Platform: Linux-5.15.0-1061-gke-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • PyTorch version: 2.4.0
  • CUDA device(s): NVIDIA A100-SXM4-80GB
  • Transformers version: 4.46.3
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • Datasets version: 3.0.2
  • HF Hub version: 0.27.0
  • TRL version: 0.12.1
  • bitsandbytes version: 0.44.1
  • DeepSpeed version: not installed
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.58.1
  • PEFT version: 0.13.2

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

There is a difference in how each trainer handles PEFT + FSDP:

sft:
https://github.com/huggingface/trl/blob/v0.12.1/trl/trainer/sft_trainer.py#L242-L244

dpo:
https://github.com/huggingface/trl/blob/v0.12.1/trl/trainer/dpo_trainer.py#L363
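
The divergence can be summarized as follows. This is a paraphrase of the gating conditions, not the actual TRL source; the function names and boolean parameters are illustrative only, based on a reading of the v0.12.1 lines linked above:

```python
# Paraphrase (illustrative, not TRL source) of how each trainer decides
# whether to call peft.prepare_model_for_kbit_training on a quantized model.

def sft_prepares_for_kbit(loaded_in_4bit: bool, is_sharded_qlora: bool) -> bool:
    """SFTTrainer explicitly skips kbit preparation when the 4-bit model's
    weights are already sharded (QLoRA + FSDP / DeepSpeed ZeRO-3)."""
    return loaded_in_4bit and not is_sharded_qlora

def dpo_prepares_for_kbit(loaded_in_4bit: bool, is_sharded_qlora: bool) -> bool:
    """DPOTrainer calls kbit preparation for any quantized model,
    regardless of whether it is sharded."""
    return loaded_in_4bit

# The same sharded-QLoRA model is treated differently by the two trainers:
print(sft_prepares_for_kbit(True, True))   # False: SFTTrainer skips it
print(dpo_prepares_for_kbit(True, True))   # True: DPOTrainer prepares it
```

Under this reading, a sharded-QLoRA model passed through SFTTrainer never gets kbit preparation, while the same model handed to DPOTrainer does.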

Expected behavior

The current workflow is:

  • create PEFT model outside of trainer
  • pass PEFT model to trainer
  • first run SFTTrainer
  • use output model from SFTTrainer as base model in DPOTrainer

It is unclear what the expected way is to create and pass a PEFT model to the trainer when also using FSDP for model-parallel training, since SFTTrainer and DPOTrainer handle this differently.
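
The workflow above can be sketched as follows. This is a minimal, hypothetical sketch assuming the TRL v0.12 / PEFT APIs (`get_peft_model`, `SFTTrainer`, `DPOTrainer`); the function name, dataset arguments, and hyperparameters are placeholders, not part of the report:

```python
def run_sft_then_dpo(base_model, tokenizer, sft_dataset, pref_dataset, output_dir="out"):
    """Hypothetical sketch of the reported pipeline: PEFT model created
    outside the trainer, trained with SFTTrainer, then reused in DPOTrainer."""
    # Imports deferred so the sketch can be read/loaded without trl installed.
    from peft import LoraConfig, get_peft_model
    from trl import SFTTrainer, SFTConfig, DPOTrainer, DPOConfig

    # Step 1: create the PEFT model outside the trainer (the pattern in question).
    lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
    peft_model = get_peft_model(base_model, lora)

    # Steps 2-3: pass the PEFT model to SFTTrainer and train.
    sft = SFTTrainer(
        model=peft_model,
        args=SFTConfig(output_dir=f"{output_dir}/sft"),
        train_dataset=sft_dataset,
        processing_class=tokenizer,
    )
    sft.train()

    # Step 4: use the SFT output model as the base model for DPOTrainer.
    dpo = DPOTrainer(
        model=sft.model,
        args=DPOConfig(output_dir=f"{output_dir}/dpo"),
        train_dataset=pref_dataset,
        processing_class=tokenizer,
    )
    dpo.train()
    return dpo.model
```

With FSDP enabled, the two trainers in this pipeline would apply (or skip) `prepare_model_for_kbit_training` inconsistently to the same quantized model, which is the ambiguity being reported.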

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete
@August-murr August-murr added 🏋 SFT Related to SFT ⚡ PEFT Related to PEFT 🚀 deepspeed Related to deepspeed labels Jan 3, 2025
@edbeeching
Collaborator

Thanks for highlighting this difference. Is your current work impacted by this? Or is it just that you are required to instantiate the peft model outside of the training?


3 participants