Model Parallelism with SFTTrainer #1094
Perhaps this isn't the right place to ask this question, but what's the easiest way to set up model parallelism using `SFTTrainer`?

My understanding is that if we have access to a multi-GPU workstation, the default is data parallelism. However, I would be interested in comparing run times against model parallelism.

Thanks!
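For reference, a minimal sketch (not from the thread) of what naive model parallelism with `SFTTrainer` can look like: loading the model with `device_map="auto"` lets 🤗 Accelerate spread the layers across all visible GPUs before the model is handed to the trainer. The model name, dataset, and hyperparameters below are placeholders, and the keyword arguments follow the TRL API from around the time of this issue, so they may differ in newer releases.

```python
# Sketch: naive model parallelism by sharding the model across local GPUs.
# "facebook/opt-1.3b" and the "imdb" dataset are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "facebook/opt-1.3b"

# device_map="auto" places layers on every visible GPU, so a model that is
# too large for one 2080 Ti can still be loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # half precision to save memory
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("imdb", split="train")  # any dataset with a "text" column

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="./sft-model-parallel",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```

Run as a plain `python` script, this is naive pipeline parallelism: only one GPU computes at a time, so it mainly helps fit a model that is too large for a single card rather than speed things up. Launching the same script without `device_map` via `accelerate launch` gives the default data-parallel baseline to compare run times against.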
Comments

@younesbelkada: hi @pharringtonp19
@pharringtonp19: @younesbelkada Thanks! Is this the most efficient way to train across a cluster of small and old GPUs (2080 Ti)? I usually run out of memory.
@lvwerra: In addition to using PEFT (quantization + LoRA), I would consider looking into DeepSpeed. It's fully supported with the `SFTTrainer`.
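A rough sketch of the PEFT route mentioned above (4-bit quantization + LoRA, i.e. QLoRA) with `SFTTrainer`. The model, dataset, and LoRA hyperparameters are illustrative placeholders rather than recommendations:

```python
# Sketch: QLoRA fine-tuning -- the 4-bit base weights stay frozen and only
# small LoRA adapters are trained, which keeps optimizer state tiny.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

model_name = "facebook/opt-1.3b"  # placeholder

# Load the base model in 4-bit so it fits in the 11 GB of a 2080 Ti.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # the 2080 Ti has no bf16 support
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

dataset = load_dataset("imdb", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=peft_config,  # SFTTrainer wraps the model with the LoRA adapters
    args=TrainingArguments(
        output_dir="./sft-qlora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        fp16=True,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```

Since only the adapters receive gradients, optimizer state and gradient memory stay small; the remaining cost is mostly activations, which the small per-device batch size plus gradient accumulation keeps in check.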
@pharringtonp19: @lvwerra Thanks for the suggestion!
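For completeness, a sketch of the DeepSpeed route suggested above: ZeRO stage 2 shards optimizer states and gradients across the GPUs and can offload optimizer state to CPU RAM, which helps a lot on 11 GB cards. The config values and the script name are illustrative; `TrainingArguments` accepts the DeepSpeed config either as a dict or as a path to a JSON file.

```python
# Sketch: DeepSpeed ZeRO-2 through the Trainer integration (illustrative values).
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,                               # shard optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},   # push optimizer states to CPU RAM
    },
    "fp16": {"enabled": True},                    # the 2080 Ti has no bf16 support
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="./sft-deepspeed",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    deepspeed=ds_config,  # dict or path to a JSON file
)
# The rest is the same SFTTrainer setup as above, started with a distributed
# launcher, e.g.:
#   accelerate launch --num_processes 4 train_sft.py
# or
#   deepspeed --num_gpus 4 train_sft.py
# (train_sft.py is a placeholder script name)
```

If the model still does not fit, ZeRO stage 3 additionally shards the parameters themselves across the GPUs.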