PPO example not working with DeepSpeed Stage 3 or FSDP #1051
I met the same problem, any ideas on how to solve it?
OK, so I've been doing more digging, and here's what I've found so far. The following is all with the
Possibly relevant discussion here - what's interesting is that for them it also seems to happen when training after a forward call.
That's also discussed e.g. here or here. Some comments in that discussion mention that running a forward pass through the model first solved the problem for them, but I haven't been able to make that work (yet). That's about as far as I've gotten so far.
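For reference, the "forward pass first" workaround I tried looks roughly like the sketch below. This is my own approximation rather than code from the example script; it assumes a trl-style ppo_trainer, a list of query_tensors, and generation_kwargs defined elsewhere, and it did not fix things for me, so treat it purely as an experiment.

```python
import torch

# Run one throwaway forward pass so the sharded parameters get gathered once
# before generate() is called; the hope is that this avoids the hang/crash.
with torch.no_grad():
    dummy_input = query_tensors[0].unsqueeze(0).to(ppo_trainer.accelerator.device)
    _ = ppo_trainer.model(dummy_input)

# Then sample the rollout as usual.
response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **generation_kwargs)
```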
Hm, it could be the same issue I encountered, just that for you it takes so long that it triggers a timeout. Maybe there's a way to increase the timeout? Or try stage 3 with a smaller batch size, e.g. 4?
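On the timeout part: I haven't verified this inside the PPO example, but accelerate exposes InitProcessGroupKwargs for raising the collective-ops timeout (the NCCL default is on the order of 30 minutes). A rough sketch, assuming you construct the Accelerator yourself:

```python
from datetime import timedelta
from accelerate import Accelerator, InitProcessGroupKwargs

# Raise the distributed timeout so a very slow first iteration isn't killed
# by the watchdog before it finishes.
process_group_kwargs = InitProcessGroupKwargs(timeout=timedelta(hours=3))
accelerator = Accelerator(kwargs_handlers=[process_group_kwargs])
```

In the PPO example the trainer builds its own Accelerator internally, so this would likely need to be passed through the trainer's configuration rather than created directly; I haven't checked how.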
Interesting, I haven't run into that, but it's curious that the error happens at the exact same line as the FSDP error! Do you maybe need a
Shared layers between what? Actor and critic? Or are you somehow sharing layers between the actor-critic model and the reward model?
For FSDP, I actually have it sort-of working now. Indeed calling
I have noticed bad training metrics as well, both with the original example script and with my own llama-2 version, e.g. the KL divergence goes negative very quickly in both. I haven't looked into it further yet, but I would guess that the PPO hyperparameters in the training script might be completely off.
Yes, agreed. In my own testing, it seems 99% of each iteration's time is spent generating, which is clearly not good. I imagine there must be some bug with sharded generation, but I don't know much more than that right now.
I think it's the same issue we've both hit. It's very weird that the first iteration takes such a long time. I think what you observed could be the problem I met. As you have pointed out:
For LoRA tuning, I just used the default code in ppo.py, as follows:
I think ppo_trainer.generate may automatically process the input tensors, so I did not handle them manually. As you pointed out, maybe I could try calling .to(ppo_trainer.accelerator.device) on them.
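Something like this is what I have in mind (a minimal sketch, assuming the queries come out of the dataloader batch as a list of tensors and that generation_kwargs is defined elsewhere):

```python
# Move each query tensor to the accelerator's device before generating,
# instead of relying on ppo_trainer.generate to do it implicitly.
query_tensors = [q.to(ppo_trainer.accelerator.device) for q in batch["input_ids"]]
response_tensors = ppo_trainer.generate(
    query_tensors,
    return_prompt=False,
    **generation_kwargs,
)
```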
In addition, I have a question about the training model and the ref_model. I haven't figured out whether the parameters of the ref_model are updated during training. As far as I can tell, the training model's parameters are updated, but I don't know whether the ref_model's parameters are as well. What's your opinion on this? If the ref_model is frozen, we could speed up its generation.
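If it is frozen, I would expect a check like the following to pass (a sketch with a small placeholder model name, just to illustrate what I mean; substitute your actual checkpoint):

```python
from trl import AutoModelForCausalLMWithValueHead, create_reference_model

# Placeholder model; the real script loads a much larger checkpoint.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model)

# If the reference copy is frozen, none of its parameters should require
# gradients, so ppo_trainer.step() would never update it.
print(all(not p.requires_grad for p in ref_model.parameters()))  # expected: True
```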
We have some preliminary tests working with 7B models using PPO. Can you try running
@vwxyzjn I've been using your script for a specific use case, and it proceeded to the training phase, thank you for providing it. However, I cannot find a way to safely save the model at certain checkpoints, after some epochs, or at the end using The file does not provide a direct way to save models after certain epochs, so I tried to save the model manually. When I use either
Subsequently, when I attempt to reload that saved model using
This suggests that the model is not being saved correctly. Could you provide any guidance on how to ensure the entire model is saved and can be properly reloaded? My code for saving the model to a local directory is as straightforward as the commands mentioned above. The accelerate script I ran is exactly like the above, and I modified Thank you for your guidance.
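One pattern I've seen suggested for saving under ZeRO-3 (not from this repo, and I haven't confirmed it fixes my reload problem) is to gather the full state dict through accelerate before calling save_pretrained, since each rank only holds a shard of the weights. Roughly, assuming a trl PPOTrainer named ppo_trainer and a hypothetical output directory:

```python
import os

output_dir = "ppo_checkpoint"  # hypothetical local directory

# Gather the complete (un-sharded) state dict onto the main process first.
state_dict = ppo_trainer.accelerator.get_state_dict(ppo_trainer.model)
unwrapped_model = ppo_trainer.accelerator.unwrap_model(ppo_trainer.model)

if ppo_trainer.accelerator.is_main_process:
    os.makedirs(output_dir, exist_ok=True)
    unwrapped_model.save_pretrained(output_dir, state_dict=state_dict)
    ppo_trainer.tokenizer.save_pretrained(output_dir)
```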
Hi @vwxyzjn - thanks so much for looking into this, and sorry it's taken me a while to reply. I've finally had time to try this out, and unfortunately it still fails for me! In fact, the script fails even with the default model when using deepspeed (i.e. running the command you suggested, but without setting either
edit: removed logs as they are not relevant, see next comment.
Further update: after a lot more digging, it turns out the NCCL timeout in DeepSpeed Stage 2 was the same issue as #1103, and is fixed by #1177. I'm still seeing issues with DeepSpeed Stage 3 though, e.g. the
Just to confirm, does DeepSpeed stage 3 work for you at the moment? I had weird experiences with stage 3 as well, but I saw that stage 2 should be sufficient, since https://github.com/OpenLMLab/MOSS-RLHF can fit a 6.9B critic, reward, policy, and ref policy model. Also, what's your hardware setup?
Ah, ideally I'd like to scale up to 13B and maybe even 70B models though. Re "is stage 3 working": yes! Up until last night it always crashed after a few steps with that
The remaining issue is that it's slow: around 6-8 times as long per iteration compared to stage 2. That's just with the default model in the PPO example script, and 2 or 4 NVLinked GPUs. I realise of course that it will be somewhat slower than stage 2, but is that big of a performance drop expected?
Re hardware: it varies, but the main configuration I've been using has 4x A100 80GB SXM4 per node, with 200Gbps InfiniBand between nodes. For the 7B model I've tested on a node with 8x A100 80GB SXM4 GPUs.
Hello, I am facing the same error in deepspeed zero3, |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I'm seeing the same issue with DeepSpeed ZeRO 3.
I've been trying to get a PPO trainer to work with fully sharded training using either DeepSpeed stage 3 or FSDP. However, no matter what configuration options I try, I cannot get even the example in the documentation to work. It seems the problem is with calling trainer.generate() when sampling a rollout. With FSDP, it usually crashes, with the exact error message depending on the accelerate config (e.g. pytorch/pytorch#82461). With DeepSpeed, the script seems to just hang and time out, without an error message. Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?
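For context, the rollout loop I mean follows the documented PPO pattern, roughly like this (reconstructed from memory, so names like compute_reward are placeholders and details may differ from the actual scripts/ppo.py):

```python
for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # This is the call that crashes under FSDP and hangs under DeepSpeed stage 3.
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **generation_kwargs)
    batch["response"] = tokenizer.batch_decode(response_tensors)

    # Placeholder reward computation; the real script uses its own reward model/pipeline.
    rewards = [compute_reward(q, r) for q, r in zip(batch["query"], batch["response"])]

    # PPO optimisation step on the sampled rollout.
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)
```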
To reproduce, inside examples:

accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py

or even

accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py