
Naive Parallelism Multi-GPU example script fails #993

Closed
johncookds opened this issue Nov 14, 2023 · 1 comment

johncookds commented Nov 14, 2023

Hi, I tried to adapt this example script to use device_map='auto' in order to support a larger model:
https://github.com/huggingface/trl/blob/main/examples/scripts/ppo_multi_adapter.py
Unfortunately, running llama-13b with naive pipeline parallelism across 4 A100s, it fails at the compute_reward_score step (the generate step does work). I believe this is because set_adapter isn't correctly loading the reward adapter weights onto the GPU. Any help/comments would be greatly appreciated.
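
For context, here is a minimal sketch of the adapted loading code, assuming the example script's setup (the model name, adapter path, and LoRA settings below are placeholders):

    from peft import LoraConfig
    from trl import AutoModelForCausalLMWithValueHead

    # Placeholder LoRA settings, shaped like the example script's config.
    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
    )

    # The only change from the example script: device_map="auto" spreads the
    # base model's layers across all visible GPUs (naive pipeline parallelism).
    model = AutoModelForCausalLMWithValueHead.from_pretrained(
        "huggyllama/llama-13b",               # placeholder model name
        device_map="auto",                    # was a single-device map
        peft_config=lora_config,
        reward_adapter="path/to/rm_adapter",  # placeholder reward adapter path
        load_in_8bit=True,
    )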

The error:

    raw_rewards = ppo_trainer.model.compute_reward_score(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/trl/models/modeling_base.py", line 498, in compute_reward_score
    base_model_output = self.pretrained_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 977, in forward
    return self.base_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/tuners_utils.py", line 106, in forward
    return self.model.forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 708, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 424, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 321, in forward
    query_states = self.q_proj(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora/bnb.py", line 290, in forward
    output = lora_B(lora_A(dropout(x)))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
johncookds (Author) commented

I was able to solve this by looping through the named parameters and moving the reward-model adapter weights to the correct GPU. If you inspect the devices of all the parameters, the default adapter and the base model layers are on the right devices, but the reward-model adapter parameters sit on the CPU, and set_adapter doesn't move them.
The code I run before the train loop is:

    # Activate the reward-model adapter so its parameters show up in
    # named_parameters(), then move any parameter stranded on the CPU to the
    # device of the preceding parameter (the adapter weights sit next to
    # their layer, which device_map="auto" already placed on a GPU).
    ppo_trainer.model.pretrained_model.set_adapter(ppo_trainer.model.rm_adapter_name)
    ppo_trainer.model.pretrained_model.eval()
    previous_device = None
    for name, param in ppo_trainer.model.pretrained_model.named_parameters():
        if str(param.device) == "cpu":
            # Assumes the first parameter is never on the CPU, so
            # previous_device is already a real device here.
            param.data = param.data.to(previous_device)
        previous_device = param.device
    # Switch back to the policy adapter and restore training mode.
    ppo_trainer.model.pretrained_model.set_adapter("default")
    ppo_trainer.model.pretrained_model.train()
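
A quick sanity check after the loop (a hypothetical assertion, not part of the fix itself) confirms that no parameter was left behind:

    # Hypothetical check: every parameter should now be off the CPU.
    cpu_params = [
        name
        for name, param in ppo_trainer.model.pretrained_model.named_parameters()
        if param.device.type == "cpu"
    ]
    assert not cpu_params, f"Parameters still on CPU: {cpu_params}"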
