Hi, I tried to adapt this example script to use `device_map="auto"` so it can support a larger model: https://github.com/huggingface/trl/blob/main/examples/scripts/ppo_multi_adapter.py
Unfortunately it fails at the `compute_reward_score` step (the `generate` step does work). I'm running llama-13b on 4 A100s with naive pipeline parallelism; the full error is below.
I believe the cause is that `set_adapter` isn't correctly moving the reward adapter's weights onto the GPUs.
Any help/comments would be greatly appreciated.
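For context, I load the model essentially as in the example script, just with `device_map="auto"`. A minimal sketch (the model id, reward-adapter path, and LoRA settings below are placeholders, not my exact config):

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "huggyllama/llama-13b",           # placeholder model id
    peft_config=lora_config,
    reward_adapter="org/rm-adapter",  # placeholder reward-adapter repo/path
    load_in_8bit=True,
    device_map="auto",                # changed from {"": 0} in the example
)
```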
The error:
```
    raw_rewards = ppo_trainer.model.compute_reward_score(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/trl/models/modeling_base.py", line 498, in compute_reward_score
    base_model_output = self.pretrained_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 977, in forward
    return self.base_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/tuners_utils.py", line 106, in forward
    return self.model.forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 708, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 424, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/llama/modeling_llama.py", line 321, in forward
    query_states = self.q_proj(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora/bnb.py", line 290, in forward
    output = lora_B(lora_A(dropout(x)))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1533, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```
I was able to solve this by looping through the named parameters and moving the reward-model adapter onto the correct GPU.
If you print the original devices of all the parameters, the default adapter and the base-model layers are on the right devices, but the reward-model adapter sits on the CPU; `set_adapter` activates the adapter but doesn't move its weights.
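A quick way to see the misplacement (just a diagnostic sketch) is to print the device of every LoRA parameter; the reward-adapter entries all report `cpu` while everything else is spread across the GPUs:

```python
# Print where each LoRA parameter actually lives.
for name, param in ppo_trainer.model.pretrained_model.named_parameters():
    if "lora" in name:
        print(name, param.device)
```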
The code I run before the train loop is:
```python
# Activate the reward adapter so its parameters are the ones we inspect/move.
ppo_trainer.model.pretrained_model.set_adapter(ppo_trainer.model.rm_adapter_name)
ppo_trainer.model.pretrained_model.eval()

# Walk the parameters in order; any tensor stranded on the CPU gets moved to
# the device of the parameter seen just before it (i.e. the layer it belongs to).
previous_device = None
for name, module in ppo_trainer.model.pretrained_model.named_parameters():
    if str(module.device) == "cpu":
        module.data = module.data.to(previous_device)
    previous_device = module.device

# Switch back to the policy adapter and restore training mode.
ppo_trainer.model.pretrained_model.set_adapter("default")
ppo_trainer.model.pretrained_model.train()
```
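One caveat: this relies on `named_parameters()` enumerating each LoRA module's weights right after the base layer they wrap, so the previously seen parameter is (in practice) on whichever GPU `device_map="auto"` dispatched that layer to. It's a workaround rather than a proper fix; ideally the reward-adapter loading would place the adapter weights to match the base model's device map.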