Weight shape: torch.Size([2048, 4096])
LoRA Right Weight shape: torch.Size([4096, 32])
LoRA Left Weight shape: torch.Size([32, 1024])
The dimensions of these three matrices don't match: the product of the LoRA right and left weights must map the same input dimension to the same output dimension as the original weight matrix. Without seeing your setup, I'd guess your 'LoRA Left Weight' is supposed to have shape [32, 2048], so that the LoRA branch matches the [2048, 4096] weight.
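For reference, here is a minimal sketch of the shape contract a DeepSpeed-Chat-style LoRA layer implies (illustrative names, not the actual lora.py code): for a frozen weight of shape [out_features, in_features], the right weight must be [in_features, r] and the left weight [r, out_features].

```python
import torch
import torch.nn.functional as F

# Shapes taken from the q_proj case in the log below, which works:
x = torch.randn(1, 1088, 4096)    # input  [batch, seq, in_features]
weight = torch.randn(4096, 4096)  # frozen [out_features, in_features]
right = torch.randn(4096, 32)     # LoRA right [in_features, r]
left = torch.randn(32, 4096)      # LoRA left  [r, out_features]

# Both branches produce [1, 1088, 4096], so the addition is valid.
result = F.linear(x, weight) + (x @ right @ left)
```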
Describe the bug
This happens during LoRA tuning of Llama 3.1 8B. Does DeepSpeed-Chat support Llama 3.1?
Log output
Input shape: torch.Size([1, 1088, 4096])
Weight shape: torch.Size([4096, 4096])
LoRA Right Weight shape: torch.Size([4096, 32])
LoRA Left Weight shape: torch.Size([32, 4096])
Result shape: torch.Size([1, 1088, 4096])
Input shape: torch.Size([1, 1088, 4096])
Weight shape: torch.Size([2048, 4096])
LoRA Right Weight shape: torch.Size([4096, 32])
LoRA Left Weight shape: torch.Size([32, 1024])
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/lgh/models/trustworthy-alignment/training/main.py", line 768, in
[rank0]: main()
[rank0]: File "/home/lgh/models/trustworthy-alignment/training/main.py", line 572, in main
[rank0]: out = trainer.generate_experience(batch_prompt['prompt'],
[rank0]: File "/home/lgh/models/trustworthy-alignment/dschat/rlhf/ppo_trainer.py", line 232, in generate_experience
[rank0]: output = self.actor_model(seq, attention_mask=attention_mask)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
[rank0]: loss = self.module(*inputs, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1163, in forward
[rank0]: outputs = self.model(
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 913, in forward
[rank0]: layer_outputs = decoder_layer(
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 640, in forward
[rank0]: hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 399, in forward
[rank0]: key_states = self.k_proj(hidden_states)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lgh/anaconda3/envs/rltmp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lgh/models/trustworthy-alignment/dschat/utils/module/lora.py", line 123, in forward
[rank0]: result = F.linear(input, self.weight, self.bias) +
[rank0]: RuntimeError: The size of tensor a (2048) must match the size of tensor b (1024) at non-singleton dimension 2
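The failure can be reproduced standalone from the shapes in the log (a sketch of the mismatch, not the actual lora.py code): the base branch produces a trailing dimension of 2048 while the LoRA branch produces 1024, so the addition on line 123 fails.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1088, 4096)
weight = torch.randn(2048, 4096)  # base branch -> [1, 1088, 2048]
right = torch.randn(4096, 32)
left = torch.randn(32, 1024)      # LoRA branch -> [1, 1088, 1024]

out = F.linear(x, weight) + (x @ right @ left)
# RuntimeError: The size of tensor a (2048) must match the size of
# tensor b (1024) at non-singleton dimension 2
```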