state_dict keys don't match load_state_dict #865

Closed
samuele-bortolato opened this issue Oct 12, 2023 · 4 comments
@samuele-bortolato

I'm using AutoModelForCausalLMWithValueHead with a custom training loop. While trying to copy the model's weights to a target model with the same architecture, I noticed that the keys returned by state_dict don't match the keys expected by load_state_dict:

model.load_state_dict(model.state_dict())

RuntimeError: Error(s) in loading state_dict for AutoModelForCausalLMWithValueHead:
	Missing key(s) in state_dict: "pretrained_model.base_model.model.model.decoder.embed_tokens.weight", ....
	Unexpected key(s) in state_dict: "base_model.model.model.decoder.embed_tokens.weight", ....

Apparently it's just a key-prefix mismatch, and modifying the state_dict method fixes it:

def state_dict(self, *args, **kwargs):
    r"""
    Returns the state dictionary of the model. Keys of the wrapped model are
    prefixed with `pretrained_model.` and keys of the value head with `v_head.`,
    matching the module hierarchy that `load_state_dict` expects.
    """

    state_dict = {}

    if not self.is_peft_model:
        pretrained_model_state_dict = self.pretrained_model.state_dict(*args, **kwargs)
    else:
        # if it is a peft model, only save the v_head
        pretrained_model_state_dict = {}
    for k, v in pretrained_model_state_dict.items():
        state_dict[f"pretrained_model.{k}"] = v

    v_head_state_dict = self.v_head.state_dict(*args, **kwargs)
    for k, v in v_head_state_dict.items():
        state_dict[f"v_head.{k}"] = v

    return state_dict
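
For context, this matches how torch.nn.Module builds keys: every parameter is named by its attribute path, so a submodule stored as self.pretrained_model has to show up under the pretrained_model. prefix for load_state_dict to find it. A minimal sketch with toy modules (not the actual trl classes):

import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        # stand-ins for the wrapped LM and the value head
        self.pretrained_model = nn.Linear(4, 4)
        self.v_head = nn.Linear(4, 1)

print(list(Wrapper().state_dict().keys()))
# ['pretrained_model.weight', 'pretrained_model.bias', 'v_head.weight', 'v_head.bias']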

I don't know if this was done to make some high-level API work, but I feel the basic PyTorch API should still be kept compatible.

@lvwerra (Member) commented Oct 13, 2023

I remember @younesbelkada had to work on that to make the models work in trl. Maybe this could be fixed; what do you think?

@younesbelkada (Contributor)

It looks like you are trying to load from a PEFT model state dict? In that case you only need to load the v_head, since all other parameters are kept untouched, right?

model.load_state_dict(model.state_dict(), strict=False)

should do the trick.
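
With strict=False, load_state_dict skips mismatched keys instead of raising, and returns them so you can verify that nothing important was dropped. A small sketch of that check (same model object as above):

result = model.load_state_dict(model.state_dict(), strict=False)
print(result.missing_keys)     # keys the model expected but did not receive
print(result.unexpected_keys)  # keys in the state dict the model did not use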

@samuele-bortolato (Author)

Actually, I think all parameters are trained, not only the v_head, since I'm using the language model itself as the policy (I'm not doing RLHF following the tutorials; I'm doing research on RL with LLMs and have a custom training loop).
To make it work I ended up completely overriding the method, substituting it with the original state_dict() from torch.nn.Module:

import torch
from trl import AutoModelForCausalLMWithValueHead

# restore the default nn.Module implementation so state_dict keys
# match what load_state_dict expects
AutoModelForCausalLMWithValueHead.state_dict = torch.nn.Module.state_dict
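
With the patch applied, a strict round-trip works again, e.g. when syncing weights into a target network (the checkpoint name below is just an illustrative placeholder):

model = AutoModelForCausalLMWithValueHead.from_pretrained("facebook/opt-125m")
target = AutoModelForCausalLMWithValueHead.from_pretrained("facebook/opt-125m")
target.load_state_dict(model.state_dict())  # keys now match, no RuntimeError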

I'm now wondering what the reason was for changing it in the first place.


github-actions bot commented Dec 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
