KeyError in load_model() #717

Open
ManuelZ opened this issue Sep 16, 2024 · 2 comments
Comments


ManuelZ commented Sep 16, 2024

This line in load_model() is generating a RuntimeError:

RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.conv1.weight", "module.bn1.weight", ...

This happened after running this example, replacing trainer.train(num_epochs=num_epochs) with:

start_epoch = hooks.load_latest_saved_models(trainer, model_folder, device)
trainer.train(start_epoch, num_epochs=num_epochs)

And running trainer.train(start_epoch, num_epochs=num_epochs) a second time.

Also note that I ran the example:

  • Without pip install pytorch-metric-learning[with-hooks] because that downgraded my PML.
  • With dataloader_num_workers=0.

PyTorch version: 2.4.1+cu121
PyTorch Metric Learning version: 2.6.0

ManuelZ changed the title from "load_model() catches KeyError instead of RuntimeError" to "KeyError in load_model()" on Sep 16, 2024

ManuelZ commented Sep 16, 2024

I guess my use of dataloader_num_workers=0 is causing the keys of the weights dictionary to be saved without the module. prefix. Since the trunk and embedder are wrapped in torch.nn.DataParallel, model_def.load_state_dict expects a dictionary whose keys are prepended with module..
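
For reference, here is a minimal, self-contained sketch of the mismatch and the key-remapping workaround I mean (it uses a torchvision ResNet as a stand-in for the trunk and a freshly built state_dict in place of the saved checkpoint; none of this is PML code):

```python
import torch
from torchvision import models

# Reproduce the mismatch: the model we load into is wrapped in DataParallel,
# but the checkpoint was saved from an unwrapped model, so its keys have no
# "module." prefix.
trunk = torch.nn.DataParallel(models.resnet18())

# Stand-in for torch.load("<checkpoint>.pth"): a state_dict without "module." keys.
state_dict = models.resnet18().state_dict()

# Workaround: prepend "module." to every key before calling load_state_dict.
if not all(k.startswith("module.") for k in state_dict):
    state_dict = {f"module.{k}": v for k, v in state_dict.items()}

trunk.load_state_dict(state_dict)
```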

My problem is solved by not using torch.nn.DataParallel, but I don't know if this is a behavior that could be fixed in PML or if I was just using it incorrectly.


KevinMusgrave commented Oct 15, 2024

I'm not sure what's causing this. I recommend using another library for the training loop, like Lightning, Ignite, Transformers, or Timm. You can also look at Open Metric Learning, which I think has training loops that are compatible with many parts of this library.
