Hello! I ran into a problem while reading your code, specifically the implementation of the 'lr_mul' trick.
Your implementation is as follows:

```python
param_groups = [
    {'params': base_params, 'lr_mult': 0.0},
    {'params': new_params, 'lr_mult': 1.0},
]
```
And in the open-source code of your MS loss, the implementation is:

```python
def build_optimizer(cfg, model):
    params = []
    for key, value in model.named_parameters():
        if not value.requires_grad:
            continue
        lr_mul = 1.0
        if "backbone" in key:
            lr_mul = 0.1
        params += [{"params": [value], "lr_mul": lr_mul}]
    optimizer = getattr(torch.optim, cfg.SOLVER.OPTIMIZER_NAME)(
        params, lr=cfg.SOLVER.BASE_LR, weight_decay=cfg.SOLVER.WEIGHT_DECAY
    )
    return optimizer
```
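My guess is that 'lr_mul' is just a custom key: PyTorch stores unknown keys in each param group but never reads them during `step()`, so the training code would have to apply the multiplier itself. Something like this minimal sketch (the `apply_lr_mul` helper is my own illustration, not from either repo):

```python
import torch

# Hypothetical helper (not from either repo): PyTorch keeps extra keys such
# as "lr_mul" in optimizer.param_groups but ignores them, so the training
# loop or scheduler has to fold the multiplier into "lr" itself.
def apply_lr_mul(optimizer, base_lr):
    for group in optimizer.param_groups:
        # Groups without the key fall back to a multiplier of 1.0.
        group["lr"] = base_lr * group.get("lr_mul", 1.0)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(
    [{"params": model.parameters(), "lr_mul": 0.1}], lr=1e-2
)
apply_lr_mul(optimizer, base_lr=1e-2)
print(optimizer.param_groups[0]["lr"])  # 1e-2 * 0.1 -> 0.001
```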
But in https://pytorch.org/docs/master/optim.html there is no 'lr_mul'; per-parameter-group learning rates are set with the 'lr' key instead:

```python
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
```
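If I understand correctly, with a base lr of 1e-2 an 'lr_mul' of 0.1 would give the same effective rate as writing 'lr': 1e-3 directly, so the two styles differ only in whether PyTorch or the repo's own code does the scaling. A small sketch of the equivalent built-in form (the model here is my own example):

```python
import torch
import torch.nn as nn

# My own sketch: an "lr_mul" of 0.1 on a backbone-like group with base lr
# 1e-2 is equivalent to setting that group's "lr" to 1e-3 directly, which
# is the mechanism the PyTorch docs show.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-2 * 0.1},  # "backbone"
        {"params": model[1].parameters()},  # uses the base lr below
    ],
    lr=1e-2,
    momentum=0.9,
)
```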
I want to know: is there any difference between 'lr_mul' and 'lr_mult'? I can't find any information about that, and the PyTorch documentation doesn't explain either 'lr_mul' or 'lr_mult'. (I did find that 'lr_mult' is used in the Caffe framework.) I'm confused about this and would appreciate your help.
Thank you!