How can I use this ResNet to fine-tune on other applications, such as face data? #31

Open
bruinxiong opened this issue Feb 16, 2017 · 2 comments

Comments

@bruinxiong

@tornadomeet Thank you for your ResNet implementation in MXNet. I have now finished training ResNet-50 and ResNet-101 on the ImageNet'12 dataset and get similar performance, even though I use only 4 GPUs with a batch size of 225 and a different learning-rate schedule for ResNet-101. The left curves are yours, the right curves are ours.
[image: training-curve comparison]

Now I would like to use this ImageNet'12 pre-trained model to fine-tune for another application, such as face data. I have read some tips on this and understand the general principle.
There are two choices. The first is to freeze all layers except the last fc layer and set the learning rate of the new fc layer to 0.1. Following this idea, I can use MXNet's built-in API, namely the fixed_param_names argument, when initializing a training module: mod = mx.mod.Module(net, ..., fixed_param_names=fixed_param_names). Here I have a question: do I need to fix all parameter layers (including conv and bn), or just the conv layers?
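Here is a minimal sketch of this first option as I understand it (the checkpoint prefix 'resnet-50', the cut-point name 'flatten0', the new head name 'fc_face', and the class count are placeholders for my setup):

```python
import mxnet as mx

num_face_classes = 1000  # placeholder: set to the number of face identities

# Load the pretrained ResNet checkpoint (prefix and epoch are placeholders).
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)

# Cut the network at the flatten layer and attach a new fc + softmax head.
all_layers = sym.get_internals()
net = all_layers['flatten0_output']
net = mx.sym.FullyConnected(data=net, num_hidden=num_face_classes, name='fc_face')
net = mx.sym.SoftmaxOutput(data=net, name='softmax')

# Freeze every arg param except the new head's weight and bias. Note that
# fixed_param_names only freezes arg params (conv weights, fc weights, and
# the bn gamma/beta); the BatchNorm moving mean/var are aux params and keep
# updating in the forward pass during training regardless.
fixed_param_names = [name for name in net.list_arguments()
                     if name not in ('data', 'softmax_label',
                                     'fc_face_weight', 'fc_face_bias')]

mod = mx.mod.Module(symbol=net, context=mx.gpu(0),
                    fixed_param_names=fixed_param_names)

# The pretrained weights are then passed in when fitting:
# mod.fit(train_iter, arg_params=arg_params, aux_params=aux_params,
#         allow_missing=True, ...)
```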

The second choice is to set a smaller learning rate, such as 0.001, for all layers except the last fc layer, and a larger one, such as 0.1, for the new fc layer. The same question applies here: should only the conv layers get the smaller value, or all layers?

Furthermore, I checked the MXNet Model API, and the optimizer class has a set_lr_mult method that sets an individual learning-rate multiplier for each parameter. However, reading the source of set_lr_mult, the program looks up attributes over the whole network recursively when the method is called, yet when I print symbol.list_attr() there is nothing. As I understand it, if the variables do not carry such attributes, nothing is done and the weights are updated with the original learning rate; in other words, set_lr_mult() takes no effect. Here is the issue: even when I set attributes with mx.AttrScope in symbol_resnet.py, or set an attribute on each operator, such as mx.sym.Convolution(data=data, ..., attr={'lr_mult': '1'}), symbol.list_attr() still prints nothing. So please tell me the correct way to add attributes. And if I obtain the symbol from a pretrained model, is there an easy way to add an attribute to each parameter layer?
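To make the second option concrete, here is a sketch of what I have in mind, continuing from the net symbol above (layer names are placeholders, and the multipliers assume a base learning rate of 0.1, so a 0.01 multiplier yields an effective 0.001 for the backbone). Perhaps attr_dict() rather than list_attr() is the right call for inspecting internal nodes, since list_attr() seems to report only the output node's own attributes?

```python
import mxnet as mx

# Attributes opened with AttrScope are attached to every symbol created
# inside the scope (attribute values must be strings).
with mx.AttrScope(lr_mult='0.01'):
    data = mx.sym.Variable('data')
    conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(7, 7),
                              stride=(2, 2), name='conv0')

print(conv.list_attr())   # attributes of the output node only
print(conv.attr_dict())   # attributes of all nodes, keyed by node name

# Alternative that avoids symbol attributes entirely: hand the optimizer an
# explicit name -> multiplier dict ('net' is the fine-tune symbol above).
lr_mult = {name: 0.01 for name in net.list_arguments()
           if name not in ('data', 'softmax_label')}
lr_mult['fc_face_weight'] = 1.0   # new head trains at the full base lr
lr_mult['fc_face_bias'] = 1.0

opt = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, wd=0.0001)
opt.set_lr_mult(lr_mult)
# then pass the optimizer instance to training, e.g. mod.fit(..., optimizer=opt)
```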

The questions above are the problems I ran into while writing the fine-tuning code for ResNet.

Looking forward to your reply!
Thanks!

@tornadomeet
Owner

Hello @bruinxiong, please refer to https://github.com/dmlc/mxnet-notebooks/tree/master/python/how_to

@ZhengHe-MD

@tornadomeet, this doesn't answer @bruinxiong's question at all.
