Hello author, I noticed that the optimizer you use during training is GaLore_AdamW, which I understand to be a gradient low-rank projection method. In the GaLore paper, the authors apply and compare GaLore and LoRA as two independent methods. So in the llama-factory framework, how do you combine and apply GaLore_AdamW and LoRA together?
The two don't need to be used together.
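(For context, here is a minimal PyTorch sketch of why the two methods are independent: LoRA adds small trainable low-rank adapters next to a frozen weight, while GaLore keeps the full weight trainable and only compresses its gradient inside the optimizer. This is illustrative code, not LLaMA-Factory's or GaLore's actual implementation; the class names, rank, and scaling values are assumptions, and the real GaLore keeps its optimizer state in the low-rank subspace and refreshes the projector periodically.)

```python
# Illustrative sketch only -- not LLaMA-Factory code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA idea: frozen base weight plus trainable low-rank adapters B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)            # base weight stays frozen
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

def galore_style_grad(weight, rank=8):
    """GaLore idea (simplified): project the full-rank gradient onto a low-rank
    subspace, then project it back before the optimizer applies the update."""
    grad = weight.grad
    u, _, _ = torch.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                 # low-rank projector from the gradient's SVD
    return p @ (p.T @ grad)         # compressed, then lifted back to full shape

# LoRA: only the adapter parameters receive gradients.
lora_layer = LoRALinear(64, 64)
lora_layer(torch.randn(4, 64)).sum().backward()

# GaLore: the full weight is trainable, but its gradient is low-rank projected
# before an ordinary optimizer step (e.g. AdamW) consumes it.
full_layer = nn.Linear(64, 64, bias=False)
full_layer(torch.randn(4, 64)).sum().backward()
full_layer.weight.grad = galore_style_grad(full_layer.weight, rank=8)
```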
Then under the default configuration, is it LoRA with the AdamW optimizer, or something else?
Yes.
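(A minimal sketch of that default pairing, LoRA adapters trained with plain AdamW, using the peft library; this is not LLaMA-Factory's actual code path, and the model name, rank, and target modules below are illustrative assumptions.)

```python
# Illustrative sketch of LoRA + AdamW -- not LLaMA-Factory code.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection in GPT-2; model-specific
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Only the LoRA adapter parameters are trainable; they are updated with the
# ordinary AdamW optimizer (the "adamw_torch" choice in HF TrainingArguments).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.0)
```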
In the README generated after training, I noticed that the optimizer listed is adamw_torch, but adamw_torch corresponds to GaLore_AdamW in the code. How should I understand this? Could you please explain the use of LoRA and the optimizer in more detail?