
Applying GaLore_AdamW and LoRA #6487

Closed
YajieW99 opened this issue Dec 30, 2024 · 4 comments
Labels: solved (This problem has been already solved)

Comments

YajieW99 commented Dec 30, 2024

Hi author, I noticed that the optimizer you use in the training stage is GaLore_AdamW, which I understand to be a gradient low-rank projection method. In the GaLore paper, the authors apply and compare GaLore and LoRA as two independent methods. So within the llama-factory framework, how do you combine and apply GaLore_AdamW and LoRA together?

github-actions bot added the pending (This problem is yet to be addressed) label on Dec 30, 2024
hiyouga (Owner) commented Dec 30, 2024

The two don't need to be used together.
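
To make the distinction concrete, here is an illustrative sketch (not LLaMA-Factory code; the model name and hyperparameters are placeholders) contrasting the two approaches: LoRA adds small trainable adapter matrices and works with any standard optimizer, while GaLore keeps all weights trainable and projects their gradients to low rank inside the optimizer. The GaLore usage follows the pattern in the GaLore repository's README.

```python
# Illustrative sketch, not LLaMA-Factory code. Pick one option in practice;
# both are shown together only for contrast.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model   # LoRA: low-rank adapter weights
from galore_torch import GaLoreAdamW          # GaLore: low-rank gradient projection

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model

# Option A: LoRA. The base model is frozen, small adapter matrices are trained,
# and a standard optimizer such as torch.optim.AdamW is used.
lora_model = get_peft_model(model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))
optimizer_a = torch.optim.AdamW(lora_model.parameters(), lr=1e-4)

# Option B: GaLore. All weights stay trainable; the optimizer projects the
# gradients of the large 2-D weight matrices into a low-rank subspace.
galore_params = [m.weight for m in model.modules() if isinstance(m, torch.nn.Linear)]
galore_ids = {id(p) for p in galore_params}
other_params = [p for p in model.parameters() if id(p) not in galore_ids]
optimizer_b = GaLoreAdamW(
    [{"params": other_params},
     {"params": galore_params, "rank": 128, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"}],
    lr=1e-5,
)
```

Because the two methods operate at different levels (adapter parameters vs. optimizer-internal gradient projection), they are alternatives rather than a combination.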

hiyouga closed this as completed on Dec 30, 2024
hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Dec 30, 2024
YajieW99 (Author) commented

> The two don't need to be used together.

Then under the default configuration, is it LoRA plus the AdamW optimizer, or something else?
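
For reference, LLaMA-Factory builds on the Hugging Face Trainer, and a quick way to check the Trainer's own default optimizer name (assuming a recent transformers version) is:

```python
from transformers import TrainingArguments

# When no optimizer is specified, the Hugging Face Trainer defaults to
# adamw_torch; frameworks built on top of it inherit this default.
args = TrainingArguments(output_dir="out")
print(args.optim)  # adamw_torch
```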

hiyouga (Owner) commented Dec 30, 2024
YajieW99 (Author) commented

I noticed in the README generated after training that the optimizer used is adamw_torch, yet adamw_torch corresponds to GaLore_AdamW in the code. How should I understand this? Could you please explain more clearly how LoRA and the optimizer are used?
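
One plausible explanation (an assumption about the plumbing, not verified against the LLaMA-Factory source here): the auto-generated README reports TrainingArguments.optim, which keeps its default value adamw_torch even when the framework constructs a GaLore optimizer itself and injects it through the Trainer's optimizers argument. A minimal sketch of that pattern:

```python
import torch
from torch import nn
from transformers import Trainer, TrainingArguments
from galore_torch import GaLoreAdamW

model = nn.Linear(16, 16)                    # stand-in for the real model
args = TrainingArguments(output_dir="out")   # args.optim defaults to "adamw_torch"

# The custom optimizer is built separately and passed in; args.optim is never
# updated, so the generated README still reports adamw_torch even though
# training actually runs with GaLoreAdamW.
galore_opt = GaLoreAdamW(
    [{"params": [model.weight], "rank": 4, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"}],
    lr=1e-5,
)
trainer = Trainer(model=model, args=args, optimizers=(galore_opt, None))
```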
