
Does the full fine-tuning script for HunyuanVideo support multi-node parallel training? #138

Open
QingQingS opened this issue Jan 9, 2025 · 4 comments

@QingQingS

I've recently been trying to fine-tune HunyuanVideo and would like to ask whether the code directly supports multi-node parallel training. Also, can the training data go up to 720p with more frames? I see that the data preprocessing code sets max_height=480, max_width=848, num_frames=93.

Thanks to the authors for open-sourcing this work, it has been very inspiring. Thank you!

@BrianChen1129
Collaborator

Our Hunyuan full fine-tune supports multi-GPU training, and you can use 720p (1280×720) and 125 frames.
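For reference, hitting that resolution and length means raising the preprocessing limits quoted in the question. A minimal sketch, assuming those settings are plain variables or arguments of the preprocessing script (the exact names should be checked against the repo):

```python
# Hedged example: the preprocessing limits quoted above, raised to 720p / 125 frames.
max_height = 720    # was 480
max_width = 1280    # was 848
num_frames = 125    # was 93
```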

@jzhang38
Collaborator

jzhang38 commented Jan 9, 2025

You can use more than 125 frames as long as: 1. you have long enough data; 2. you correctly set the number of frames during preprocessing; 3. you have enough cards to sufficiently shard the sequence with context parallelism.
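To make point 3 concrete, here is a rough back-of-the-envelope check, not code from this repo: the compression factors are assumptions based on HunyuanVideo's published design (×4 causal temporal VAE, ×8 spatial VAE, 2×2 patchify), and sp_size is a hypothetical sequence-parallel degree, so verify both against the actual preprocessing and parallel config.

```python
# Hedged sketch: estimate the latent sequence and check that it shards evenly.
# All compression factors below are assumptions; confirm them in the repo.

def latent_shape(num_frames, height, width):
    t = (num_frames - 1) // 4 + 1   # causal temporal compression x4 (assumed)
    h = height // 8 // 2            # spatial VAE x8, then 2x2 patchify (assumed)
    w = width // 8 // 2
    return t, h, w

t, h, w = latent_shape(num_frames=125, height=720, width=1280)
seq_len = t * h * w                  # 32 * 45 * 80 = 115,200 tokens
sp_size = 8                          # hypothetical context/sequence-parallel degree

# Whichever axis the repo actually shards (frames or the full token sequence),
# it has to divide evenly across the parallel group.
assert seq_len % sp_size == 0, "sequence does not shard evenly across sp_size ranks"
```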

@QingQingS
Author

Thanks! One more question: precomputing the VAE and text encoder outputs ahead of time takes up too much storage, so I'd like to move the VAE module back into the training loop. But that will consume part of the GPU memory, and FSDP alone may not be enough. So I'd like to ask whether I can follow the official PyTorch tensor-parallel tutorial and directly add TP operations such as ColwiseParallel and RowwiseParallel on top of this codebase. What should I pay attention to when doing so? The tutorial is rather brief, and I haven't found any better references.

https://pytorch.org/tutorials/intermediate/TP_tutorial.html
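For reference, the core pattern from that tutorial looks roughly like the sketch below. This is not code from this repo: MLP, fc1, and fc2 are placeholder names rather than the actual HunyuanVideo module names, and the TP degree of 4 is arbitrary.

```python
# Minimal sketch of PyTorch tensor parallelism (run under torchrun with 4 GPUs).
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

class MLP(nn.Module):
    """Placeholder feed-forward standing in for one transformer block's MLP."""
    def __init__(self, dim=3072, hidden=12288):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

# 1D mesh over the GPUs that form one tensor-parallel group.
tp_mesh = init_device_mesh("cuda", (4,))

block = MLP().cuda()

# Shard fc1 column-wise and fc2 row-wise so the intermediate activation stays
# sharded and only fc2's output needs a single all-reduce.
parallelize_module(
    module=block,
    device_mesh=tp_mesh,
    parallelize_plan={
        "fc1": ColwiseParallel(),
        "fc2": RowwiseParallel(),
    },
)
```

When composing this with FSDP as in the tutorial's 2D example, the TP plan is usually applied first (typically within a node) and FSDP then wraps the model over the remaining data-parallel mesh dimension; for attention layers the head count also needs to be divisible by the TP degree, and any fused QKV projections need a matching sharding plan.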

@zhuhz22

zhuhz22 commented Jan 14, 2025


Hi @QingQingS, may I ask whether you have solved this issue, and whether you have already tried full fine-tuning with the code in this repo? I tried full fine-tuning with bs=8, but the model's performance declined severely within 200 steps. If you have tried fine-tuning, could you please share whether your training was successful, so that I can determine whether the issue is due to the small batch size or related to the code?
