[DPO] How should I run inference after DPO training finishes? Can I use a script similar to the one used for inference after SFT? #1979