DPO data format question: how should the data be constructed when the same instruction has multiple chosen and rejected responses? #6509

Closed
1 task done
veraygood opened this issue Jan 2, 2025 · 2 comments
Labels
solved This problem has been already solved

Comments

@veraygood

Reminder

  • I have read the README and searched the existing issues.

System Info

[
{
"instruction": "human instruction (required)",
"input": "human input (optional)",
"chosen": "preferred response (required)",
"rejected": "dispreferred response (required)"
}
]

Reproduction

--

Expected behavior

Can chosen and rejected here be lists? How should I construct the data?

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 2, 2025
@hiyouga
Owner

hiyouga commented Jan 2, 2025

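Only a single chosen and a single rejected response are supported per record. If you have several, split them into multiple records.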
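As an illustration of that advice, here is a minimal sketch of splitting one list-valued record into single-pair records. The list-valued input and the helper name `split_into_pairs` are hypothetical and not part of the project. Pairing every chosen response with every rejected one is an assumption too, since the reply only says to split.

```python
import json
from itertools import product


def split_into_pairs(record):
    """Expand a record whose chosen/rejected may be lists into single-pair records."""
    def as_list(value):
        # Accept either a single string or a list of strings.
        return value if isinstance(value, list) else [value]

    pairs = []
    # Assumption: pair every chosen response with every rejected response.
    for chosen, rejected in product(as_list(record["chosen"]), as_list(record["rejected"])):
        pairs.append({
            "instruction": record["instruction"],
            "input": record.get("input", ""),
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs


# Hypothetical list-valued record (not a format the project accepts directly).
raw = {
    "instruction": "Explain DPO in one sentence.",
    "chosen": ["good answer 1", "good answer 2"],
    "rejected": ["bad answer 1", "bad answer 2"],
}
records = split_into_pairs(raw)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Each resulting record has exactly one chosen and one rejected string, matching the format shown in the issue.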

@hiyouga hiyouga closed this as completed Jan 2, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 2, 2025
@veraygood
Author

@hiyouga
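A follow-up question: I have multiple responses, each with a quality score. Suppose that, ranked by score, the responses are a > b > c > d. Which of the following approaches is recommended?
Approach 1: use the relative ordering of the scores, e.g. b is worse than a, and b is better than c.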
[
{
"instruction": prompt,
"chosen": a,
"rejected": b
},
{
"instruction": prompt,
"chosen": b,
"rejected": c
},...]
Approach 2: set a threshold, e.g. a > b > thresh > c > d, so that a and b are chosen and c and d are rejected.
[
{
"instruction": prompt,
"chosen": a,
"rejected": c
},
{
"instruction": prompt,
"chosen": a,
"rejected": d
},...
]
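
The thread does not record an answer on which approach is preferable. Purely to make the two constructions concrete, here is a sketch of how each set of pairs could be generated from scored responses. The function names, the scores, and the threshold value are all made up for illustration.

```python
def adjacent_pairs(prompt, ranked):
    """Approach 1: pair each response with the next one in rank order (a>b, b>c, c>d)."""
    return [
        {"instruction": prompt, "chosen": better, "rejected": worse}
        for better, worse in zip(ranked, ranked[1:])
    ]


def threshold_pairs(prompt, scored, thresh):
    """Approach 2: responses scoring above thresh are chosen, the rest are rejected."""
    good = [text for text, score in scored if score > thresh]
    bad = [text for text, score in scored if score <= thresh]
    return [
        {"instruction": prompt, "chosen": g, "rejected": b}
        for g in good
        for b in bad
    ]


# Hypothetical responses with made-up quality scores.
scored = [("a", 0.9), ("b", 0.7), ("c", 0.4), ("d", 0.1)]
ranked = [text for text, _ in sorted(scored, key=lambda item: item[1], reverse=True)]

approach_one = adjacent_pairs("prompt", ranked)
approach_two = threshold_pairs("prompt", scored, thresh=0.5)
```

With four responses, approach 1 yields three pairs (a/b, b/c, c/d), so b and c each appear once as chosen and once as rejected. Approach 2 yields four pairs (a/c, a/d, b/c, b/d), and each response stays on one side only.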
