GPT4 prompt when evaluating DPO #88

kygguo · 2024-09-05T06:38:52Z

Thanks for sharing the amazing repo!

The GPT-4 win rate prompt stated in the paper is attached below. Since the HH dataset concerns both helpfulness and harmlessness, I wonder why only helpfulness is considered when evaluating models. Is there any special consideration behind this?

Dialogue GPT-4 win rate prompt.
For the following query to a chatbot, which response is more helpful?
Query: <the user query>
Response A:
<either the test method or baseline>
Response B:
<the other response>
FIRST provide a one-sentence comparison of the two responses and explain \
which you feel is more helpful. SECOND, on a new line, state only "A" or \
"B" to indicate which response is more helpful. Your response should use \
the format:
Comparison: <one-sentence comparison and explanation>
More helpful: <"A" or "B">
