This is something that was asked for previously with Judge (and also structured output adherence).
I feel like this is a perfect time to add it (for rejection sampling with a verifier).
Why is this helpful?
Without it, you have to wait until the end of the run to verify, filter, and start a new job.
With it, you know immediately when a request comes back (plus verify time) whether it passed, and you can send the retry request concurrently.
This means a speedup for repeated tries: you don't have to wait for the straggler calls. (Note on that: stragglers happen because the number of tokens generated is non-deterministic and varies when you do repeated sampling.)
Rejected samples can be written to a separate dataset.arrow file (so we don't just throw them away, like we currently do with structured-output failures).
Viewing these rejected samples in the curator-viewer would be very useful.
We can also report rejection rates on the CLI and in the viewer.
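The retry-on-rejection flow above can be sketched with asyncio. This is a minimal, self-contained illustration, not curator's actual API: `generate`, `verify`, and `sample_until_verified` are hypothetical stand-ins, with generation time randomized to mimic the straggler effect.

```python
import asyncio
import random

# Hypothetical stand-in for an LLM call. The random sleep models the
# straggler effect: generated token counts are non-deterministic, so
# some calls in a batch finish much later than others.
async def generate(prompt: str) -> str:
    await asyncio.sleep(random.uniform(0.0, 0.01))
    return f"answer-{random.randint(0, 3)}"

def verify(response: str) -> bool:
    # Toy verifier: accept only one specific answer.
    return response == "answer-0"

async def sample_until_verified(prompt: str, max_tries: int = 10):
    """Verify each response as soon as it arrives and resubmit
    immediately on rejection, instead of waiting for the whole run
    to finish before filtering and starting a new job."""
    rejected = []
    for _ in range(max_tries):
        response = await generate(prompt)
        if verify(response):
            # Keep rejects so they can be written to a separate dataset.
            return response, rejected
        rejected.append(response)
    return None, rejected

async def main():
    results = await asyncio.gather(
        *(sample_until_verified(f"prompt-{i}") for i in range(4))
    )
    accepted = [r for r, _ in results if r is not None]
    rejected = [x for _, rej in results for x in rej]
    # Rejection rate is exactly the stat we could surface on the CLI
    # and in the viewer.
    total = len(accepted) + len(rejected)
    print(f"accepted={len(accepted)} rejection_rate={len(rejected) / total:.2f}")

asyncio.run(main())
```

Because each prompt retries independently, a prompt that passes on its first sample never waits on another prompt's straggler retries.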
@shreyaspimpalgaonkar says that a verifier as an abstraction doesn't really make sense, since it is very different for math and code (code verification is a heavy lift).
This is much more intensive and has much more overhead than the simple parsing check we do right now.
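To make the math-vs-code gap concrete, here is a rough sketch under assumed, hypothetical names (`Verifier`, `MathVerifier`, `CodeVerifier`): math verification can be a cheap answer comparison, while code verification means actually executing candidate programs.

```python
from typing import Protocol

class Verifier(Protocol):
    def verify(self, response: str, reference: str) -> bool: ...

class MathVerifier:
    """Lightweight: normalize and compare the final answer string.
    (Real math checking often adds symbolic equality, e.g. via sympy.)"""
    def verify(self, response: str, reference: str) -> bool:
        return response.strip().rstrip(".") == reference.strip()

class CodeVerifier:
    """Heavy lift: execute the candidate solution against test code.
    A production version needs sandboxing, timeouts, and resource
    limits; bare exec() here is illustrative only and unsafe for
    untrusted code."""
    def verify(self, response: str, reference: str) -> bool:
        namespace: dict = {}
        try:
            exec(response, namespace)   # run the candidate solution
            exec(reference, namespace)  # run the test assertions
            return True
        except Exception:
            return False

print(MathVerifier().verify("42.", "42"))  # cheap string check
print(CodeVerifier().verify(
    "def add(a, b):\n    return a + b",
    "assert add(2, 3) == 5",
))  # requires running the code
```

The shared `verify(response, reference) -> bool` signature is what the rejection-sampling loop would depend on; whether the wildly different cost profiles justify a single abstraction is exactly the open question here.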
Also, this is related to what @kartik4949 was suggesting: letting individual requests pass through curator calls without blocking until all of them finish.
Agreed, we should just measure how bad this is. Let's do the dumb thing first. The only reason I'm mentioning this optimization now is that with reasoning models, response generation time is long, which exacerbates this issue.