
[Feature] Add docs for local accuracy tests #2953

Open · zhaochenyang20 opened this issue Jan 17, 2025 · 9 comments

@zhaochenyang20 (Collaborator)

Checklist

Motivation

Following this #2951 (comment)

In our backend test files under /test/srt, some tests take a long time and can't be triggered in CI for every commit. But some contributors want to change code that is directly related to accuracy. It would be better for them to run the accuracy tests that are not covered in CI and report the results. Related tests are:

```bash
# export model args (environment variables) as needed for your setup
python3 test/srt/test_eval_accuracy_mini.py
```
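For illustration, a minimal local accuracy check could look like the sketch below. It assumes an SGLang server is already running at http://localhost:30000 with its OpenAI-compatible API, and uses a two-question stand-in for a real dataset such as GSM8K; the model name "default" and the answer-matching rule are simplifying assumptions, not the test suite's actual logic.

```python
"""A minimal sketch of a local accuracy check, assuming an SGLang server
is already running at http://localhost:30000 with an OpenAI-compatible
API. The two hard-coded questions stand in for a real dataset."""
import openai  # pip install openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Hypothetical mini dataset: (question, expected substring in the answer).
DATASET = [
    ("What is 17 + 25?", "42"),
    ("How many legs do 3 spiders have in total?", "24"),
]

correct = 0
for question, expected in DATASET:
    resp = client.chat.completions.create(
        model="default",  # assumption: how the launched model is exposed
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    answer = resp.choices[0].message.content or ""
    correct += expected in answer  # crude match; real evals extract answers

print(f"accuracy: {correct / len(DATASET):.2%}")
```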

Related resources

No response

@zhaochenyang20 added the documentation and good first issue labels on Jan 17, 2025
@zhaochenyang20 self-assigned this on Jan 17, 2025
@ispobock (Collaborator)

Some useful tools can be used for accuracy testing:

@zhaochenyang20 (Collaborator, Author)

Great. We should build a pipeline for this, not only docs.

@simveit (Contributor) commented Jan 20, 2025

@zhyncs

If it's OK with you, I can take this issue, but I have some questions:

  • Should there be a canonical way of providing the results for datasets X, Y, and Z? Or should it just be tailored to the use case? In the second case, should there still be some kind of second check on other datasets to avoid overfitting to a certain benchmark?
  • Should the doc focus on explaining how to measure accuracy, or also include latency?
  • Should we cover advanced evaluation techniques like LLM-as-a-judge in the doc?
  • For the Qwen Math models, I remember they reported not just one pass but metrics like acc over K runs. Maybe that should be preferred over running a benchmark once, which risks a lucky trial (see the sketch after this list)?
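As a concrete sketch of that last point, the snippet below reports the mean and spread over K runs; run_eval() is a hypothetical stand-in (simulated here) for whatever eval entry point the doc settles on:

```python
"""Sketch: report accuracy over K runs instead of a single pass,
so one lucky or unlucky trial doesn't dominate the reported number."""
import random
import statistics

def run_eval() -> float:
    # Hypothetical stand-in: replace with a call to the real eval, e.g.
    # parse the accuracy printed by test_eval_accuracy_mini.py.
    return random.gauss(0.75, 0.02)  # simulated score for this sketch

K = 5
scores = [run_eval() for _ in range(K)]
print(f"acc@{K}: {statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}")
```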

@zhaochenyang20 (Collaborator, Author)

@ispobock could you also help with this?

@ispobock (Collaborator)

Hi @simveit, thanks for taking this!

  • We can just provide the available methods and a usage guide. Contributors can choose the methods they are familiar with.
  • Latency can be covered in the benchmark doc, not in this accuracy test doc.
  • Advanced evaluation techniques are good but not necessary. If you are familiar with them, you can add some.

@simveit (Contributor) commented Jan 21, 2025

Hi @ispobock, thanks for the reply.
By "available methods and usage guide", do you mean the tools you mentioned above, or should the guide focus only on the scripts we can find in the sglang benchmark and how to modify them to your needs?
Should that be done in a notebook or rather in markdown, style-wise like the guide for latency?

@zhaochenyang20 (Collaborator, Author)

@zhyncs Yineng, what's your opinion?

@zhaochenyang20 (Collaborator, Author)

> Should that be done in a notebook or rather in markdown, style-wise like the guide for latency?

I think markdown is okay.

Thanks, Simon!

@ispobock (Collaborator)

> By "available methods and usage guide", do you mean the tools you mentioned above, or should the guide focus only on the scripts we can find in the sglang benchmark and how to modify them to your needs?

In most cases, the SGLang benchmark scripts are the most convenient and effective way to test accuracy, though some users may be familiar with the other tools I mentioned above. We can include a brief usage introduction for those as a guide, and focus on the SGLang scripts.
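For concreteness, a hedged sketch of driving those scripts end to end is below: it launches a server and then runs a bundled GSM8K eval against it. The module paths, flags, model path, and the /health endpoint are assumptions from memory and may differ across SGLang versions; check them before relying on this.

```python
"""Hedged sketch: launch an SGLang server, wait for it, run a bundled eval.
Module paths, flags, and endpoints are assumptions -- verify against your
installed SGLang version."""
import subprocess
import time

import requests

PORT = 30000

# Launch the server (model path is a placeholder).
server = subprocess.Popen([
    "python3", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Meta-Llama-3-8B-Instruct",
    "--port", str(PORT),
])

try:
    # Poll until the server is up (endpoint name is an assumption).
    for _ in range(120):  # wait up to ~10 minutes for model load
        try:
            if requests.get(f"http://localhost:{PORT}/health", timeout=1).ok:
                break
        except requests.ConnectionError:
            pass
        time.sleep(5)

    # Run one of the bundled accuracy evals against the server.
    subprocess.run(
        ["python3", "-m", "sglang.test.few_shot_gsm8k", "--num-questions", "200"],
        check=True,
    )
finally:
    server.terminate()
```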
