
[Feature] Add docs for local accuracy tests #2953

Open · zhaochenyang20 opened this issue Jan 17, 2025 · 9 comments

@zhaochenyang20 (Collaborator)

Checklist

Motivation

Following this #2951 (comment)

In our backend test files under /test/srt, some tests take a long time and can't be triggered in CI for every commit. But some contributors want to change code that is directly related to accuracy. It would be better for them to run the accuracy tests that are not covered in CI and report the results. Related tests are:

```bash
# export model args (environment variables) as needed for your setup
python3 test/srt/test_eval_accuracy_mini.py
```
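For illustration, a minimal local accuracy check could look like the sketch below. It assumes an SGLang server is already running at http://localhost:30000 with its OpenAI-compatible API, and uses a two-question stand-in for a real dataset such as GSM8K; the model name "default" and the answer-matching rule are simplifying assumptions, not the test suite's actual logic.

```python
"""A minimal sketch of a local accuracy check, assuming an SGLang server
is already running at http://localhost:30000 with an OpenAI-compatible
API. The two hard-coded questions stand in for a real dataset."""
import openai  # pip install openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Hypothetical mini dataset: (question, expected substring in the answer).
DATASET = [
    ("What is 17 + 25?", "42"),
    ("How many legs do 3 spiders have in total?", "24"),
]

correct = 0
for question, expected in DATASET:
    resp = client.chat.completions.create(
        model="default",  # assumption: how the launched model is exposed
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    answer = resp.choices[0].message.content or ""
    correct += expected in answer  # crude match; real evals extract answers

print(f"accuracy: {correct / len(DATASET):.2%}")
```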

Related resources

No response

@zhaochenyang20 added the documentation and good first issue labels on Jan 17, 2025
@zhaochenyang20 self-assigned this on Jan 17, 2025
@ispobock (Collaborator)

Some useful tools can be used for accuracy testing:

@zhaochenyang20 (Collaborator, Author)

Great. We should build a pipeline for this, not only docs.

@simveit (Contributor) commented Jan 20, 2025

@zhyncs

If it's OK with you, I can take this issue, but I have some questions:

  • Should there be a canonical way of providing the results for datasets X, Y, and Z? Or should it just be tailored to the use case? In the second case, should there still be some kind of second check on other datasets to avoid overfitting to a certain benchmark?
  • Should the doc focus on explaining how to measure accuracy, or also include latency?
  • Should we cover advanced evaluation techniques like LLM-as-a-judge in the doc?
  • For the Qwen Math models, I remember they reported not just one pass but metrics like acc over K runs. Maybe that should be preferred over running a benchmark once, which risks a lucky trial (see the sketch after this list)?
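As a concrete sketch of that last point, the snippet below reports the mean and spread over K runs; run_eval() is a hypothetical stand-in (simulated here) for whatever eval entry point the doc settles on:

```python
"""Sketch: report accuracy over K runs instead of a single pass,
so one lucky or unlucky trial doesn't dominate the reported number."""
import random
import statistics

def run_eval() -> float:
    # Hypothetical stand-in: replace with a call to the real eval, e.g.
    # parse the accuracy printed by test_eval_accuracy_mini.py.
    return random.gauss(0.75, 0.02)  # simulated score for this sketch

K = 5
scores = [run_eval() for _ in range(K)]
print(f"acc@{K}: {statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}")
```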

@zhaochenyang20 (Collaborator, Author)

@ispobock could you also help with this?

@ispobock (Collaborator)

Hi @simveit, thanks for taking this!

  • We can just provide the available methods and a usage guide. Contributors can choose the methods they are familiar with.
  • Latency can be covered in the benchmark doc, not in this accuracy test doc.
  • Advanced evaluation techniques are good but not necessary. If you are familiar with them, you can add some.

@simveit (Contributor) commented Jan 21, 2025

Hi @ispobock, thanks for the reply.
By "available methods and usage guide", do you mean the tools you mentioned above, or should the guide focus only on the scripts we can find in the sglang benchmark and how to modify them to your needs?
Should that be done in a notebook or rather in markdown, style-wise like the guide for latency?

@zhaochenyang20 (Collaborator, Author)

@zhyncs Yineng, what's your opinion?

@zhaochenyang20 (Collaborator, Author)

> Should that be done in a notebook or rather in markdown, style-wise like the guide for latency?

I think markdown is okay.

Thanks, Simon!

@ispobock (Collaborator)

> By "available methods and usage guide", do you mean the tools you mentioned above, or should the guide focus only on the scripts we can find in the sglang benchmark and how to modify them to your needs?

In most cases, the SGLang benchmark scripts are the most convenient and effective way to test accuracy, though some users may be familiar with the other tools I mentioned above. We can include a brief usage introduction for those as a guide, and focus on the SGLang scripts.
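For concreteness, a hedged sketch of driving those scripts end to end is below: it launches a server and then runs a bundled GSM8K eval against it. The module paths, flags, model path, and the /health endpoint are assumptions from memory and may differ across SGLang versions; check them before relying on this.

```python
"""Hedged sketch: launch an SGLang server, wait for it, run a bundled eval.
Module paths, flags, and endpoints are assumptions -- verify against your
installed SGLang version."""
import subprocess
import time

import requests

PORT = 30000

# Launch the server (model path is a placeholder).
server = subprocess.Popen([
    "python3", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Meta-Llama-3-8B-Instruct",
    "--port", str(PORT),
])

try:
    # Poll until the server is up (endpoint name is an assumption).
    for _ in range(120):  # wait up to ~10 minutes for model load
        try:
            if requests.get(f"http://localhost:{PORT}/health", timeout=1).ok:
                break
        except requests.ConnectionError:
            pass
        time.sleep(5)

    # Run one of the bundled accuracy evals against the server.
    subprocess.run(
        ["python3", "-m", "sglang.test.few_shot_gsm8k", "--num-questions", "200"],
        check=True,
    )
finally:
    server.terminate()
```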
