Automated Benchmarking #324
-
This shouldn't be too hard to achieve via the API; a simple batch script could trigger generations with cURL and measure and record the timings. Here's a functional sample .bat file for this:
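The attached script itself isn't preserved in this transcript. As a stand-in, a minimal bash equivalent (assuming koboldcpp's default port 5001 and its standard /api/v1/generate route; the prompt and length are placeholders) might look like this:

```bash
#!/usr/bin/env bash
# One timed generation against a running koboldcpp instance.
# Assumptions: default port 5001 and the /api/v1/generate route.
HOST="http://localhost:5001"
t=$(curl -s -o /dev/null -w '%{time_total}' "$HOST/api/v1/generate" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Once upon a time", "max_length": 100}')
echo "generation took ${t}s"
```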
-
Thank you, @LostRuins, that's very helpful! I could use your code example to run inference from the (WSL, bash) command line. Could you expose Total T/s on the perf endpoint, too? The Processing and Generation stats are there, so only the third (and probably most important) one is missing.
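In the meantime, a total figure can be derived client-side from the existing stats: generated tokens divided by the combined processing and generation time. A bash sketch, assuming the /api/extra/perf route; the field names used here are taken from one koboldcpp version and may differ in yours:

```bash
#!/usr/bin/env bash
# Derive a Total T/s figure from the existing perf stats. The field
# names below are assumptions and may vary between versions.
perf=$(curl -s http://localhost:5001/api/extra/perf)
proc=$(echo "$perf" | jq '.last_process')       # prompt processing seconds
gen=$(echo "$perf" | jq '.last_eval')           # generation seconds
toks=$(echo "$perf" | jq '.last_token_count')   # tokens generated
echo "Total T/s: $(echo "$toks / ($proc + $gen)" | bc -l)"
```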
-
I'd love to have an automated benchmarking feature like llama.cpp's, where we can specify a prompt and have it run a number of times, e.g. generate 100 times and calculate average runtimes (and ideally even perplexity).
Then we could more easily make meaningful performance (and quality) comparisons. Running tests manually through a UI takes too long to repeat as often as necessary for useful results.
Another, IMHO better, option would be to give koboldcpp a text file as an input option containing the whole input string that's printed to the console. That way we could create a sample file with the settings and context we want, have it run a number of times, and get the averages calculated automatically; see the sketch below.
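Until something like that is built in, the repetition loop can live client-side. A bash sketch, again assuming the default port 5001 and the /api/v1/generate route; prompt.txt is a hypothetical input file holding the full prompt:

```bash
#!/usr/bin/env bash
# Run the same prompt N times against koboldcpp and report the
# average wall time per generation. Assumptions: default port 5001,
# the /api/v1/generate route, and a hypothetical prompt.txt file.
N=100
HOST="http://localhost:5001"
PROMPT=$(jq -Rs . < prompt.txt)   # JSON-escape the file contents
total=0
for i in $(seq 1 "$N"); do
  t=$(curl -s -o /dev/null -w '%{time_total}' "$HOST/api/v1/generate" \
      -H "Content-Type: application/json" \
      -d "{\"prompt\": $PROMPT, \"max_length\": 100}")
  total=$(echo "$total + $t" | bc -l)
done
echo "average over $N runs: $(echo "$total / $N" | bc -l) s"
```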