Automated Benchmarking #324
-
This shouldn't be too hard to achieve via the API; a simple batch script could trigger generations with cURL and measure and record the timings. Here's a functional sample .bat file for this:
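The attached script itself isn't preserved in this transcript. As a stand-in, a minimal bash equivalent (assuming koboldcpp's default port 5001 and its standard /api/v1/generate route; the prompt and length are placeholders) might look like this:

```bash
#!/usr/bin/env bash
# One timed generation against a running koboldcpp instance.
# Assumptions: default port 5001 and the /api/v1/generate route.
HOST="http://localhost:5001"
t=$(curl -s -o /dev/null -w '%{time_total}' "$HOST/api/v1/generate" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Once upon a time", "max_length": 100}')
echo "generation took ${t}s"
```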
-
Thank you, @LostRuins, that's very helpful! I could use your code example to run inference from the (WSL, bash) command line. Could you expose Total T/s on the perf endpoint, too? The Processing and Generation stats are there, so only the third (and probably most important) one is missing.
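In the meantime, a total figure can be derived client-side from the existing stats: generated tokens divided by the combined processing and generation time. A bash sketch, assuming the /api/extra/perf route; the field names used here are taken from one koboldcpp version and may differ in yours:

```bash
#!/usr/bin/env bash
# Derive a Total T/s figure from the existing perf stats. The field
# names below are assumptions and may vary between versions.
perf=$(curl -s http://localhost:5001/api/extra/perf)
proc=$(echo "$perf" | jq '.last_process')       # prompt processing seconds
gen=$(echo "$perf" | jq '.last_eval')           # generation seconds
toks=$(echo "$perf" | jq '.last_token_count')   # tokens generated
echo "Total T/s: $(echo "$toks / ($proc + $gen)" | bc -l)"
```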
-
I'd love to have an automated benchmarking feature like llama.cpp's, where we can specify a prompt and have it run a number of times, e.g. generate 100 times and calculate average runtimes (and ideally even perplexity).
Then we could more easily make meaningful performance (and quality) comparisons. Running tests manually through a UI takes too long to repeat as often as necessary for useful results.
Another, IMHO better, option would be to give koboldcpp a text file as an input option containing the whole input string that's printed to the console. That way we could create a sample file with the settings and context we want, have it run a number of times, and get the averages calculated automatically; see the sketch below.
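Until something like that is built in, the repetition loop can live client-side. A bash sketch, again assuming the default port 5001 and the /api/v1/generate route; prompt.txt is a hypothetical input file holding the full prompt:

```bash
#!/usr/bin/env bash
# Run the same prompt N times against koboldcpp and report the
# average wall time per generation. Assumptions: default port 5001,
# the /api/v1/generate route, and a hypothetical prompt.txt file.
N=100
HOST="http://localhost:5001"
PROMPT=$(jq -Rs . < prompt.txt)   # JSON-escape the file contents
total=0
for i in $(seq 1 "$N"); do
  t=$(curl -s -o /dev/null -w '%{time_total}' "$HOST/api/v1/generate" \
      -H "Content-Type: application/json" \
      -d "{\"prompt\": $PROMPT, \"max_length\": 100}")
  total=$(echo "$total + $t" | bc -l)
done
echo "average over $N runs: $(echo "$total / $N" | bc -l) s"
```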