Benchmark #51
raymond-infinitecode started this conversation in General
Is it possible for anyone to provide a benchmark of the Python API against pure llama.cpp? I can run the .exe from llama.cpp quite fast, but the Python binding stalls even with the simple demo provided.
You can use this function to get the timings to compare against llama.cpp. Also ensure that the library was built with the same optimizations (if not, please file an issue). Finally, you can try running a sampling profiler like py-spy (https://github.com/benfred/py-spy):

`py-spy record --native -o profile.svg -- python examples/high_level_api/high_level_api_inference.py -m path/to/model.bin`

This will produce a detailed flamegraph so you can see where the program is spending most of its time.
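For a quick apples-to-apples number, here is a minimal timing sketch using the high-level llama-cpp-python API; the model path and prompt are placeholders, and `verbose=True` is assumed to print the underlying llama.cpp timings:

```python
# Minimal sketch: time a generation with llama-cpp-python and report tokens/sec.
# Assumes llama-cpp-python is installed and "path/to/model.bin" is a valid model.
import time

from llama_cpp import Llama

llm = Llama(model_path="path/to/model.bin", verbose=True)

start = time.perf_counter()
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.perf_counter() - start

# The completion dict follows the OpenAI-style format with a "usage" field.
completion_tokens = output["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"({completion_tokens / elapsed:.2f} tokens/s)")
```

Comparing the tokens/s figure (and the verbose timings printed by the underlying llama.cpp) against the output of the llama.cpp binary on the same model and prompt should show whether the binding itself adds meaningful overhead.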