Benchmark #51
raymond-infinitecode started this conversation in General
Is it possible for anyone to provide a benchmark of the Python API against pure llama.cpp? I can run the .exe from llama.cpp quite fast, but the Python binding stalls even with the simple demo provided.
You can use this function to get the timings to compare against llama.cpp. Also ensure that the library was built with the same optimizations (if not, please file an issue). Finally, you can try running a sampling profiler like py-spy (https://github.com/benfred/py-spy):

`py-spy record --native -o profile.svg -- python examples/high_level_api/high_level_api_inference.py -m path/to/model.bin`

This will produce a detailed flamegraph so you can see where the program is spending most of its time.
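For a quick apples-to-apples number, here is a minimal timing sketch using the high-level llama-cpp-python API; the model path and prompt are placeholders, and `verbose=True` is assumed to print the underlying llama.cpp timings:

```python
# Minimal sketch: time a generation with llama-cpp-python and report tokens/sec.
# Assumes llama-cpp-python is installed and "path/to/model.bin" is a valid model.
import time

from llama_cpp import Llama

llm = Llama(model_path="path/to/model.bin", verbose=True)

start = time.perf_counter()
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.perf_counter() - start

# The completion dict follows the OpenAI-style format with a "usage" field.
completion_tokens = output["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"({completion_tokens / elapsed:.2f} tokens/s)")
```

Comparing the tokens/s figure (and the verbose timings printed by the underlying llama.cpp) against the output of the llama.cpp binary on the same model and prompt should show whether the binding itself adds meaningful overhead.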