Replies: 2 comments
-
Yup, @jart did some magic there 😄, thanks 🙌
-
Performance of v0.8.7 seems to top all other versions — wondering why? Correct me if I'm wrong.
-
Hello,
I have been using llamafile for a few months now, and with the release of llama 3.1, I did some performance tests.
First, here is the evolution over time using Llama 3 8B Instruct on my old Xeon E5-1630 v3. I used the quantized models produced by Sanctum AI: https://huggingface.co/SanctumAI/Meta-Llama-3-8B-Instruct-GGUF
All the tests were performed once on a mostly idle server, with the same prompt: the summarization of an article of about 2000 words (2350 tokens).
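For reference, a single timed run of this kind might look like the sketch below. This is an assumed command, not the author's exact invocation: the model filename, article path, generation cap, and version suffix are illustrative placeholders, and the flags shown are the llama.cpp-style options that llamafile exposes.

```shell
# Hypothetical benchmark run (placeholders: model file, article file, version).
# -m  : path to the quantized GGUF model
# -f  : read the prompt (the ~2000-word article) from a file
# -n  : cap the number of generated tokens for the summary
# --temp 0 keeps the output deterministic across versions
./llamafile-0.8.5 \
  -m meta-llama-3-8b-instruct.Q4_K_M.gguf \
  -f article-to-summarize.txt \
  -n 300 \
  --temp 0
# llama.cpp-based builds print prompt-eval and generation speeds (tokens/s)
# when the run finishes; those are the numbers compared across versions here.
```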
You can clearly see the performance jump from llamafile 0.8.1 to 0.8.2 (except on Q8_0), then another smaller one from 0.8.4 to 0.8.5.
And here is a comparison of Llama 3 and Llama 3.1 8B Instruct GGUF with llamafile 0.8.1.
Thanks for all your work @jart and others.