Replies: 2 comments
-
Yup, @jart did some magic there 😄, thanks 🙌
-
Performance of v0.8.7 seems to top all other versions — wondering why? Correct me if I'm wrong.
-
Hello,
I have been using llamafile for a few months now, and with the release of llama 3.1, I did some performance tests.
First, here is the evolution over time using Llama 3 8B Instruct on my old Xeon E5-1630 v3. I used the quantized models produced by Sanctum AI: https://huggingface.co/SanctumAI/Meta-Llama-3-8B-Instruct-GGUF
All the tests were performed once on a mostly idle server, with the same prompt: the summarization of an article of about 2000 words (2350 tokens).
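For reference, a single timed run of this kind might look like the sketch below. This is an assumed command, not the author's exact invocation: the model filename, article path, generation cap, and version suffix are illustrative placeholders, and the flags shown are the llama.cpp-style options that llamafile exposes.

```shell
# Hypothetical benchmark run (placeholders: model file, article file, version).
# -m  : path to the quantized GGUF model
# -f  : read the prompt (the ~2000-word article) from a file
# -n  : cap the number of generated tokens for the summary
# --temp 0 keeps the output deterministic across versions
./llamafile-0.8.5 \
  -m meta-llama-3-8b-instruct.Q4_K_M.gguf \
  -f article-to-summarize.txt \
  -n 300 \
  --temp 0
# llama.cpp-based builds print prompt-eval and generation speeds (tokens/s)
# when the run finishes; those are the numbers compared across versions here.
```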
You can clearly see the performance jump from llamafile 0.8.1 to 0.8.2 (except on Q8_0), then another smaller one from 0.8.4 to 0.8.5.
And here is a comparison of Llama 3 and Llama 3.1 8B Instruct GGUF with llamafile 0.8.1.
Thanks for all your work @jart and others.