Releases: VJHack/llama.cpp

b4230

30 Nov 21:17
0c39f44
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
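
For context, the intrinsics style this commit moves to looks roughly like the sketch below: a plain F32 dot-product kernel written with AArch64 NEON intrinsics instead of inline assembly. This is an illustration only, not the actual ggml_gemv kernel; the function name and loop structure are placeholders.

```cpp
// Minimal sketch of NEON intrinsics replacing inline assembly: a float32
// dot product. NOT the actual ggml_gemv code, just the general style.
#include <arm_neon.h>
#include <cstddef>

float dot_f32_neon(const float * a, const float * b, size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);      // 4-lane accumulator
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);    // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);    // load 4 floats from b
        acc = vfmaq_f32(acc, va, vb);         // acc += va * vb (fused)
    }
    float sum = vaddvq_f32(acc);              // horizontal add of the lanes
    for (; i < n; ++i) {                      // scalar tail
        sum += a[i] * b[i];
    }
    return sum;
}
```

Intrinsics like these compile down to the same NEON instructions as the assembly they replace, but stay readable and leave register allocation to the compiler.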

b4173

26 Nov 00:38
0cc6375
Introduce llama-run (#10291)

It's like simple-chat, but it uses smart pointers to avoid manual memory
cleanup, so the code has fewer memory leaks. It avoids printing multiple
dots, splits the code into smaller functions, and uses no exception
handling.

Signed-off-by: Eric Curtin <[email protected]>
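
The smart-pointer pattern described above, sketched generically: wrap a C-style handle in std::unique_ptr with a custom deleter so cleanup runs on every exit path. The model_load/model_free names here are stand-ins, not the actual llama-run or llama.cpp API.

```cpp
#include <cstdio>
#include <memory>

// Stand-ins for a C-style API (NOT the actual llama.cpp functions):
struct dummy_model { int id; };
dummy_model * model_load(const char * path) { (void)path; return new dummy_model{1}; }
void          model_free(dummy_model * m)   { std::puts("freed"); delete m; }

// unique_ptr with a custom deleter: the handle is freed automatically.
using model_ptr = std::unique_ptr<dummy_model, decltype(&model_free)>;

int main() {
    model_ptr model(model_load("weights.gguf"), &model_free);
    if (!model) return 1;   // an early return cannot leak the handle
    std::printf("model id: %d\n", model->id);
    // no explicit free: the deleter runs when model goes out of scope
}
```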

b4008

01 Nov 18:28
ba6f62e
readme : update hot topics

b3983

27 Oct 21:07
8841ce3
llama : switch KQ multiplication to F32 precision by default (#10015)

ggml-ci
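
Raising matmul precision on a single graph node in ggml looks roughly like the sketch below, assuming the ggml_mul_mat_set_prec API: build the KQ product, then mark that node to accumulate in F32 even when the inputs are F16. A sketch under those assumptions, not the code from the commit itself.

```cpp
#include "ggml.h"

// Sketch: force F32 accumulation on the KQ matmul node. Tensor shapes
// and the surrounding graph are illustrative only.
struct ggml_tensor * build_kq(struct ggml_context * ctx,
                              struct ggml_tensor * k,
                              struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);  // attention scores
    // Accumulate in F32 even for F16 inputs: more robust against
    // overflow and precision loss in the attention scores.
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);
    return kq;
}
```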

b3934

17 Oct 15:39
3752217
readme : update bindings list (#9918)

Co-authored-by: Tim Wang <[email protected]>

b3786

19 Sep 01:41
server : allow disabling context shift
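
Context shift, the behavior this makes optional, works roughly as sketched below: when the context window fills, evict the oldest tokens after a kept prefix and slide the remainder back so generation can continue. This assumes the llama_kv_cache_seq_rm / llama_kv_cache_seq_add API of this era and is an illustration, not the actual server implementation.

```cpp
#include "llama.h"

// Sketch of a context shift: discard half of the tokens between the kept
// prefix and the current position, then shift the survivors back so KV
// cache positions stay contiguous. Illustrative only.
void context_shift(llama_context * ctx, llama_seq_id seq,
                   int n_keep, int n_past) {
    const int n_discard = (n_past - n_keep) / 2;

    // remove tokens [n_keep, n_keep + n_discard) from the KV cache
    llama_kv_cache_seq_rm (ctx, seq, n_keep, n_keep + n_discard);
    // shift the remaining tokens back by n_discard positions
    llama_kv_cache_seq_add(ctx, seq, n_keep + n_discard, n_past, -n_discard);
}
```

Disabling the shift means a request simply stops at the context limit instead of silently dropping its oldest tokens.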

b3785

18 Sep 23:20
64c6af3
ggml : fix n_threads_cur initialization with one thread (#9538)

* ggml : fix n_threads_cur initialization with one thread

* Update ggml/src/ggml.c

---------

Co-authored-by: Max Krasnyansky <[email protected]>
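
The bug class here, sketched generically: a shared "current thread count" must be initialized correctly even on the single-threaded path, where no secondary workers exist to set or check it. A minimal std::atomic illustration, not the actual ggml threadpool code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct pool {
    std::atomic<int> n_threads_cur{0};   // threads participating in compute
};

void run(pool & p, int n_threads) {
    // Initialize up front, including for n_threads == 1: the compute loop
    // reads this counter for synchronization whether or not workers exist.
    p.n_threads_cur.store(n_threads, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (int i = 1; i < n_threads; ++i) {          // main thread is worker 0
        workers.emplace_back([&p] { /* compute a chunk ... */ (void)p; });
    }
    /* main thread computes its own chunk ... */
    for (auto & t : workers) t.join();
}
```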

b3767

13 Sep 05:42
made loading message more descriptive