Releases · VJHack/llama.cpp
b4230
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
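For context, this is the flavor of change involved: a hand-written assembly inner loop rewritten with AArch64 NEON intrinsics. A minimal illustrative sketch of such a GEMV-style loop, not the actual ggml_gemv kernel:

```cpp
// Illustrative only: an F32 dot product written with AArch64 NEON
// intrinsics instead of inline assembly (not the actual ggml code).
#include <arm_neon.h>

static float dot_f32(const float * a, const float * b, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        // fused multiply-add: acc += a[i..i+3] * b[i..i+3]
        acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float sum = vaddvq_f32(acc); // horizontal add of the 4 lanes
    for (; i < n; ++i) {
        sum += a[i] * b[i];      // scalar tail
    }
    return sum;
}
```

Intrinsics like these let the compiler handle register allocation and scheduling, which is the usual motivation for retiring raw assembly.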
b4173
Introduce llama-run (#10291). It is like simple-chat but uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code. It also avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
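The smart-pointer pattern mentioned above can be sketched as std::unique_ptr aliases with custom deleters over the llama.cpp C API. In this sketch only llama_free_model/llama_free are the library's own calls; the deleter and alias names are hypothetical:

```cpp
// A minimal sketch of the smart-pointer idea, assuming the llama.cpp C API's
// llama_free_model/llama_free destructors; alias names are hypothetical.
#include "llama.h"
#include <memory>

struct llama_model_deleter {
    void operator()(llama_model * m) const { llama_free_model(m); }
};
struct llama_ctx_deleter {
    void operator()(llama_context * c) const { llama_free(c); }
};

// RAII handles: the model and context are freed automatically on scope
// exit, removing the manual cleanup paths where leaks used to hide.
using llama_model_ptr = std::unique_ptr<llama_model, llama_model_deleter>;
using llama_ctx_ptr   = std::unique_ptr<llama_context, llama_ctx_deleter>;
```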
b4008
readme : update hot topics
b3983
llama : switch KQ multiplication to F32 precision by default (#10015) ggml-ci
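In ggml terms, this amounts to requesting F32 accumulation on the K·Q matmul node when the graph is built. A hedged sketch: ggml_mul_mat_set_prec and GGML_PREC_F32 are real ggml API names, while the function and its tensor arguments are placeholders:

```cpp
#include "ggml.h"

// Placeholder graph-build helper: ctx, k, q stand in for tensors created
// elsewhere. ggml_mul_mat_set_prec raises the matmul's accumulation precision.
static struct ggml_tensor * build_kq(struct ggml_context * ctx,
                                     struct ggml_tensor * k,
                                     struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q); // attention scores K·Q
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);          // accumulate in F32, not F16
    return kq;
}
```

Accumulating attention scores in F32 trades a little speed for numerical stability on models where F16 accumulation overflows.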
b3934
readme : update bindings list (#9918) Co-authored-by: Tim Wang <[email protected]>
b3786
allow disabling context shift for the server
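If this works like the corresponding upstream llama-server option, context shifting can be turned off at launch. The flag name below is an assumption and should be checked against the server's --help output:

```sh
# Assumed flag (verify with ./llama-server --help): stop generation when the
# context window fills instead of shifting older tokens out.
./llama-server -m model.gguf --no-context-shift
```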
b3785
ggml : fix n_threads_cur initialization with one thread (#9538). Co-authored-by: Max Krasnyansky <[email protected]>
b3767
made loading message more descriptive