Releases · VJHack/llama.cpp
b4230
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
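For context, this is the flavor of change involved: a hand-written assembly inner loop rewritten with AArch64 NEON intrinsics. A minimal illustrative sketch of such a GEMV-style loop, not the actual ggml_gemv kernel:

```cpp
// Illustrative only: an F32 dot product written with AArch64 NEON
// intrinsics instead of inline assembly (not the actual ggml code).
#include <arm_neon.h>

static float dot_f32(const float * a, const float * b, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        // fused multiply-add: acc += a[i..i+3] * b[i..i+3]
        acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float sum = vaddvq_f32(acc); // horizontal add of the 4 lanes
    for (; i < n; ++i) {
        sum += a[i] * b[i];      // scalar tail
    }
    return sum;
}
```

Intrinsics like these let the compiler handle register allocation and scheduling, which is the usual motivation for retiring raw assembly.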
b4173
Introduce llama-run (#10291). It is like simple-chat but uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code. It also avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
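The smart-pointer pattern mentioned above can be sketched as std::unique_ptr aliases with custom deleters over the llama.cpp C API. In this sketch only llama_free_model/llama_free are the library's own calls; the deleter and alias names are hypothetical:

```cpp
// A minimal sketch of the smart-pointer idea, assuming the llama.cpp C API's
// llama_free_model/llama_free destructors; alias names are hypothetical.
#include "llama.h"
#include <memory>

struct llama_model_deleter {
    void operator()(llama_model * m) const { llama_free_model(m); }
};
struct llama_ctx_deleter {
    void operator()(llama_context * c) const { llama_free(c); }
};

// RAII handles: the model and context are freed automatically on scope
// exit, removing the manual cleanup paths where leaks used to hide.
using llama_model_ptr = std::unique_ptr<llama_model, llama_model_deleter>;
using llama_ctx_ptr   = std::unique_ptr<llama_context, llama_ctx_deleter>;
```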
b4008
readme : update hot topics
b3983
llama : switch KQ multiplication to F32 precision by default (#10015) ggml-ci
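In ggml terms, this amounts to requesting F32 accumulation on the K·Q matmul node when the graph is built. A hedged sketch: ggml_mul_mat_set_prec and GGML_PREC_F32 are real ggml API names, while the function and its tensor arguments are placeholders:

```cpp
#include "ggml.h"

// Placeholder graph-build helper: ctx, k, q stand in for tensors created
// elsewhere. ggml_mul_mat_set_prec raises the matmul's accumulation precision.
static struct ggml_tensor * build_kq(struct ggml_context * ctx,
                                     struct ggml_tensor * k,
                                     struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q); // attention scores K·Q
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);          // accumulate in F32, not F16
    return kq;
}
```

Accumulating attention scores in F32 trades a little speed for numerical stability on models where F16 accumulation overflows.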
b3934
readme : update bindings list (#9918) Co-authored-by: Tim Wang <[email protected]>
b3786
allow disabling context shift for the server
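If this works like the corresponding upstream llama-server option, context shifting can be turned off at launch. The flag name below is an assumption and should be checked against the server's --help output:

```sh
# Assumed flag (verify with ./llama-server --help): stop generation when the
# context window fills instead of shifting older tokens out.
./llama-server -m model.gguf --no-context-shift
```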
b3785
ggml : fix n_threads_cur initialization with one thread (#9538). Co-authored-by: Max Krasnyansky <[email protected]>
b3767
made loading message more descriptive