Releases · chraac/llama.cpp

22 May 04:09

201cc11

b2961 Latest

llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings on convert-hf-to-gguf.py

* set to the short freq factor when context size is smaller than trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>
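
The rope frequency-factor items above can be illustrated with a minimal sketch. It is not the llama.cpp implementation: the function names, parameter names, and sample values are hypothetical, and it assumes that each per-dimension factor divides the corresponding rotation angle and that a context size equal to the trained size still uses the short factors (the changelog only covers the "smaller than" case).

```python
def select_freq_factors(n_ctx, n_ctx_train, short_factors, long_factors):
    """Pick the short factors when the requested context fits in the trained one.

    Assumption: equality is treated like the "smaller than" case.
    """
    return short_factors if n_ctx <= n_ctx_train else long_factors


def rope_angles(pos, n_dims, freq_base=10000.0, freq_factors=None):
    """Rotation angle for each dimension pair at token position `pos`.

    Assumption: a frequency factor divides the unscaled angle of its pair;
    with no factors given, the plain RoPE angles are returned.
    """
    angles = []
    for i in range(n_dims // 2):
        theta = pos * freq_base ** (-2.0 * i / n_dims)
        factor = freq_factors[i] if freq_factors is not None else 1.0
        angles.append(theta / factor)
    return angles


# Hypothetical values, for illustration only.
short = [1.0, 1.0]
long_ = [1.0, 2.0]
factors = select_freq_factors(n_ctx=8192, n_ctx_train=4096,
                              short_factors=short, long_factors=long_)
print(rope_angles(pos=1, n_dims=4, freq_factors=factors))
```

Making the choice depend only on the context size matches the "make freq factors only depend on ctx size" item; everything else here is an illustrative assumption.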

Assets 21

cudart-llama-bin-win-cu11.7.1-x64.zip      293 MB   2024-05-22T04:09:14Z
cudart-llama-bin-win-cu12.2.0-x64.zip      413 MB   2024-05-22T04:09:20Z
llama-b2961-bin-macos-arm64.zip            41.3 MB  2024-05-22T04:09:28Z
llama-b2961-bin-macos-x64.zip              37.9 MB  2024-05-22T04:09:29Z
llama-b2961-bin-ubuntu-x64.zip             45.8 MB  2024-05-22T04:09:30Z
llama-b2961-bin-win-avx-x64.zip            6.66 MB  2024-05-22T04:09:31Z
llama-b2961-bin-win-avx2-x64.zip           6.64 MB  2024-05-22T04:09:32Z
llama-b2961-bin-win-avx512-x64.zip         6.66 MB  2024-05-22T04:09:32Z
llama-b2961-bin-win-clblast-x64.zip        7.84 MB  2024-05-22T04:09:33Z
llama-b2961-bin-win-cuda-cu11.7.1-x64.zip  65 MB    2024-05-22T04:09:34Z
Source code (zip)                                   2024-05-21T20:28:32Z
Source code (tar.gz)                                2024-05-21T20:28:32Z

21 May 15:22

github-actions

b2956

11474e7

examples: cache hf model when --model not provided (#7353)

* examples: cache hf model when --model not provided

Assets 21

18 May 13:23

github-actions

b2927

511182e

android : use "ci-android" branch for CI (#7341)

* android : use "ci-android" branch for CI

* ggml : disable SIMD exp and silu for 32-bit ARM

ggml-ci

* android : do not fetch, use add_subdirectory instead

* cmake : provide binary dir

Assets 21
