Releases · l3utterfly/llama.cpp

24 Feb 16:29

08d5986

b4767 Latest

[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* optimize performance by reordering for Intel GPU

* detect hw type and save opt feature, and print opt feature

* correct name

* optimize the graph only once when computing it, record the opt status in tensor->extra, make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debug

* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <[email protected]>
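
The notes above do not spell out what the reorder does. As an illustration only, the sketch below assumes it means splitting Q4_0 blocks (one fp16 scale plus 16 bytes of packed 4-bit quants per 32 weights) into one contiguous run of quants followed by one contiguous run of scales. The function names `reorder_q4_0` and `dequantize_reordered` are hypothetical and are not taken from the SYCL backend.

```python
import struct

QK4_0 = 32                        # weights per Q4_0 block
QS_BYTES = QK4_0 // 2             # 16 bytes of packed 4-bit quants
BLOCK_BYTES = 2 + QS_BYTES        # fp16 scale followed by the quants


def reorder_q4_0(data: bytes) -> bytes:
    """Turn [d0 qs0][d1 qs1]... into [qs0 qs1 ...][d0 d1 ...] (assumed layout)."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("input is not a whole number of Q4_0 blocks")
    scales = bytearray()
    quants = bytearray()
    for start in range(0, len(data), BLOCK_BYTES):
        block = data[start:start + BLOCK_BYTES]
        scales += block[:2]       # fp16 scale d
        quants += block[2:]       # 16 bytes, two 4-bit values each
    return bytes(quants + scales)


def dequantize_reordered(data: bytes, n_blocks: int) -> list:
    """Dequantize the reordered buffer: value = (q - 8) * d."""
    out = []
    scale_base = n_blocks * QS_BYTES
    for i in range(n_blocks):
        (d,) = struct.unpack_from("<e", data, scale_base + 2 * i)
        qs = data[i * QS_BYTES:(i + 1) * QS_BYTES]
        low = [(b & 0x0F) - 8 for b in qs]    # elements 0..15
        high = [(b >> 4) - 8 for b in qs]     # elements 16..31
        out.extend(d * q for q in low + high)
    return out
```

With this layout, neighbouring work items read neighbouring quant bytes without striding over the interleaved scales, which is one plausible reason such a reorder could help a GPU kernel.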

Assets 25

cudart-llama-bin-win-cu11.7-x64.zip      303 MB    2025-02-24T16:29:40Z
cudart-llama-bin-win-cu12.4-x64.zip      373 MB    2025-02-24T16:29:48Z
llama-b4767-bin-macos-arm64.zip          23.3 MB   2025-02-24T16:29:59Z
llama-b4767-bin-macos-x64.zip            24.9 MB   2025-02-24T16:30:00Z
llama-b4767-bin-ubuntu-arm64.zip         25.4 MB   2025-02-24T16:30:01Z
llama-b4767-bin-ubuntu-vulkan-x64.zip    30.7 MB   2025-02-24T16:30:03Z
llama-b4767-bin-ubuntu-x64.zip           26.9 MB   2025-02-24T16:30:04Z
llama-b4767-bin-win-avx-x64.zip          16.4 MB   2025-02-24T16:30:05Z
llama-b4767-bin-win-avx2-x64.zip         16.4 MB   2025-02-24T16:30:06Z
llama-b4767-bin-win-avx512-x64.zip       16.4 MB   2025-02-24T16:30:07Z
Source code (zip)                                  2025-02-24T14:33:23Z
Source code (tar.gz)                               2025-02-24T14:33:23Z

14 Feb 08:38

github-actions

b4713

a4f011e

vulkan: linux builds + small subgroup size fixes (#11767)

* mm subgroup size

* upload vulkan x86 builds

Assets 24

11 Feb 07:22

github-actions

b4687

b9ab0a4

CUDA: use arch list for compatibility check (#11775)

* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <[email protected]>

Assets 23

21 Jan 05:58

github-actions

b4519

80d0d6b

common : add -hfd option for the draft model (#11318)

* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes

Assets 23

28 Dec 08:11

github-actions

b4393

d79d8f3

vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

Assets 23

11 Dec 07:41

github-actions

b4302

43041d2

ggml: load all backends from a user-provided search path (#10699)

* feat: load all backends from a user-provided search path

* fix: Windows search path

* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`

* refactor: rename `search_path` to `dir_path`

* fix: change `NULL` to `nullptr`

---------

Co-authored-by: Diego Devesa <[email protected]>
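
As a rough picture of what loading backends from a user-provided directory involves, the sketch below covers only the discovery step: listing files in a directory and keeping those that look like backend shared libraries. The helper names, the `ggml-` name marker, and the suffix list are assumptions of this sketch. They are not taken from the release notes and may not match what `ggml_backend_load_all_from_path` actually checks.

```python
import os

# Assumed for illustration: typical shared-library suffixes per platform.
LIB_SUFFIXES = (".so", ".dll", ".dylib")


def candidate_backend_libs(filenames, marker="ggml-"):
    """Return the sorted file names that look like backend libraries.

    `marker` is a hypothetical naming convention used only by this sketch.
    """
    return sorted(
        name for name in filenames
        if marker in name and name.endswith(LIB_SUFFIXES)
    )


def scan_dir(dir_path):
    """List candidate backend libraries found directly inside dir_path."""
    names = candidate_backend_libs(os.listdir(dir_path))
    return [os.path.join(dir_path, name) for name in names]
```

Keeping the name filter separate from the directory listing makes the matching rule easy to check without touching the file system.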

Assets 22

29 Nov 10:53

github-actions

b4219

266b851

sycl : Reroute permuted mul_mats through oneMKL (#10408)

This PR fixes the failing MUL_MAT tests for the sycl backend.

Assets 22

27 Nov 12:46

github-actions

b4200

46c69e0

ci : faster CUDA toolkit installation method and use ccache (#10537)

* ci : faster CUDA toolkit installation method and use ccache

* remove fetch-depth

* only pack CUDA runtime on master

Assets 22

16 Nov 07:52

github-actions

b4098

772703c

vulkan: Optimize some mat-vec mul quant shaders (#10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
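
The idea of producing two result elements per step while reusing the B loads can be sketched in plain Python. This is not the shader code: it ignores quantization and workgroups, the function name `matvec_two_rows` is hypothetical, and the guard on an odd final row is this sketch's own bounds check, which may differ from the check the note refers to.

```python
def matvec_two_rows(a, b):
    """Compute y = A @ b, producing two output rows per outer step.

    `a` is a list of rows and `b` is a list of the same width.
    """
    n_rows = len(a)
    n_cols = len(b)
    y = [0.0] * n_rows
    for r in range(0, n_rows, 2):
        has_second = r + 1 < n_rows      # odd row count: last step has one row
        acc0 = 0.0
        acc1 = 0.0
        for c in range(n_cols):
            bv = b[c]                    # loaded once, reused for both rows
            acc0 += a[r][c] * bv
            if has_second:
                acc1 += a[r + 1][c] * bv
        y[r] = acc0
        if has_second:
            y[r + 1] = acc1
    return y
```

Each element of `b` is read once per pair of rows instead of once per row, so the number of B reads roughly halves.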

Assets 21

05 Nov 08:45

github-actions

b4033

a9e8a9a

ggml : fix arch check in bf16_to_fp32 (#10164)

Assets 22
