Releases: l3utterfly/llama.cpp

b4767

24 Feb 16:29
08d5986
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* optimize performance by reordering for Intel GPU

* detect the hardware type, then save and print the optimization feature

* correct name

* optimize the graph once when computing the graph, record the optimization status in tensor->extra, and make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debug

* use syclex::architecture to replace the custom hardware define, and update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <[email protected]>
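The notes above mention a `GGML_SYCL_DISABLE_OPT` environment variable for debugging but do not show how it is read. As a minimal sketch, assuming the flag is treated as "on" when set to a non-zero integer, a check could look like the following. The helper name `env_flag_enabled` is hypothetical and is not taken from the SYCL backend.

```cpp
#include <cstdlib>

// Hypothetical helper (not from the SYCL backend): returns true when the
// environment variable `name` is set to a non-zero integer.
// Assumption: unset, empty, or non-numeric values all count as "off".
static bool env_flag_enabled(const char * name) {
    const char * value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return false;
    }
    return std::atoi(value) != 0;
}
```

With this sketch, `env_flag_enabled("GGML_SYCL_DISABLE_OPT")` would be true after exporting `GGML_SYCL_DISABLE_OPT=1` and false when the variable is unset.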

b4713

14 Feb 08:38
a4f011e
vulkan: linux builds + small subgroup size fixes (#11767)

* mm subgroup size

* upload vulkan x86 builds

b4687

11 Feb 07:22
b9ab0a4
CUDA: use arch list for compatibility check (#11775)

* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <[email protected]>
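The entry above says only that the feature availability check now uses the arch list. One plausible reading, sketched below under that assumption, is that a feature counts as available only if the highest architecture the binary was compiled for (and that the device can run) meets the feature's minimum. The function names and the integer arch encoding are hypothetical and are not taken from the CUDA backend.

```cpp
#include <vector>

// Hypothetical sketch: pick the highest compiled architecture that does not
// exceed the device architecture. Returns 0 when no compiled arch fits.
static int highest_compiled_arch(const std::vector<int> & compiled_archs, int device_arch) {
    int best = 0;
    for (int arch : compiled_archs) {
        if (arch <= device_arch && arch > best) {
            best = arch;
        }
    }
    return best;
}

// A feature is usable only if the code actually compiled for this device
// targets an architecture at or above the feature's minimum.
static bool feature_available(const std::vector<int> & compiled_archs,
                              int device_arch, int feature_min_arch) {
    return highest_compiled_arch(compiled_archs, device_arch) >= feature_min_arch;
}
```

For example, with compiled archs `{520, 610, 700, 750}` and a device at 860, the selected arch is 750, so a feature requiring 800 is reported as unavailable even though the device itself supports it.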

b4519

21 Jan 05:58
80d0d6b
common : add -hfd option for the draft model (#11318)

* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes

b4393

28 Dec 08:11
d79d8f3
vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4302

11 Dec 07:41
43041d2
ggml: load all backends from a user-provided search path (#10699)

* feat: load all backends from a user-provided search path

* fix: Windows search path

* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`

* refactor: rename `search_path` to `dir_path`

* fix: change `NULL` to `nullptr`

---------

Co-authored-by: Diego Devesa <[email protected]>

b4219

29 Nov 10:53
266b851
sycl : Reroute permuted mul_mats through oneMKL (#10408)

This PR fixes the failing MUL_MAT tests for the sycl backend.

b4200

27 Nov 12:46
46c69e0
ci : faster CUDA toolkit installation method and use ccache (#10537)

* ci : faster CUDA toolkit installation method and use ccache

* remove fetch-depth

* only pack CUDA runtime on master

b4098

16 Nov 07:52
772703c
vulkan: Optimize some mat-vec mul quant shaders (#10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
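The idea of computing two result elements at once while reusing the B loads can be illustrated on the CPU. The sketch below is an analogy only: it uses plain floats rather than quantized blocks, a loop iteration stands in for a workgroup, and the function name `mat_vec_two_rows` is hypothetical. It also shows the bounds check needed on the last iteration when the row count is odd.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogy: y = A * b, with A stored row-major (rows x cols).
// Two rows are processed per outer iteration so that each b[k] is loaded once
// and used for both accumulators.
static std::vector<float> mat_vec_two_rows(const std::vector<float> & a,
                                           const std::vector<float> & b,
                                           std::size_t rows) {
    const std::size_t cols = b.size();
    assert(a.size() == rows * cols);

    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; r += 2) {
        // Bounds check: the last iteration has no second row when rows is odd.
        const bool has_second = r + 1 < rows;
        const float * row0 = a.data() + r * cols;
        const float * row1 = has_second ? row0 + cols : row0;

        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (std::size_t k = 0; k < cols; ++k) {
            const float bk = b[k];  // single load shared by both rows
            acc0 += row0[k] * bk;
            acc1 += row1[k] * bk;
        }

        y[r] = acc0;
        if (has_second) {
            y[r + 1] = acc1;
        }
    }
    return y;
}
```

In the sketch the reuse is limited to the `b[k]` load; the shader change described above additionally reuses addressing calculations, which this scalar version does not model.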

b4033

05 Nov 08:45
a9e8a9a
ggml : fix arch check in bf16_to_fp32 (#10164)
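
The entry above does not describe the conversion itself, only that its architecture check was fixed. For context, a bf16 value holds the upper 16 bits of an IEEE-754 fp32 value, so a scalar conversion can be sketched as a 16-bit left shift. The function name `bf16_bits_to_fp32` is hypothetical and this is not the ggml implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar sketch: widen raw bf16 bits to fp32 by placing them in
// the upper half of a 32-bit word and zero-filling the lower mantissa bits.
static float bf16_bits_to_fp32(std::uint16_t bits) {
    const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));  // bit-cast without aliasing issues
    return out;
}
```

For instance, the bit pattern `0x3F80` maps to `1.0f` and `0xC000` maps to `-2.0f`.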