merge from upstream #51

l3utterfly · 2025-01-21T05:12:54Z


This is a fork of linenoise that is C++17 compatible. I intend to add it to llama-run so we can do things like traverse prompt history with the up and down arrows: https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <[email protected]>
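To illustrate the kind of history traversal described above, here is a minimal C++17 sketch. It does not use the linenoise API: the `PromptHistory` class and its `add`, `up`, and `down` members are hypothetical names chosen for illustration, and the real library also handles terminal input, which is left out here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of linenoise or llama-run: it only models
// how a cursor moves through previously entered prompts.
class PromptHistory {
public:
    // Store a submitted prompt and reset the cursor to "past the newest entry".
    void add(const std::string & line) {
        if (!line.empty()) {
            entries_.push_back(line);
        }
        cursor_ = entries_.size();
    }

    // Up arrow: step to the next older entry; stays on the oldest one.
    std::string up() {
        if (entries_.empty()) {
            return "";
        }
        if (cursor_ > 0) {
            --cursor_;
        }
        return entries_[cursor_];
    }

    // Down arrow: step to the next newer entry; moving past the newest
    // entry returns an empty string, i.e. a fresh prompt line.
    std::string down() {
        if (cursor_ < entries_.size()) {
            ++cursor_;
        }
        return cursor_ == entries_.size() ? "" : entries_[cursor_];
    }

private:
    std::vector<std::string> entries_;
    std::size_t cursor_ = 0;
};
```

In this sketch, pressing up repeatedly stops at the oldest prompt, and pressing down past the newest prompt clears the line. This mirrors common shell behavior and is an assumption about the intended user experience, not a statement about how linenoise.cpp implements it.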

* Implement host pool for matrix_info
  Creating a new memory pool on the host to store memory location for matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <[email protected]>

* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT

Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

ggml-ci Co-authored-by: Xuan Son Nguyen <[email protected]>

The mul mat and flash attention shaders were loading f32 types directly into A/B matrices, which happens to work but is technically invalid usage. For FA, we can load the data as an Accumulator matrix and convert it; this is not in the inner loop, so it is cheap enough. For mul mat, it is more efficient to do this conversion in a separate pass and have the input(s) be f16. coopmat2 requires SPIR-V 1.6 (related to using LocalSizeId). LocalSizeId requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
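The dependency chain in the last two sentences can be sketched as a single capability check. This is only an illustration under stated assumptions: `DeviceCaps`, `at_least`, and `can_use_coopmat2` are hypothetical names, and the struct stands in for values that real code would query from the Vulkan device. It is not taken from the ggml Vulkan backend.

```cpp
#include <cstdint>

// Hypothetical summary of what a device reports; field names are
// illustrative and do not come from the Vulkan headers or ggml.
struct DeviceCaps {
    uint32_t api_major;        // Vulkan API major version
    uint32_t api_minor;        // Vulkan API minor version
    bool     has_maintenance4; // maintenance4 feature enabled
    bool     has_coopmat2;     // coopmat2 extension available
};

// True if (major, minor) is at least (want_major, want_minor).
constexpr bool at_least(uint32_t major, uint32_t minor,
                        uint32_t want_major, uint32_t want_minor) {
    return major > want_major || (major == want_major && minor >= want_minor);
}

// coopmat2 needs SPIR-V 1.6, which needs Vulkan 1.3, and the LocalSizeId
// usage that comes with it needs maintenance4.
constexpr bool can_use_coopmat2(const DeviceCaps & caps) {
    const bool spirv_1_6_ok     = at_least(caps.api_major, caps.api_minor, 1, 3);
    const bool local_size_id_ok = caps.has_maintenance4;
    return caps.has_coopmat2 && spirv_1_6_ok && local_size_id_ok;
}
```

With this sketch, a Vulkan 1.2 device is rejected even if it advertises the extension, and a Vulkan 1.3 device is rejected if maintenance4 is not enabled.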

* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes

ericcurtin and others added 13 commits January 18, 2025 14:42

simple-chat : fix BOS being added to each message (ggml-org#11278)

b9daaff

tests : increase timeout when sanitizers are enabled (ggml-org#11300)

92bc493

* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT

llama : re-add LLM_ARCH_PHIMOE (ggml-org#11305)

ae3c1db

Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.

cont : fix whitespaces (ggml-org#11305)

ef6dada

llama : add support for Deepseek-R1-Qwen distill model (ggml-org#11310)

ec7f3ac

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

cmake: fix shell command quoting in build-info script (ggml-org#11309)

a4251ed

mmap: add include for cerrno (ggml-org#11296)

90d987b

ggml-ci Co-authored-by: Xuan Son Nguyen <[email protected]>

examples : fix add_special conditions (ggml-org#11311)

9f7add1

common : add -hfd option for the draft model (ggml-org#11318)

80d0d6b

* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes

Merge branch 'layla-build' into merge

a6ab3d4

l3utterfly merged commit 5955894 into layla-build Jan 21, 2025
4 of 11 checks passed

l3utterfly deleted the merge branch January 21, 2025 05:13

github-actions bot added SYCL Vulkan build examples devops python server ggml labels Jan 21, 2025
