merge from upstream #56

Merged
merged 28 commits on Mar 1, 2025
Changes from all commits
28 commits
7a2c913
llava : Add Granite Vision Support (#11794)
alex-jw-brooks Feb 24, 2025
34a846b
opencl: fix for small models (#11950)
lhez Feb 24, 2025
58d07a8
metal : copy kernels for quant to F32/F16 conversions (#12017)
gcp Feb 25, 2025
3e9a286
llama : expose llama_model_n_head_kv in the API (#11997)
vlovich Feb 25, 2025
4d1051a
Add Doc for Converting Granite Vision -> GGUF (#12006)
alex-jw-brooks Feb 25, 2025
0b52745
server: support add_generation_prompt query param (#12062)
ochafik Feb 25, 2025
61d4f39
vulkan: implement more backpropagation operators (#11914)
remyoudompheng Feb 25, 2025
393fca6
ggml-cpu: Fix build with sve (#12059)
MollySophia Feb 25, 2025
c132239
add OP sigmoid (#12056)
foldl Feb 25, 2025
401af80
server: handle echo=false on /v1/completions (#12060)
rhjdvsgsgks Feb 25, 2025
a82c9e7
vulkan: fix assertion when qy_needs_dequant (#12068)
jeffbolznv Feb 25, 2025
d7cfe1f
docs: add docs/function-calling.md to lighten server/README.md's plig…
ochafik Feb 25, 2025
53e4db1
readme : update infra list (#9096)
kerthcet Feb 26, 2025
3567ee3
gguf-py: enable reading non-native endian files (#12081)
AlekseiNikiforovIBM Feb 26, 2025
69050a1
Refactor gguf scripts to improve metadata handling (#11909)
CISC Feb 26, 2025
a800ae4
llava : add struct for FFI bindgen (#12079)
tinglou Feb 26, 2025
b95c8af
cmake: Fix ggml backend dependencies and installation (#11818)
vvuksanovic Feb 27, 2025
581650b
vulkan: improve im2col (#11826)
daniandtheweb Feb 28, 2025
fbeda90
vulkan: matmul dequantization improvements (#12015)
netrunnereve Feb 28, 2025
673cfef
CANN: Fix build error with GCC 13 (#11990)
hipudding Feb 28, 2025
05e6f5a
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)
Vithulep Feb 28, 2025
9c42b17
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)
JohannesGaessler Feb 28, 2025
438a839
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizatio…
remyoudompheng Feb 28, 2025
84d5f4b
Update granite vision docs for 3.2 model (#12105)
alex-jw-brooks Feb 28, 2025
c43a3e7
llama : add Phi-4-mini support (supersede #12099) (#12108)
ngxson Feb 28, 2025
70680c4
ggml : upgrade init_tensor API to return a ggml_status (#11854)
WilliamTambellini Feb 28, 2025
06c2b15
convert : fix Norway problem when parsing YAML (#12114)
ngxson Feb 28, 2025
648f244
Merge branch 'layla-build' into merge
l3utterfly Mar 1, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -45,6 +45,8 @@ lcov-report/
 tags
 .build/
 build*
+release
+debug
 !build-info.cmake
 !build-info.cpp.in
 !build-info.sh
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -39,7 +39,7 @@
 
 _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
 
-- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code
 - For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
 - Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggml-org/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
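
The matrix-multiplication convention quoted above is the part most people trip over, so here is a small NumPy sketch of the same identity (NumPy is used purely as an analogy for ggml's row-major semantics, and the sizes are made up):

import numpy as np

# Illustrative sizes only.
M, K, N = 4, 8, 3

A = np.random.rand(M, K)   # M rows of K columns
B = np.random.rand(N, K)   # N rows of K columns (shared inner dimension K)

# C = ggml_mul_mat(ctx, A, B) corresponds to C = B @ A.T in row-major terms,
# which is the same statement as C^T = A @ B.T.
C = B @ A.T                # shape (N, M)

assert np.allclose(C.T, A @ B.T)
print(C.shape)             # (3, 4)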
2 changes: 1 addition & 1 deletion README.md
@@ -219,7 +219,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister) - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
 - [llama-swap](https://github.com/mostlygeek/llama-swap) - transparent proxy that adds automatic model switching with llama-server
 - [Kalavai](https://github.com/kalavai-net/kalavai-client) - Crowdsource end to end LLM deployment at any scale
-
+- [llmaz](https://github.com/InftyAI/llmaz) - ☸️ Easy, advanced inference platform for large language models on Kubernetes.
 </details>
 
 <details>
11 changes: 8 additions & 3 deletions convert_hf_to_gguf.py
@@ -699,6 +699,9 @@ def get_vocab_base_pre(self, tokenizer) -> str:
         if chkhsh == "b3f499bb4255f8ca19fccd664443283318f2fd2414d5e0b040fbdd0cc195d6c5":
             # ref: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
             res = "deepseek-r1-qwen"
+        if chkhsh == "ccc2ef013c104be7bae2965776d611e1d7a8a2a9c547dd93a682c9a9fc80352e":
+            # ref: https://huggingface.co/Xenova/gpt-4o
+            res = "gpt-4o"
 
         if res is None:
             logger.warning("\n")
@@ -2512,7 +2515,8 @@ def set_gguf_parameters(self):
         rms_eps = self.find_hparam(["rms_norm_eps"])
         max_pos_embds = self.find_hparam(["n_positions", "max_position_embeddings"])
         orig_max_pos_embds = self.find_hparam(["original_max_position_embeddings"])
-        rope_dims = n_embd // n_head
+        rot_pct = self.hparams.get("partial_rotary_factor", 1.0)
+        rope_dims = int(rot_pct * n_embd) // n_head
 
         self.gguf_writer.add_context_length(max_pos_embds)
         self.gguf_writer.add_rope_scaling_orig_ctx_len(orig_max_pos_embds)
@@ -2536,7 +2540,8 @@ def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
         n_head = self.find_hparam(["num_attention_heads", "n_head"])
         max_pos_embds = self.find_hparam(["n_positions", "max_position_embeddings"])
         orig_max_pos_embds = self.find_hparam(["original_max_position_embeddings"])
-        rope_dims = n_embd // n_head
+        rot_pct = self.hparams.get("partial_rotary_factor", 1.0)
+        rope_dims = int(rot_pct * n_embd) // n_head
 
         # write rope scaling for long context (128k) model
         rope_scaling = self.find_hparam(['rope_scaling'], True)
@@ -2565,7 +2570,7 @@ def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
             raise KeyError('Missing the required key rope_scaling.long_factor or rope_scaling_short_factor')
 
         if len(long_factors) != len(short_factors) or len(long_factors) != rope_dims / 2:
-            raise ValueError(f'The length of rope long and short factors must be {rope_dims / 2}')
+            raise ValueError(f'The length of rope long and short factors must be {rope_dims / 2}. long_factors = {len(long_factors)}, short_factors = {len(short_factors)}.')
 
         yield (self.format_tensor_name(gguf.MODEL_TENSOR.ROPE_FACTORS_LONG), torch.tensor(long_factors, dtype=torch.float32))
         yield (self.format_tensor_name(gguf.MODEL_TENSOR.ROPE_FACTORS_SHORT), torch.tensor(short_factors, dtype=torch.float32))
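
To make the intent of the two rope_dims hunks above concrete, here is a minimal sketch of the arithmetic using made-up hyperparameter values (the numbers are illustrative, not read from any particular config.json):

# Hypothetical hparams, for illustration only.
hparams = {
    "hidden_size": 3072,
    "num_attention_heads": 24,
    "partial_rotary_factor": 0.75,
}

n_embd = hparams["hidden_size"]
n_head = hparams["num_attention_heads"]
rot_pct = hparams.get("partial_rotary_factor", 1.0)

rope_dims_old = n_embd // n_head                 # previous behaviour: 128 (full head dimension)
rope_dims_new = int(rot_pct * n_embd) // n_head  # new behaviour: 96 (only part of each head is rotated)

# The last hunk above checks that the long/short rope factor arrays each hold
# rope_dims / 2 entries; with these values that is 48 per array.
assert rope_dims_new / 2 == 48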
5 changes: 5 additions & 0 deletions convert_hf_to_gguf_update.py
@@ -109,6 +109,7 @@ class TOKENIZER_TYPE(IntEnum):
     {"name": "megrez", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Infinigence/Megrez-3B-Instruct"},
     {"name": "deepseek-v3", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-V3"},
     {"name": "deepseek-r1-qwen", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"},
+    {"name": "gpt-4o", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Xenova/gpt-4o", },
 ]
 
 
@@ -131,6 +132,10 @@ def download_model(model):
 
     files = ["config.json", "tokenizer.json", "tokenizer_config.json"]
 
+    if name == "gpt-4o":
+        # Xenova/gpt-4o is tokenizer-only, it does not contain config.json
+        files = ["tokenizer.json", "tokenizer_config.json"]
+
     if tokt == TOKENIZER_TYPE.SPM:
         files.append("tokenizer.model")
 
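
For context on the new "gpt-4o" entry: convert_hf_to_gguf_update.py derives each chkhsh by hashing the token IDs that a tokenizer produces for a fixed probe string, and get_vocab_base_pre matches that digest at conversion time. Below is a simplified sketch of the idea; the probe text is a short placeholder rather than the real probe string (so it will not reproduce the digest shown in the diff), and it assumes the Xenova/gpt-4o tokenizer files load through transformers' AutoTokenizer:

from hashlib import sha256
from transformers import AutoTokenizer

# Placeholder probe text -- the real scripts use a much longer fixed string
# covering whitespace, digits, CJK, emoji and other pre-tokenizer edge cases.
chktxt = "Hello World!\n 3.141592 ......"

tokenizer = AutoTokenizer.from_pretrained("Xenova/gpt-4o")
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()

# convert_hf_to_gguf.py compares this digest against its known values and,
# on a match, sets the pre-tokenizer name (e.g. res = "gpt-4o").
print(chkhsh)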