merge upstream #53

Merged 26 commits on Feb 14, 2025

Changes from all commits (26 commits)
a18f481
server : use common_token_to_piece instead of common_detokenize (#11740)
danbev Feb 11, 2025
90e4dba
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx…
sheldonrobinson Feb 11, 2025
4078c77
docs: add OpenCL (#11697)
lhez Feb 11, 2025
369be55
llama : fix typo in llama-grammar.h [no ci] (#11816)
danbev Feb 12, 2025
c3d6af7
CUDA: fix CUDART_VERSION checks (#11821)
JohannesGaessler Feb 12, 2025
198b1ec
ggml-cpu: Fix duplicate MATMUL_INT8 (#11817)
ownia Feb 12, 2025
748ee9f
ggml : fix multi-threaded clamp_f32 (#11824)
Burton2000 Feb 12, 2025
fef0cbe
cleanup: fix compile warnings associated with gnu_printf (#11811)
bandoti Feb 12, 2025
e598697
HIP: Switch to std::vector in rocblas version check (#11820)
IMbackK Feb 12, 2025
0fb77f8
sync : ggml
ggerganov Feb 12, 2025
bfd11a2
Fix: Compile failure due to Microsoft STL breaking change (#11836)
MrSMlT Feb 12, 2025
5c4284d
HIP: Remove GCN from list of devices that avoid MMQ (#11831)
IMbackK Feb 12, 2025
31afcbe
server : (webui) Give copy button back to all message bubbles (#11814)
woof-dog Feb 12, 2025
be3bbd6
ggml : x2 speed for WASM by optimizing SIMD (#11453)
ngxson Feb 12, 2025
a394039
ggml-cpu : add chunking support to mul_mat_id (#11666)
slaren Feb 13, 2025
3e69319
llama : update llama_decode_internal ref [no ci] (#11840)
danbev Feb 13, 2025
e437627
llama.cpp: fix warning message (#11839)
okuvshynov Feb 13, 2025
27e8a23
sampling: add Top-nσ sampler (#11223)
VJHack Feb 13, 2025
c7f460a
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content…
ochafik Feb 13, 2025
bd6e55b
musa: bump MUSA SDK version to rc3.1.1 (#11822)
yeahdongcn Feb 13, 2025
c48f630
llama : add --completion-bash option (#11846)
danbev Feb 13, 2025
c1f958c
server : (docs) Update wrong tool calling example (#11809)
RezaRahemtola Feb 13, 2025
8a8c4ce
llamafile: use member variable instead of constant for iq4nlt (#11780)
jmorganca Feb 13, 2025
04045bb
readme : minor
ggerganov Feb 13, 2025
a7b8ce2
llama-bench : fix unexpected global variable initialize sequence issu…
theraininsky Feb 14, 2025
a4f011e
vulkan: linux builds + small subgroup size fixes (#11767)
netrunnereve Feb 14, 2025
2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -1,6 +1,6 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc3.1.0
+ARG MUSA_VERSION=rc3.1.1
# Target the MUSA build image
ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

30 changes: 29 additions & 1 deletion .github/workflows/build.yml
@@ -403,6 +403,34 @@ jobs:
          # This is using llvmpipe and runs slower than other backends
          ctest -L main --verbose --timeout 1800

      - name: Determine tag name
        id: tag
        shell: bash
        run: |
          BUILD_NUMBER="$(git rev-list --count HEAD)"
          SHORT_HASH="$(git rev-parse --short=7 HEAD)"
          if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
            echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
          else
            SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
            echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
          fi

      - name: Pack artifacts
        id: pack_artifacts
        if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
        run: |
          cp LICENSE ./build/bin/
          cp examples/run/linenoise.cpp/LICENSE ./build/bin/LICENSE.linenoise.cpp
          zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.zip ./build/bin/*

      - name: Upload artifacts
        if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
        uses: actions/upload-artifact@v4
        with:
          path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.zip
          name: llama-bin-ubuntu-vulkan-x64.zip

  ubuntu-22-cmake-hip:
    runs-on: ubuntu-22.04
    container: rocm/dev-ubuntu-22.04:6.0.2
@@ -443,7 +471,7 @@ jobs:

  ubuntu-22-cmake-musa:
    runs-on: ubuntu-22.04
-   container: mthreads/musa:rc3.1.0-devel-ubuntu22.04
+   container: mthreads/musa:rc3.1.1-devel-ubuntu22.04

    steps:
      - name: Clone
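For reference, the new "Determine tag name" step above can be previewed outside CI with the same git commands. This is a hedged local sketch only: deriving `BRANCH_NAME` from `git rev-parse` is an assumption, since the workflow reads it from its own environment.

```bash
# Sketch: preview the artifact tag the workflow step would compute for the current checkout.
BUILD_NUMBER="$(git rev-list --count HEAD)"         # number of commits reachable from HEAD
SHORT_HASH="$(git rev-parse --short=7 HEAD)"        # 7-character commit hash
BRANCH_NAME="$(git rev-parse --abbrev-ref HEAD)"    # assumption: local stand-in for env.BRANCH_NAME

if [[ "${BRANCH_NAME}" == "master" ]]; then
    echo "b${BUILD_NUMBER}"                          # e.g. b1234 (hypothetical value)
else
    SAFE_NAME=$(echo "${BRANCH_NAME}" | tr '/' '-')  # slashes would break artifact file names
    echo "${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}"
fi
```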
18 changes: 16 additions & 2 deletions README.md
@@ -235,6 +235,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
| [HIP](docs/build.md#hip) | AMD GPU |
| [Vulkan](docs/build.md#vulkan) | GPU |
| [CANN](docs/build.md#cann) | Ascend NPU |
| [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |

## Building the project

@@ -518,5 +519,18 @@ If your issue is with model generation quality, then please at least scan the fo
- [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)

-#### References

## Completions
Command-line completion is available for some environments.

#### Bash Completion
```bash
$ build/bin/llama-cli --completion-bash > ~/.llama-completion.bash
$ source ~/.llama-completion.bash
```
Optionally this can be added to your `.bashrc` or `.bash_profile` to load it
automatically. For example:
```console
$ echo "source ~/.llama-completion.bash" >> ~/.bashrc
```

## References
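For context, the script emitted by `--completion-bash` is generated by the `common_params_print_completion` function added in `common/arg.cpp` below. Pieced together from those `printf` calls, the sourced output looks roughly like this abridged sketch; the real `opts` string lists every registered flag, and one `complete` line is printed per executable.

```bash
_llama_completions() {
    local cur prev opts
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    prev="${COMP_WORDS[COMP_CWORD-1]}"

    # abridged: the generated script enumerates every registered option here
    opts="--model --grammar-file --completion-bash --top-nsigma"

    case "$prev" in
        --model)
            # offer *.gguf files plus directories
            COMPREPLY=( $(compgen -f -X '!*.gguf' -- "$cur") $(compgen -d -- "$cur") )
            return 0
            ;;
        --grammar-file)
            # offer *.gbnf files plus directories
            COMPREPLY=( $(compgen -f -X '!*.gbnf' -- "$cur") $(compgen -d -- "$cur") )
            return 0
            ;;
        *)
            # complete flag names from the opts list
            COMPREPLY=( $(compgen -W "${opts}" -- "$cur") )
            return 0
            ;;
    esac
}

complete -F _llama_completions llama-cli
complete -F _llama_completions llama-server
# ...one line per executable in the list below
```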
131 changes: 131 additions & 0 deletions common/arg.cpp
@@ -365,6 +365,108 @@ static void common_params_print_usage(common_params_context & ctx_arg) {
print_options(specific_options);
}

static void common_params_print_completion(common_params_context & ctx_arg) {
std::vector<common_arg *> common_options;
std::vector<common_arg *> sparam_options;
std::vector<common_arg *> specific_options;

for (auto & opt : ctx_arg.options) {
if (opt.is_sparam) {
sparam_options.push_back(&opt);
} else if (opt.in_example(ctx_arg.ex)) {
specific_options.push_back(&opt);
} else {
common_options.push_back(&opt);
}
}

printf("_llama_completions() {\n");
printf(" local cur prev opts\n");
printf(" COMPREPLY=()\n");
printf(" cur=\"${COMP_WORDS[COMP_CWORD]}\"\n");
printf(" prev=\"${COMP_WORDS[COMP_CWORD-1]}\"\n\n");

printf(" opts=\"");
auto print_options = [](const std::vector<common_arg *> & options) {
for (const common_arg * opt : options) {
for (const char * arg : opt->args) {
printf("%s ", arg);
}
}
};

print_options(common_options);
print_options(sparam_options);
print_options(specific_options);
printf("\"\n\n");

printf(" case \"$prev\" in\n");
printf(" --model)\n");
printf(" COMPREPLY=( $(compgen -f -X '!*.gguf' -- \"$cur\") $(compgen -d -- \"$cur\") )\n");
printf(" return 0\n");
printf(" ;;\n");
printf(" --grammar-file)\n");
printf(" COMPREPLY=( $(compgen -f -X '!*.gbnf' -- \"$cur\") $(compgen -d -- \"$cur\") )\n");
printf(" return 0\n");
printf(" ;;\n");
printf(" *)\n");
printf(" COMPREPLY=( $(compgen -W \"${opts}\" -- \"$cur\") )\n");
printf(" return 0\n");
printf(" ;;\n");
printf(" esac\n");
printf("}\n\n");

std::set<std::string> executables = {
"llama-batched",
"llama-batched-bench",
"llama-bench",
"llama-cli",
"llama-convert-llama2c-to-ggml",
"llama-cvector-generator",
"llama-embedding",
"llama-eval-callback",
"llama-export-lora",
"llama-gbnf-validator",
"llama-gen-docs",
"llama-gguf",
"llama-gguf-hash",
"llama-gguf-split",
"llama-gritlm",
"llama-imatrix",
"llama-infill",
"llama-llava-cli",
"llama-llava-clip-quantize-cli",
"llama-lookahead",
"llama-lookup",
"llama-lookup-create",
"llama-lookup-merge",
"llama-lookup-stats",
"llama-minicpmv-cli",
"llama-parallel",
"llama-passkey",
"llama-perplexity",
"llama-q8dot",
"llama-quantize",
"llama-quantize-stats",
"llama-qwen2vl-cli",
"llama-retrieval",
"llama-run",
"llama-save-load-state",
"llama-server",
"llama-simple",
"llama-simple-chat",
"llama-speculative",
"llama-speculative-simple",
"llama-tokenize",
"llama-tts",
"llama-vdot"
};

for (const auto& exe : executables) {
printf("complete -F _llama_completions %s\n", exe.c_str());
}
}

static std::vector<ggml_backend_dev_t> parse_device_list(const std::string & value) {
std::vector<ggml_backend_dev_t> devices;
auto dev_names = string_split<std::string>(value, ',');
@@ -426,6 +528,10 @@ bool common_params_parse(int argc, char ** argv, common_params & params, llama_e
}
exit(0);
}
if (ctx_arg.params.completion) {
common_params_print_completion(ctx_arg);
exit(0);
}
} catch (const std::invalid_argument & ex) {
fprintf(stderr, "%s\n", ex.what());
ctx_arg.params = params_org;
@@ -494,6 +600,13 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
exit(0);
}
));
add_opt(common_arg(
{"--completion-bash"},
"print source-able bash completion script for llama.cpp",
[](common_params & params) {
params.completion = true;
}
));
add_opt(common_arg(
{"--verbose-prompt"},
string_format("print a verbose prompt before generation (default: %s)", params.verbose_prompt ? "true" : "false"),
@@ -946,6 +1059,13 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
params.sampling.min_p = std::stof(value);
}
).set_sparam());
add_opt(common_arg(
{"--top-nsigma"}, "N",
string_format("top-n-sigma sampling (default: %.1f, -1.0 = disabled)", params.sampling.top_n_sigma),
[](common_params & params, const std::string & value) {
params.sampling.top_n_sigma = std::stof(value);
}
).set_examples({LLAMA_EXAMPLE_MAIN}).set_sparam());
add_opt(common_arg(
{"--xtc-probability"}, "N",
string_format("xtc probability (default: %.1f, 0.0 = disabled)", (double)params.sampling.xtc_probability),
@@ -1975,6 +2095,17 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
params.use_jinja = true;
}
).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_MAIN}).set_env("LLAMA_ARG_JINJA"));
add_opt(common_arg(
{"--reasoning-format"}, "FORMAT",
"reasoning format (default: deepseek; allowed values: deepseek, none)\n"
"controls whether thought tags are extracted from the response, and in which format they're returned. 'none' leaves thoughts unparsed in `message.content`, 'deepseek' puts them in `message.reasoning_content` (for DeepSeek R1 & Command R7B only).\n"
"only supported for non-streamed responses",
[](common_params & params, const std::string & value) {
/**/ if (value == "deepseek") { params.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK; }
else if (value == "none") { params.reasoning_format = COMMON_REASONING_FORMAT_NONE; }
else { throw std::invalid_argument("invalid value"); }
}
).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_MAIN}).set_env("LLAMA_ARG_THINK"));
add_opt(common_arg(
{"--chat-template"}, "JINJA_TEMPLATE",
string_format(
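As a usage note, the new sampling and reasoning options added in this file can be exercised roughly as follows. This is a sketch under stated assumptions: the model path is a placeholder and the server is assumed to listen on its default port 8080.

```bash
# Top-n-sigma sampling is registered for llama-cli only (-1.0 leaves it disabled).
build/bin/llama-cli -m ./model.gguf -p "Hello" --top-nsigma 1.0

# Reasoning extraction on the server: with --reasoning-format deepseek, non-streamed
# responses place extracted thoughts in message.reasoning_content.
build/bin/llama-server -m ./model.gguf --jinja --reasoning-format deepseek
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'
```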