From 010f4d282e86babe216af6e037ab10bf078415e7 Mon Sep 17 00:00:00 2001
From: Yingbei
Date: Tue, 18 Jun 2024 12:39:17 -0700
Subject: [PATCH] update readme

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7150d6f00b158..b74fca8d2b9e2 100644
--- a/README.md
+++ b/README.md
@@ -23,11 +23,13 @@ For example:
 wget https://huggingface.co/sanjay920/Llama-3-8b-function-calling-alpha-v1.gguf/resolve/main/Llama-3-8b-function-calling-alpha-v1.gguf
 ```
 
-4. start server:
+4. start the OpenAI-compatible server:
 ```
 ./llama-server -ngl 35 -m Llama-3-8b-function-calling-alpha-v1.gguf --port 1234 --host 0.0.0.0 -c 16000 --chat-template llama3
 ```
 
+5. That's it! Make sure you turn `stream` off when making API calls to the server, as the streaming feature is not supported yet.
+
 ### Recent API changes
 
 - [2024 Apr 21] `llama_token_to_piece` can now optionally render special tokens https://github.com/ggerganov/llama.cpp/pull/6807
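The patch's step 5 tells clients to disable streaming when calling the server. A minimal sketch of building such a request with only the standard library, assuming the server is running on `localhost:1234` as in the `./llama-server` command above (the model name in the payload is illustrative, and whether the server requires it may vary):

```python
import json
import urllib.request

# Hypothetical endpoint; adjust host/port to match the llama-server command above.
URL = "http://localhost:1234/v1/chat/completions"

def build_request(messages):
    """Build an OpenAI-style chat-completions request with streaming disabled."""
    payload = {
        "model": "Llama-3-8b-function-calling-alpha-v1",  # name is illustrative
        "messages": messages,
        "stream": False,  # per the README: streaming is not supported yet
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send the request (requires a running server):
# with urllib.request.urlopen(build_request([{"role": "user", "content": "hi"}])) as resp:
#     print(json.load(resp))
```

The key detail is `"stream": False`: OpenAI-compatible servers default `stream` to off, so simply omitting the field also works, but setting it explicitly guards against client libraries that enable streaming by default.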