Wrong chat format for llava 1.5 #1905

BenjaminMarechalEVITECH · 2025-01-24T20:26:05Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

I'm running the llama-cpp-python server with the following command:

python3 -m llama_cpp.server --model models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf --clip_model_path models/mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf --model_alias llava-v1.5-13b-q4_k --chat_format llava-1-5 --port 10322

(models downloaded from https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main)

When I call the server using the openai Python package:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)

chat_completion = client.chat.completions.create(
    model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
)

The server prints this to the console:

A chat between a curious human and an artificial intelligence assistant.  The assistant gives helpful, detailed, and polite answers to the human's questions.USER: Write a limerick about python exceptionsUSER: ASSISTANT: 
Llama.generate: 48 prefix-match hit, remaining 1 prompt tokens to eval
llama_perf_context_print:        load time =  672844.44 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    32 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     906.07 ms /    33 tokens
INFO:     127.0.0.1:40776 - "POST /v1/chat/completions HTTP/1.1" 200 OK

As you can see, there is an extra, unwanted "USER: " just before "ASSISTANT: " at the end of the prompt, even though the request contains a single user message.
I suspect the chat format selected by --chat_format llava-1-5 is not correct.
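To make the symptom easy to check, here is a minimal sketch that counts the "USER:" markers in a rendered prompt and compares that count with the number of user messages in the request. The helper names (count_role_markers, has_dangling_user_turn) are hypothetical and not part of llama-cpp-python; the prompt string is copied from the server log above.

```python
def count_role_markers(prompt: str, marker: str = "USER:") -> int:
    """Count non-overlapping occurrences of a role marker in a rendered prompt."""
    return prompt.count(marker)


def has_dangling_user_turn(prompt: str, messages: list[dict]) -> bool:
    """Return True when the prompt has more USER: markers than user messages.

    Assumption: no message content contains the literal text "USER:" itself,
    otherwise this simple count would over-report.
    """
    expected = sum(1 for m in messages if m["role"] == "user")
    return count_role_markers(prompt) > expected


# Prompt copied verbatim from the server console output in this issue.
observed = (
    "A chat between a curious human and an artificial intelligence assistant.  "
    "The assistant gives helpful, detailed, and polite answers to the human's "
    "questions.USER: Write a limerick about python exceptionsUSER: ASSISTANT: "
)
messages = [{"role": "user", "content": "Write a limerick about python exceptions"}]

print(count_role_markers(observed))                 # number of USER: markers
print(has_dangling_user_turn(observed, messages))   # more markers than user turns?
```

With a single user message, I would expect exactly one "USER:" marker in the prompt, so a check like this should report no dangling turn once the template is fixed.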

Environment and Context

llama_cpp installed with pip install llama-cpp-python[server]
print(llama_cpp.__version__): 0.3.6
print(openai.__version__): 1.59.7
