Is this the correct LLM inference setup? #1491
-
I am still fairly new to running LLMs locally, and I would be grateful if you could let me know whether I am doing this correctly. I am using Llama 2 and Mistral locally for inference in a specific function-calling use case (`get_current_weather`). This is part of my thesis, where I compare the function-calling performance of small base models against their finetuned counterparts and against baseline models (GPT-3.5-Turbo and GPT-4). The LLM is supposed to infer from the question asked whether it should use the `get_current_weather` function or not. Below is how I am running inference:
system_message = "You are a helpful assistant with access to functions. Use the provided function to answer current weather questions."
Unsurprisingly, the small base models don't manage to perform the function call. I just want to make sure this is truly caused by poor model performance and not by me using the tooling incorrectly. With a similar OpenAI API setup I got decent to good results. Thanks for your input :)
-
You won't get any function-call responses from this model, mainly because it does not have a function-calling capable chat template (and you're using `chat_format="llama-2"` anyway, which will ignore the chat template). If you want to use the `tools` parameter, you have to use a model with the correct chat template (like this one) or a function-calling chat format that the model supports.

Additionally, you are doing a couple of things wrong:

- Instead of `tools=tools`, you are passing a single function, which won't work:

  ```json
  {
    "type": "function",
    "function": {
      "name": "get_current_weather"
    }
  }
  ```

- `assistant_response` is set wrong; it should be set to `result["choices"][0]["text"]`.
- `tool_choice="auto"` actually does nothing with regular chat templates (which might be what …
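For reference, here is a minimal sketch of a corrected setup. It assumes llama-cpp-python's generic `chatml-function-calling` chat format (a model whose own chat template supports tools would work similarly); the model path and the test question are placeholders:

```python
from llama_cpp import Llama

system_message = (
    "You are a helpful assistant with access to functions. "
    "Use the provided function to answer current weather questions."
)

# "chatml-function-calling" is a generic function-calling chat format shipped
# with llama-cpp-python; chat_format="llama-2" would ignore the tools entirely.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    chat_format="chatml-function-calling",
)

# tools must be a *list* of function specs, each with a JSON-schema "parameters".
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "What is the current weather in Berlin?"},
    ],
    tools=tools,         # the whole list, not a single function dict
    tool_choice="auto",  # honored only by function-calling chat formats
)

message = result["choices"][0]["message"]
if message.get("tool_calls"):
    # The model decided to call the function; arguments come back as a JSON string.
    call = message["tool_calls"][0]["function"]
    print(call["name"], call["arguments"])
else:
    assistant_response = message["content"]
```

Note that with `create_chat_completion` the assistant reply lives under `choices[0]["message"]`; the `choices[0]["text"]` field mentioned above is what plain `create_completion` returns.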