Default TGI Inference parameter values #2978

Open
2 of 4 tasks
ashwincv0112 opened this issue Jan 31, 2025 · 1 comment
System Info

Hi team,

We are trying to get the default parameter values that are used when invoking a fine-tuned model deployed with TGI (latest version).

In the logs we can see the information below.

{ best_of: None, 
temperature: None, 
repetition_penalty: None, 
frequency_penalty: None, 
top_k: None, 
top_p: None,
typical_p: None, 
do_sample: false, 
max_new_tokens: Some(672), 
return_full_text: None, 
stop: [], 
truncate: None, 
watermark: false, 
details: false, 
decoder_input_details: false, 
seed: None, 
top_n_tokens: None, 
grammar: None, 
adapter_id: None } 
total_time="10.779314795s" 
validation_time="536.816µs" 
queue_time="60.971µs"
inference_time="10.778717208s" 
time_per_token="16.039757ms" 
seed="None"}

The objective of this exercise is to get the same level of output accuracy from a fine-tuned model and from the base model + LoRA adapters (deployed using TGI's multi-LoRA functionality).

We get the expected output from the fine-tuned model, but when using multi-LoRA the output accuracy drops drastically.

We are using the following configuration when invoking:

While using the fine-tuned model:

'parameters': {
            'max_new_tokens': token_limit,
        },

While using the multi-LoRA functionality:

'parameters': {
            'max_new_tokens': token_limit,
            'adapter_id': 'adapter1',
        },
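For completeness, this is roughly how we send both requests to the server's /generate route (a minimal sketch using the requests library; TGI_URL, input_prompt, and token_limit are placeholders for our actual endpoint and values):

import requests

TGI_URL = "http://localhost:8080/generate"   # placeholder endpoint
input_prompt = "..."                         # placeholder prompt
token_limit = 672                            # placeholder token budget

# Fine-tuned model: only max_new_tokens is set, everything else falls back to defaults
finetuned_payload = {
    "inputs": input_prompt,
    "parameters": {"max_new_tokens": token_limit},
}

# Multi-LoRA: identical parameters plus the adapter routing field
multilora_payload = {
    "inputs": input_prompt,
    "parameters": {"max_new_tokens": token_limit, "adapter_id": "adapter1"},
}

for payload in (finetuned_payload, multilora_payload):
    resp = requests.post(TGI_URL, json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["generated_text"])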

We referred to the following link:

https://github.com/huggingface/text-generation-inference/blob/38773453ae0d29fba3dc79a38d589ebdc5451093/router/src/lib.rs

Could you clarify whether there is any difference between the default values used in the two approaches mentioned above? Also, could you suggest a way to improve the output accuracy while using multi-LoRA?

Thanks.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Multi-LoRA deployment:

docker run -it \
  --gpus all \
  --shm-size 1g \
  -v /home/ubuntu/data:/data \
  -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id=/data/starcoder2-3b \
  --lora-adapters=adapter1=/data/adapter1 \
  --dtype bfloat16

Fine-tuned model deployment:

docker run --gpus all -d -p 8080:80 \
	-v /home/ubuntu/data_backup:/data \
	ghcr.io/huggingface/text-generation-inference:latest \
	--model-id=/data/ 
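To rule out differences in server-side configuration between the two containers, a quick check (a sketch; the base URLs are placeholders, and the exact field names returned may vary slightly between TGI versions) is to compare what each deployment reports on its /info route:

import requests

# Placeholder base URLs; adjust to wherever each container is exposed
ENDPOINTS = {
    "finetuned": "http://localhost:8080",
    "multi_lora": "http://localhost:8081",
}

for name, base_url in ENDPOINTS.items():
    info = requests.get(f"{base_url}/info", timeout=30).json()
    # /info reports the effective server configuration (model id, dtype,
    # token limits, ...), which makes differences between the two
    # deployments easy to spot
    print(name, {k: info.get(k) for k in ("model_id", "model_dtype", "max_input_tokens", "max_total_tokens")})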

Expected behavior

With the same parameter values, we should get the same output (or at least the same output accuracy).


ashwincv0112 commented Feb 5, 2025

Hi Team,

We are trying to match the output of a TGI-deployed fine-tuned model with that of a model deployed using TGI's multi-LoRA functionality (a base model, Starcoder2-3B, plus two different fine-tuned adapters).

Even after keeping all the inference parameters the same, we get completely different outputs for the same prompts.

Please find the list of parameters used below:

{
    'inputs': input_prompt,
    'parameters': {
        'max_new_tokens': token_limit,
        'adapter_id': 'adapter1',
        'best_of': None,
        'decoder_input_details': False,
        'details': False,
        'do_sample': False,
        'frequency_penalty': None,
        'grammar': None,
        'repetition_penalty': None,
        'return_full_text': None,
        'seed': None,
        'temperature': None,
        'top_k': None,
        'top_n_tokens': None,
        'top_p': None,
        'truncate': None,
        'typical_p': None,
        'watermark': False
    }
}
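For a deterministic side-by-side check, this is roughly the comparison we run (a sketch; the endpoint URLs, input_prompt, and token_limit are placeholders, and with do_sample set to False both servers should decode greedily, so any divergence comes from the models rather than from sampling):

import difflib
import requests

FINETUNED_URL = "http://localhost:8080/generate"   # placeholder
MULTILORA_URL = "http://localhost:8081/generate"   # placeholder
input_prompt = "..."                               # placeholder prompt
token_limit = 672                                  # placeholder token budget

def generate(url, extra_params=None):
    parameters = {"max_new_tokens": token_limit, "do_sample": False}
    parameters.update(extra_params or {})
    resp = requests.post(url, json={"inputs": input_prompt, "parameters": parameters}, timeout=120)
    resp.raise_for_status()
    return resp.json()["generated_text"]

finetuned_out = generate(FINETUNED_URL)
multilora_out = generate(MULTILORA_URL, {"adapter_id": "adapter1"})

# Greedy decoding is deterministic, so any diff below reflects a real
# difference between the fine-tuned weights and base model + adapter
print("\n".join(difflib.unified_diff(
    finetuned_out.splitlines(), multilora_out.splitlines(), lineterm="")))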

Could you provide some input on this?

Thanks.
