InternVL2.5-78b model's performance significantly degrades after AWQ quantization #876

vladimiralbrekhtccr · 2025-01-22T09:35:02Z

Hello.

I have fine-tuned the InternVL2.5-78B model and quantized it using the AWQ method with lmdeploy:

export CUDA_VISIBLE_DEVICES=1

export HF_MODEL="model"
export WORK_DIR="output_dir"

lmdeploy lite auto_awq \
  "$HF_MODEL" \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --batch-size 1 \
  --work-dir "$WORK_DIR"
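
For context on the flags above: `--w-bits 4` with `--w-group-size 128` means each run of 128 weights is stored as 4-bit integer codes that share one scale and offset. The sketch below is only an illustration of that storage scheme using plain round-to-nearest quantization. It is not lmdeploy's implementation and it leaves out AWQ's activation-aware scaling entirely; the function names and the random test weights are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Illustrative only: hypothetical helper, not lmdeploy's or AWQ's code.
    """
    qmax = (1 << bits) - 1                 # 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax
    if scale == 0.0:                       # constant group: any scale works
        scale = 1.0
    # Map each weight to an integer code clamped to [0, qmax].
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate weights from integer codes."""
    return [c * scale + lo for c in codes]


def quantize_row(row, bits=4, group_size=128):
    """Quantize then dequantize a row, one group of `group_size` at a time."""
    restored = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical stand-in for one row of a weight matrix.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(1024)]
restored = quantize_row(row, bits=4, group_size=128)
print(f"max abs reconstruction error: {max_abs_error(row, restored):.6f}")
```

In this scheme the per-weight error is bounded by half the group's scale, so a group containing a few large outliers loses precision on all of its other weights. AWQ tries to reduce that effect using activation statistics from the calibration data, which is why the calibration settings can influence the result.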

The fine-tuned model's performance on the VLM benchmarks is on par with the original model. However, the quantized model's performance has dropped significantly.

Am I using the correct quantization method?

Thanks.
