
Modelopt-v0.23.2 not support Qwen2.5 series LLM model? #142

Closed
white-wolf-tech opened this issue Feb 27, 2025 · 7 comments

@white-wolf-tech

When I quantize the Qwen2.5-3B model with the int8_sq algorithm using the checkpoint_convert.py script that comes with the TensorRT-LLM library (that is, their own int8_sq implementation, without the ModelOpt library), the compiled engine can be used normally by the tritonserver tensorrtllm-backend.

However, when I use the ModelOpt library with the same algorithm, the compiled engine cannot be used by the tritonserver tensorrtllm-backend. Is this because the current version does not support this model, or could there be some other problem?
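For reference, the ModelOpt side of my flow looked roughly like this (a minimal sketch, not my exact script; the config name, `decoder_type="qwen"`, and the export call follow the ModelOpt docs as I understand them and may differ across versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

model_dir = "Qwen/Qwen2.5-3B"  # local checkpoint path in my case
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir)


def calibrate(m):
    # A few prompts so ModelOpt can collect activation ranges for SmoothQuant;
    # my real run used a larger calibration set.
    for prompt in ["Hello, how are you?", "Briefly explain int8 quantization."]:
        inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
        with torch.no_grad():
            m(**inputs)


model = mtq.quantize(model, mtq.INT8_SMOOTHQUANT_CFG, forward_loop=calibrate)

# Export a TensorRT-LLM checkpoint, then build the engine with trtllm-build
# and serve it through the tritonserver tensorrtllm-backend.
export_tensorrt_llm_checkpoint(
    model, decoder_type="qwen", dtype=torch.float16, export_dir="qwen2.5-3b-int8sq"
)
```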

@kevalmorabia97
Collaborator

What error do you see when using ModelOpt's quantized checkpoint with tritonserver?
Note that TensorRT-LLM also uses the ModelOpt library under the hood for quantization.

@white-wolf-tech
Author

The detailed situation is here:
NVIDIA/TensorRT-LLM#2810

With the same algorithm, the output is normal when using the conversion script that comes with TensorRT-LLM. However, after compiling with ModelOpt, every output token is 1023, and the decoded text is:

"xx.Componentlocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklock"

@cjluo-nv
Collaborator

cjluo-nv commented Feb 27, 2025

Also, have you tried the llm_ptq examples in this repo?

@white-wolf-tech
Author

> Also, have you tried the llm_ptq examples in this repo?

The result is the same.

@white-wolf-tech
Author

Are the FP8-related quantization features only available on the Hopper architecture, e.g. on the H100 or H200? Are they unsupported on Ada-architecture GPUs such as the L20 and L40S?
@cjluo-nv @kevalmorabia97

@kevalmorabia97
Collaborator

Yes, the Ada architecture supports FP8.
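If it helps, here is a quick way to check what your GPU reports (just a sketch; FP8 E4M3 tensor cores are available from SM 8.9, which covers Ada GPUs such as the L20/L40S, and SM 9.0 for Hopper H100/H200):

```python
import torch

# Ada = SM 8.9, Hopper = SM 9.0; both generations have FP8 tensor cores.
major, minor = torch.cuda.get_device_capability(0)
sm = major * 10 + minor
print(f"{torch.cuda.get_device_name(0)}: SM{sm}")
print("FP8 expected to be supported" if sm >= 89 else "FP8 not expected on this GPU")
```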

@white-wolf-tech
Author

There may have been problems with my earlier installation of the dependent libraries. After reinstalling tensorrt-llm and modelopt, the ModelOpt quantization results appear to be correct; the main issue lies with the FP8 operators. To narrow it down, I ran the following experiments:

  1. With w4a8_awq quantization, FP8 KV cache quantization off, and use_fp8_context_fmha off, the model output is normal.
  2. With FP8 KV cache quantization on and use_fp8_context_fmha off, the compiled model's output is garbled.
  3. With w4a8_awq, use_fp8_context_fmha on, and FP8 KV cache quantization off, the engine build fails with: "[TensorRT-LLM][ERROR] TllmXqaJit runtime error in tllmXqaJitCreateAndCompileProgram(&program, &context): NVRTC Internal Error".
  4. With w4a8_awq quantization, FP8 KV cache quantization on, and use_fp8_context_fmha on, quantization succeeds but the output is garbled.

Judging from these experiments, there seems to be a problem in the tensorrt-llm library's handling of FP8-related operators. Perhaps these operations are not yet supported on the L20? I may repeat the experiments on H-series GPUs later.
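For reference, the ModelOpt-side configuration I was toggling in these experiments looked roughly like the sketch below. `W4A8_AWQ_BETA_CFG` is the config name from the ModelOpt docs as I understand them; the KV-cache wildcard keys and the `build_w4a8_cfg` helper are only my illustration (loosely mirroring what the llm_ptq example does) and may not match the exact patterns in the current release. The use_fp8_context_fmha toggle is a trtllm-build flag, so it is applied at engine-build time, not here.

```python
# Sketch: w4a8_awq weight quantization with FP8 KV cache optionally enabled
# by turning on output quantizers for the attention k/v projections.
import copy

import modelopt.torch.quantization as mtq


def build_w4a8_cfg(fp8_kv_cache: bool):
    cfg = copy.deepcopy(mtq.W4A8_AWQ_BETA_CFG)
    if fp8_kv_cache:
        # num_bits=(4, 3) selects the FP8 E4M3 format for the cached K/V tensors.
        kv_quant = {"num_bits": (4, 3), "axis": None, "enable": True}
        cfg["quant_cfg"]["*k_proj.output_quantizer"] = dict(kv_quant)  # assumed key pattern
        cfg["quant_cfg"]["*v_proj.output_quantizer"] = dict(kv_quant)  # assumed key pattern
    return cfg


# Experiment 1:      model = mtq.quantize(model, build_w4a8_cfg(False), forward_loop=calibrate)
# Experiments 2 / 4: model = mtq.quantize(model, build_w4a8_cfg(True), forward_loop=calibrate)
```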
