Using the int4 QVQ 72B model: https://huggingface.co/kosbu/QVQ-72B-Preview-AWQ
Basic config: 4x 2080 Ti 22G, tp=4
python3 -m sglang.launch_server --model-path /root/model/QVQ-72B-Preview-AWQ --host 0.0.0.0 --port 30000 --tp 4 --mem-fraction-static 0.7
As you can see, the prefill stage takes 20 s.
What can I do to optimize the speed? Or is there an option to skip prefill when serving only a single request?
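For anyone hitting the same wall, one knob worth trying is chunked prefill, which SGLang exposes as a launch flag (check `python3 -m sglang.launch_server --help` on your version, as flag names change between releases). It splits a long prompt's prefill into smaller chunks, which mainly smooths memory pressure and latency spikes rather than shrinking total prefill compute, so treat this as a speculative tweak, not a confirmed fix:

```shell
# Same setup as above, plus a chunked-prefill cap (value 2048 is an
# illustrative guess; tune for your prompt lengths and VRAM headroom)
python3 -m sglang.launch_server \
  --model-path /root/model/QVQ-72B-Preview-AWQ \
  --host 0.0.0.0 --port 30000 \
  --tp 4 \
  --mem-fraction-static 0.7 \
  --chunked-prefill-size 2048
```

On compute-bound cards like the 2080 Ti, total prefill time is ultimately limited by FP16/INT4 throughput, so expect incremental gains at best.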
Well. This is quite strange. What GPUs and prompts are you using?
Okay, for the 2080 that's normal. 😂 We may not have a better plan for this. Sorry.
One more thing: it is not quick on dual A6000s (with NVLink) either; the prefill batch takes a lot of time.
@WuNein This could also make sense. The A6000 is indeed not a high-performance GPU. And I think this is related:
#2488
The bad thing is that we do not have these consumer devices to test on.