QVQ Prefill stage slow #2961

Open
WuNein opened this issue Jan 18, 2025 · 4 comments

Comments

@WuNein

WuNein commented Jan 18, 2025

Using the int4 QVQ-72B model: https://huggingface.co/kosbu/QVQ-72B-Preview-AWQ
Basic config: 4× RTX 2080 Ti (22 GB each), tp=4

python3 -m sglang.launch_server --model-path /root/model/QVQ-72B-Preview-AWQ --host 0.0.0.0 --port 30000 --tp 4 --mem-fraction-static 0.7


As you can see, the prefill stage takes 20 s.
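One way to put a number on the prefill cost from the client side is to time a streamed request: the first token can only be emitted after prefill finishes, so the time to the first non-empty chunk approximates prefill latency plus queueing and network overhead. This is a minimal sketch; `measure_ttft` is a hypothetical helper, and the commented usage assumes the server's OpenAI-compatible `/v1/chat/completions` endpoint with streaming enabled (the payload shown is illustrative, not a verified request).

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(open_stream: Callable[[], Iterable[bytes]]) -> Tuple[Optional[float], float]:
    """Time a streamed response (hypothetical helper).

    Returns (seconds until the first non-empty chunk, total seconds).
    The timer starts before the request is opened, so connection setup
    and prefill are both included in the first value.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    for chunk in open_stream():
        if chunk and first is None:  # skip empty keep-alive lines
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return first, total


# Assumed usage against the server launched above (illustrative payload):
# import requests
# payload = {"model": "default", "messages": [...], "stream": True}
# first, total = measure_ttft(lambda: requests.post(
#     "http://localhost:30000/v1/chat/completions", json=payload, stream=True
# ).iter_lines())
```

Comparing the first value with and without the image in the prompt gives a rough idea of how much of the delay the image is responsible for.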

What can I do to speed it up?
Or is there an option to turn off prefilling when running only a single request?

@zhaochenyang20
Collaborator

Well, this is quite strange. What GPUs and prompts are you using?
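The prompt matters because, in a multimodal model, an image contributes many tokens to prefill. QVQ-72B-Preview is built on Qwen2-VL-72B; assuming it keeps Qwen2-VL's vision preprocessing (14×14-pixel patches merged 2×2, i.e. roughly one token per 28×28 pixels), the visual token count grows with image area. The helper below is a rough approximation with a hypothetical name: it rounds each side to a multiple of 28 px and ignores the min/max pixel clamping the real image processor applies.

```python
def approx_visual_tokens(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Approximate visual-token count for a Qwen2-VL-style encoder (assumption).

    Each token is assumed to cover (patch * merge) x (patch * merge) pixels.
    Sides are rounded to that multiple (at least one unit each); the real
    processor's min/max total-pixel limits are NOT modelled here.
    """
    factor = patch * merge  # 28 px per token side under the stated assumption
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    return (h // factor) * (w // factor)


for size in [(448, 448), (1024, 1024), (1080, 1920)]:
    print(size, approx_visual_tokens(*size))
```

Under these assumptions a 1080×1920 screenshot already adds a few thousand tokens, so downscaling large images before sending them is one lever on prefill time.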

@zhaochenyang20
Collaborator

Okay. For a 2080, that's normal. 😂 We may not have a better plan for this, sorry.
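A back-of-envelope way to see why prefill is slow on older GPUs is the common rule of thumb that a dense transformer forward pass costs about 2 FLOPs per parameter per token. The sketch below turns that into a time estimate under loud assumptions: it ignores the attention term that grows with sequence length, the vision encoder, and tensor-parallel communication, and it assumes perfect scaling across GPUs. `peak_tflops_per_gpu` and `utilization` are inputs you supply (e.g. from a datasheet and a guessed efficiency); no values for any specific GPU are claimed here. AWQ int4 shrinks weight memory, but the matmuls are typically still computed in FP16, so the FLOP count is assumed unchanged.

```python
def estimate_prefill_seconds(
    n_params: float,
    n_tokens: int,
    peak_tflops_per_gpu: float,
    num_gpus: int = 1,
    utilization: float = 0.5,
) -> float:
    """Rough prefill time from the ~2 * params * tokens FLOP rule (sketch only).

    Assumes linear layers dominate and work splits perfectly across GPUs.
    `utilization` is the assumed fraction of peak throughput actually achieved.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    flops = 2.0 * n_params * n_tokens
    effective_flops_per_s = peak_tflops_per_gpu * 1e12 * num_gpus * utilization
    return flops / effective_flops_per_s


# Hypothetical inputs: 72B parameters, 1000 prompt tokens, 100 TFLOPS peak per GPU.
print(estimate_prefill_seconds(72e9, 1000, 100.0, num_gpus=4, utilization=0.5))
```

The estimate scales linearly with prompt length, so a large image prompt multiplies it directly.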

@WuNein
Author

WuNein commented Jan 22, 2025


One more thing: it is not fast on dual A6000s (with NVLink) either. The prefill batch takes a long time.


@zhaochenyang20
Collaborator

@WuNein That could also make sense; the A6000 is indeed not a particularly high-performance GPU. I think this issue is related:

#2488

Unfortunately, we do not have access to these general-purpose devices.
