
Hosting Tulu 3 Llama 405B on VLLM #565

Open
WenJett opened this issue Feb 14, 2025 · 2 comments


WenJett commented Feb 14, 2025

Hi,

I have tried to host Tulu 3 Llama 405B through vLLM with the following command:

vllm serve /path/to/model --tensor-parallel-size 8 --pipeline-parallel-size 2

I have set up a Ray cluster through vLLM, connecting 16 GPUs in total.
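
For context, the cluster bring-up roughly followed the standard multi-node Ray recipe (HEAD_NODE_IP is a placeholder here):

# On the head node:
ray start --head --port=6379

# On each worker node, join the cluster:
ray start --address=HEAD_NODE_IP:6379

The vllm serve command above is then run from the head node once all 16 GPUs are visible in the cluster.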

Although the model is served successfully, inference only returns "!!!!!!!!!!!!" regardless of my input.

I have tried the same command with a smaller model, such as Llama 70B Instruct, and it responds normally.

Is there anything to take note of when hosting Tulu 3 Llama 405B? Thanks for any suggestions.

@hamishivi
Collaborator

Hi, this sounds unusual... that is the same command we use to serve Tulu 405B ourselves. It should 'just work'. Maybe try making sure you are using the latest version of vLLM? Otherwise I am not sure...
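
For example, something along these lines should confirm and update the installed version:

# Check the currently installed vLLM version:
python -c "import vllm; print(vllm.__version__)"

# Upgrade to the latest release:
pip install --upgrade vllm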


WenJett commented Feb 19, 2025

Hi,

Just wondering, what version of vLLM was used?

I tried 0.7.2, and saw that the solution in some past issues was to downgrade, so I tried a few older versions too. It was the same result, with "!!!!" outputs only.
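
For reference, a minimal sanity-check request against vLLM's OpenAI-compatible endpoint looks roughly like this (assuming the default port 8000 and the served model path as the model name), and it still returns only "!" tokens:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/path/to/model", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'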

Thanks!
