Hi, this sounds unusual... that is the same command we use to serve Tulu 405B ourselves. It should 'just work'. Maybe try making sure you are using the latest version of vLLM? Otherwise I am not sure...
I tried with 0.7.2, and saw that the solution in some past issues was to downgrade, so I tried a few older versions too; the result was the same, with only "!!!!" outputs.
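In case it helps with reproducing this, a quick way to confirm which vLLM build is actually in use (worth running on every node, since multi-node serving generally assumes an identical environment across the Ray cluster) is something like:

```bash
# Print the vLLM version in the active environment (run this on each node)
python -c "import vllm; print(vllm.__version__)"

# Upgrade to the latest release if the version is stale or differs between nodes
pip install --upgrade vllm
```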
Hi,
I have tried to host Tulu 3 Llama 405B through vLLM with the command:
"vllm serve /path/to/model --tensor-parallel-size 8 --pipeline-parallel-size 2"
I have set up the Ray cluster through vLLM and connected 16 GPUs together (rough setup commands below).
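Roughly, the cluster was brought up along these lines (a sketch with placeholder addresses, following the standard Ray workflow):

```bash
# On the head node (the port is arbitrary; it just needs to match below)
ray start --head --port=6379

# On the second node, joining the head node's cluster
ray start --address=<head_node_ip>:6379

# Confirm all 16 GPUs are visible to the cluster
ray status

# Then launch the server from the head node
vllm serve /path/to/model --tensor-parallel-size 8 --pipeline-parallel-size 2
```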
Although the server starts and the model is hosted, inference only replies with "!!!!!!!!!!!!" regardless of my input.
I have tried the same command with a smaller model, such as Llama 70B Instruct, and it was able to respond normally.
Is there anything to take note of when hosting Tulu 3 Llama 405B? Thanks for any suggestions.
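For reference, this is the kind of request that only comes back with the "!!!!" text (a minimal example, assuming the default OpenAI-compatible server on port 8000 and that the model is registered under its local path):

```bash
# Send a simple chat request to the vLLM OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/path/to/model",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```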