Hi, this sounds unusual... that is the same command we use to serve Tulu 405B ourselves. It should 'just work'. Maybe try making sure you are using the latest version of vLLM? Otherwise I am not sure...
I tried with 0.7.2, and saw that the solution in some past issues was to downgrade, so I tried a few older versions too; the result was the same, with only "!!!!" outputs.
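In case it helps with reproducing this, a quick way to confirm which vLLM build is actually in use (worth running on every node, since multi-node serving generally assumes an identical environment across the Ray cluster) is something like:

```bash
# Print the vLLM version in the active environment (run this on each node)
python -c "import vllm; print(vllm.__version__)"

# Upgrade to the latest release if the version is stale or differs between nodes
pip install --upgrade vllm
```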
Hi,
I have tried to host Tulu 3 Llama 405B through vLLM with the command:
"vllm serve /path/to/model --tensor-parallel-size 8 --pipeline-parallel-size 2"
I have set up the Ray cluster through vLLM and connected 16 GPUs together (rough setup commands below).
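Roughly, the cluster was brought up along these lines (a sketch with placeholder addresses, following the standard Ray workflow):

```bash
# On the head node (the port is arbitrary; it just needs to match below)
ray start --head --port=6379

# On the second node, joining the head node's cluster
ray start --address=<head_node_ip>:6379

# Confirm all 16 GPUs are visible to the cluster
ray status

# Then launch the server from the head node
vllm serve /path/to/model --tensor-parallel-size 8 --pipeline-parallel-size 2
```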
Although the server starts and the model is hosted, inference only replies with "!!!!!!!!!!!!" regardless of my input.
I have tried the same command with a smaller model, such as Llama 70B Instruct, and it was able to respond normally.
Is there anything to take note of when hosting Tulu 3 Llama 405B? Thanks for any suggestions.
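For reference, this is the kind of request that only comes back with the "!!!!" text (a minimal example, assuming the default OpenAI-compatible server on port 8000 and that the model is registered under its local path):

```bash
# Send a simple chat request to the vLLM OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/path/to/model",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```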