
[Bug]: Problems with releasing memory after starting the vllm container #11902

Open

JohnConnor123 opened this issue Jan 9, 2025 · 1 comment
Labels
bug Something isn't working

Comments

JohnConnor123 commented Jan 9, 2025

🐛 Describe the bug

Hi all. Is it possible to release host RAM after the weights have been loaded onto the GPU? As far as I understand, RAM is only needed while the weights are being loaded from the SSD.

Currently, when running the vllm/vllm-openai:latest Docker image, the vllm container occupies almost all of my RAM (20+ GB) while loading the model. After a successful launch, that memory is not released, and other Docker applications crash with OOM when launched, which is unacceptable in production.
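To verify whether the server process really holds the host RAM after startup (rather than the kernel's page cache, which `docker stats` can also count), one can read the process's resident set size from `/proc`. A minimal sketch, Linux-only; `rss_gib` is a hypothetical helper, not part of vLLM:

```python
def rss_gib(pid: str = "self") -> float:
    """Resident set size of a process in GiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / (1024 ** 2)  # VmRSS is in kB
    return 0.0

if __name__ == "__main__":
    # Run inside the container (or pass the vLLM server's PID) before and
    # after model loading to see whether host RAM is actually released.
    print(f"current RSS: {rss_gib():.3f} GiB")
```

Comparing this number before and after the model finishes loading distinguishes genuinely retained memory from transient load-time buffers.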

Code to reproduce:

docker run --gpus '"device=0,1"' --rm -d --net host \
    --name vllm \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -v /home/thinclient/llm-server/weights:/mnt/weights \
    --env "HUGGING_FACE_HUB_TOKEN=<my_hg_token>" \
    --env "TORCH_USE_CUDA_DSA=1" \
    --env "CUDA_LAUNCH_BLOCKING=1" \
    --ipc host \
    vllm/vllm-openai:latest \
    --model /mnt/weights/saiga_nemo_12b-Q6_K.gguf \
    --chat-template "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] | trim + '…" \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --gpu-memory-utilization 0.99 \
    --max-model-len 11000 \
    --enable-prefix-caching
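As a stopgap (my suggestion, not something from this thread), the container's host-RAM footprint can be capped with Docker's standard memory limits so that, if the vLLM process does over-allocate, only that container is OOM-killed instead of neighbouring services. The `24g` figure below is a placeholder; size it to cover model loading plus vLLM's CPU swap space (`--swap-space`):

```shell
# Placeholder limit (24g): must cover peak load-time RAM plus vLLM's
# pinned CPU swap space. Applies to the already-running "vllm" container.
docker update --memory 24g --memory-swap 24g vllm
```

The same limits can be passed to `docker run` directly via `--memory` and `--memory-swap`.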

JohnConnor123 added the bug label on Jan 9, 2025
JohnConnor123 (Author) commented:

@youkaichao @DarkLight1337
