[Bug]: Problems with releasing memory after starting the vllm container #11902

JohnConnor123 · 2025-01-09T15:23:13Z

🐛 Describe the bug

Hi all. Is it possible to free the RAM that vLLM uses once the model weights have been loaded onto the GPU? As far as I understand, host RAM is only needed while the weights are being read from the SSD.

Currently, when I run the vllm/vllm-openai:latest Docker image, the vllm container occupies almost all of my RAM (20+ GB) while loading the model. After a successful launch this memory is not released, so other Docker applications crash with OOM when they start. That is not acceptable in production.
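To check how much memory the vLLM processes themselves still hold after startup, one option is to read `VmRSS` from `/proc/<pid>/status` for every process in the container. This is a minimal sketch, assuming a Linux host; the helper names `rss_kb` and `total_rss_kb` are hypothetical and not part of vLLM or Docker.

```shell
# Hypothetical helper: print the VmRSS value (in kB, as reported by the
# kernel) from a /proc/<pid>/status-formatted file.
rss_kb() {
  awk '/^VmRSS:/ { print $2; exit }' "$1"
}

# Hypothetical helper: sum VmRSS over a list of PIDs (for example the API
# server plus its tensor-parallel workers). PIDs that no longer exist, or
# that have no VmRSS line (kernel threads), count as 0.
total_rss_kb() {
  local total=0 pid kb
  for pid in "$@"; do
    kb=$(rss_kb "/proc/$pid/status" 2>/dev/null || echo 0)
    total=$((total + ${kb:-0}))
  done
  echo "$total"
}
```

Usage on the host, assuming the container is named `vllm` as in the command below: `total_rss_kb $(docker top vllm -o pid | tail -n +2)`. Pages shared between processes are counted once per process, so the sum is an upper bound rather than an exact figure.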

Code to reproduce:

docker run --gpus '"device=0,1"' --rm -d --net host \
    --name vllm \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -v /home/thinclient/llm-server/weights:/mnt/weights \
    --env "HUGGING_FACE_HUB_TOKEN=<my_hg_token>" \
    --env "TORCH_USE_CUDA_DSA=1" \
    --env "CUDA_LAUNCH_BLOCKING=1" \
    --ipc host \
    vllm/vllm-openai:latest \
    --model /mnt/weights/saiga_nemo_12b-Q6_K.gguf \
    --chat-template "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] | trim + '…" \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --gpu-memory-utilization 0.99 \
    --max_model_len 11000 \
    --enable-prefix-caching
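This does not make vLLM release memory, but one way to keep the vllm container from starving other containers is Docker's standard `--memory` limit. Setting `--memory-swap` to the same value prevents the container from using swap on top of the cap. If vLLM exceeds the limit, the kernel OOM-kills processes inside this container instead of in others. The 24g value below is an illustrative assumption, not a tested figure for this model.

```shell
# Illustrative limit (assumption): must still leave enough room for
# vLLM's own peak RAM usage during model loading.
MEM_LIMIT="24g"

# --memory caps the container's RAM; --memory-swap equal to --memory
# means the container gets no additional swap.
vllm_mem_flags=(--memory="$MEM_LIMIT" --memory-swap="$MEM_LIMIT")

# Print the flags so they can be checked before launching.
printf '%s\n' "${vllm_mem_flags[@]}"

# Then prepend them to the original command, e.g.:
# docker run "${vllm_mem_flags[@]}" --gpus '"device=0,1"' --rm -d --net host ... vllm/vllm-openai:latest ...
```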


JohnConnor123 · 2025-01-22T17:30:47Z

@youkaichao @DarkLight1337

d00mus · 2025-02-27T13:55:05Z

Same problem

JohnConnor123 added the bug label on Jan 9, 2025
