Commit

Updated docs
oandreeva-nv committed May 29, 2024
1 parent f77614e commit bf534b4
Showing 1 changed file with 0 additions and 3 deletions: docs/llama_multi_lora_tutorial.md
@@ -61,9 +61,6 @@ sudo docker run --gpus all -it --net=host -p 8001:8001 --shm-size=12G \
Triton's vLLM container has been available since the 23.10 release, and experimental `multi-lora` support was added in the vLLM v0.3.0 release.

> Docker image version `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3` or a later version is strongly recommended.
- > [!IMPORTANT]
- > 24.05 release is still under active development, and relevant NGC containers are not available at this time.
- ---

For **pre-24.05 containers**, the docker images did not support the multi-lora feature, so you need to replace the `/opt/tritonserver/backends/vllm/model.py` shipped in the container with the most up-to-date version. Just run the following command:
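The replacement command itself is truncated in this diff view. Purely as an illustration, here is a minimal sketch of what such a command might look like, assuming the up-to-date `model.py` is fetched from the `main` branch of the `triton-inference-server/vllm_backend` repository on GitHub; the URL, branch, and path are assumptions, not taken from this commit:

```bash
# Hypothetical sketch, not the command from the tutorial itself:
# overwrite the model.py shipped in the container with the latest copy
# from the vllm_backend repository. Branch and path are assumptions.
wget -O /opt/tritonserver/backends/vllm/model.py \
  https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
```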
