This recipe deploys NVIDIA NIM infrastructure on Kubernetes with GPUs. Specifically, we will:
- Deploy the NVIDIA GPU Operator onto Kubernetes so that pods can request GPUs.
- Select and deploy an LLM available on NVIDIA NIM.
- Connect Spice to the OpenAI-compatible NIM LLM.
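Prerequisites:
- A Kubernetes cluster with at least one GPU node.
- The GPU must have a compute capability of 8.0 or higher (NVIDIA Ampere architecture or newer).
- Local tools: kubectl, helm, docker, the NGC CLI (ngc), and the Spice CLI (spice).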
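To check the compute-capability prerequisite, recent nvidia-smi versions can report it with nvidia-smi --query-gpu=compute_cap --format=csv,noheader (older drivers may not support the compute_cap field). The sketch below is a hypothetical helper, not part of any NVIDIA tool: it reads those values, one per line, and compares each against the 8.0 minimum using awk.

```shell
# meets_min_compute_cap: hypothetical helper (not an NVIDIA tool).
# Succeeds (exit status 0) if the compute capability given as $1,
# e.g. "8.6", is at least the recipe's minimum of 8.0.
meets_min_compute_cap() {
  awk -v cc="$1" 'BEGIN { exit !(cc + 0 >= 8.0) }'
}

# check_gpus: hypothetical helper. Reads one compute capability per line
# on stdin (the format printed by
# nvidia-smi --query-gpu=compute_cap --format=csv,noheader),
# reports each GPU, and returns non-zero if any GPU is below 8.0.
check_gpus() {
  status=0
  while read -r cc; do
    if meets_min_compute_cap "$cc"; then
      echo "compute capability $cc: OK"
    else
      echo "compute capability $cc: below 8.0" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, on a GPU node: nvidia-smi --query-gpu=compute_cap --format=csv,noheader | check_gpus. A non-zero exit status means at least one GPU is below the 8.0 minimum.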
Install the NVIDIA GPU Operator:
- Add the NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
- Install the GPU Operator:
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
- For additional helm overrides, see the chart's additional values.
- Because of --wait, the command returns only after the operator's resources are ready; from then on, Kubernetes pods can request GPUs through resource requests/limits (nvidia.com/gpu).
- For additional details and troubleshooting, see the official documentation.
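To confirm that pods can now request GPUs, one option is a throwaway pod that asks for one nvidia.com/gpu. The sketch below only writes the manifest; the pod name gpu-test is hypothetical, and the image is assumed to be NVIDIA's CUDA vector-add sample as used in the GPU Operator's verification steps, so substitute any CUDA workload if that tag is unavailable.

```shell
# Write a minimal test pod manifest that requests one GPU through the
# nvidia.com/gpu resource advertised by the GPU Operator's device plugin.
# Assumptions: the pod name (gpu-test) is hypothetical, and the image tag
# is NVIDIA's CUDA vector-add sample, which may change over time.
cat > gpu-test.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```

Apply it with kubectl apply -f gpu-test.yaml; once the pod completes, kubectl logs gpu-test should show the sample's result, and kubectl delete pod gpu-test cleans up. A pod stuck in Pending usually means no node is advertising nvidia.com/gpu yet.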
Deploy an LLM with NVIDIA NIM:
- Get an NGC API key from NVIDIA's NGC website and export it:
export NGC_API_KEY="<your-ngc-api-key>"
- Log in to NVIDIA's Docker registry (nvcr.io); the username is the literal string $oauthtoken:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
- Download the NIM LLM Helm chart (nim-llm-1.1.2.tgz) from NVIDIA's Helm registry:
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.1.2.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
- Create an image pull secret, ngc-secret, so the cluster can pull NIM images from nvcr.io:
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"
- Similarly, create a generic secret, ngc-api, that holds the API key used to download model weights:
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY="$NGC_API_KEY"
- Install the Helm chart using your values.yaml:
helm install my-nim nim-llm-1.1.2.tgz -f values.yaml
To list the available models, run the following with the NGC CLI:
ngc registry image list "nvcr.io/nim/*"
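The helm install step reads a values.yaml that the recipe does not show. A minimal sketch for deploying meta/llama3-8b-instruct with the two secrets created above follows; the keys (image, model.ngcAPISecret, imagePullSecrets, resources, persistence) are assumptions based on NVIDIA's nim-llm examples, so verify them with helm show values nim-llm-1.1.2.tgz and create the file before running helm install.

```shell
# Write a values.yaml for the nim-llm chart (a sketch; key names are
# assumptions to check against: helm show values nim-llm-1.1.2.tgz).
# The image repository selects which model NIM serves.
cat > values.yaml <<'EOF'
image:
  repository: nvcr.io/nim/meta/llama3-8b-instruct
  tag: latest                 # assumption: pin a specific version tag instead
model:
  ngcAPISecret: ngc-api       # generic secret holding NGC_API_KEY
imagePullSecrets:
  - name: ngc-secret          # docker-registry secret for nvcr.io
resources:
  limits:
    nvidia.com/gpu: 1         # one GPU from the GPU Operator
persistence:
  enabled: true               # caches weights; assumes a default StorageClass
EOF
```

Under these assumptions, a different model would be selected by changing image.repository to another image from ngc registry image list; pinning tag to a specific version keeps deployments repeatable.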
Deploy Spice and connect it to NIM:
- Add the Spice Helm repository:
helm repo add spiceai https://helm.spiceai.org
helm repo update
- Deploy Spice:
helm install spiceai spiceai/spiceai -f spiceai.yaml
- Connect to Spice by forwarding its HTTP port (8090) to your machine:
kubectl port-forward deployment/spiceai 8090
- In a separate terminal (port-forward keeps running), chat with meta/llama3-8b-instruct via NIM:
spice chat
Using model: nim
chat> Tell me a joke about the moon.
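The helm install spiceai step reads a spiceai.yaml that the recipe does not show. The sketch below assumes the spiceai chart takes the Spicepod under a spicepod key, that the NIM chart exposes its OpenAI-compatible API as a service named my-nim-nim-llm on port 8000, and that Spice's openai model provider accepts an endpoint parameter; the model name nim matches the "Using model: nim" line above.

```shell
# Write a spiceai.yaml for the spiceai/spiceai chart. Assumptions:
# - the chart renders the "spicepod" value into the Spicepod it runs;
# - the NIM service is my-nim-nim-llm on port 8000 in the same namespace
#   (check with: kubectl get svc);
# - Spice's openai model provider accepts an "endpoint" parameter for
#   any OpenAI-compatible API.
cat > spiceai.yaml <<'EOF'
spicepod:
  name: app
  version: v1beta1
  kind: Spicepod
  models:
    - name: nim                   # shown as "Using model: nim" in spice chat
      from: openai:meta/llama3-8b-instruct
      params:
        endpoint: http://my-nim-nim-llm:8000/v1
EOF
```

If kubectl get svc shows a different service name or port for the my-nim release, update endpoint to match before deploying Spice.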