Nvidia NIM on Kubernetes with Spice

This recipe deploys Nvidia NIM infrastructure, on Kubernetes, with GPUs. Specifically, we will:

Prerequisites

A Kubernetes cluster, with at least 1 GPU node.
- Ensure that the GPU has a compute capability of 8.0 or higher.
Local tools
- helm: install
- kubectl: install
- spice: install

Add the Nvidia Helm repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

For additional details & troubleshooting, see the official documentation.

Login to Nvidia's Docker registry

echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

Login to Nvidia's Helm registry

helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.1.2.tgz --username=\$oauthtoken --password=$NGC_API_KEY

Create a secret to use for pulling images from docker registries.

kubectl create secret \
docker-registry ngc-secret \
--docker-server=nvcr.io \
--docker-username='$oauthtoken' \
--docker-password=$NGC_API_KEY

Similar to above, create a secret to pull model weights.

kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_API_KEY

Install the Helm chart.

helm install my-nim nim-llm-1.1.2.tgz -f values.yaml

For available models, use NGC CLI and run

ngc registry image list "nvcr.io/nim/*"

Add the helm repository

helm repo add spiceai https://helm.spiceai.org
helm repo update

Deploy Spice

helm install spiceai spiceai/spiceai -f spiceai.yaml

Connect to Spice

kubectl port-forward deployment/spiceai 8090

Chat with meta/llama3-8b-instruct via NIM.

spice chat

Using model: nim
chat> Tell me a joke about the moon.