Nvidia NIM on Kubernetes with Spice

This recipe deploys Nvidia NIM infrastructure on a Kubernetes cluster with GPUs. Specifically, we will:

  1. Deploy the NVIDIA GPU Operator onto Kubernetes so that pods can request GPUs.
  2. Select and deploy an LLM available on Nvidia NIM.
  3. Connect Spice to the OpenAI-compatible NIM LLM.

Prerequisites

  1. A Kubernetes cluster with at least one GPU node.
  2. Local tools: kubectl, helm, docker, the NGC CLI (ngc), and the Spice CLI (spice).
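A quick way to confirm the local tools used throughout this recipe are installed (a sketch; these are the usual version commands for each CLI):

    kubectl version --client
    helm version
    docker --version
    ngc --version
    spice version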

Deploying the GPU Operator

  1. Add the Nvidia Helm repository

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
        && helm repo update
  2. Install the GPU Operator

    helm install --wait --generate-name \
        -n gpu-operator --create-namespace \
        nvidia/gpu-operator
    • For additional Helm overrides, see the chart's additional values.
    • Once the command completes (it blocks until ready because of --wait), Kubernetes pods can request GPUs via standard resource requests/limits; a verification sketch follows this section.

For additional details & troubleshooting, see the official documentation.
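To confirm GPU scheduling works before moving on, you can run a one-off pod that requests a GPU. This is a minimal sketch; the pod name and CUDA image tag are illustrative and not part of the original recipe:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-smoke-test
    spec:
      restartPolicy: Never
      containers:
        - name: cuda
          image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # any CUDA base image works
          command: ["nvidia-smi"]
          resources:
            limits:
              nvidia.com/gpu: 1   # resource exposed by the GPU Operator's device plugin

Apply it with kubectl apply -f gpu-smoke-test.yaml; kubectl logs gpu-smoke-test should print an nvidia-smi table listing the node's GPU.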

Configuring NIMs

  1. Get an NGC API key from Nvidia's NGC website and export it:

    export NGC_API_KEY=""
  2. Log in to Nvidia's Docker registry

    echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
  3. Fetch the NIM LLM Helm chart from Nvidia's Helm registry

    helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.1.2.tgz --username=\$oauthtoken --password=$NGC_API_KEY
  4. Create a secret for pulling images from the Nvidia container registry (nvcr.io).

    kubectl create secret docker-registry ngc-secret \
        --docker-server=nvcr.io \
        --docker-username='$oauthtoken' \
        --docker-password=$NGC_API_KEY
  5. Similarly, create a secret used to pull model weights.

    kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_API_KEY
  6. Install the Helm chart with a values.yaml (a sketch follows this list).

    helm install my-nim nim-llm-1.1.2.tgz -f values.yaml

    For available models, use the NGC CLI and run

    ngc registry image list "nvcr.io/nim/*"
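The values.yaml passed to helm install is not shown in this recipe. The sketch below wires the chart to the secrets created in steps 4 and 5; the image repository and tag are assumptions based on the meta/llama3-8b-instruct model used later, so substitute whichever model you selected from the NGC listing:

    image:
      repository: nvcr.io/nim/meta/llama3-8b-instruct  # assumed model image; pick one from the listing above
      tag: "1.0.0"                                     # use a tag that exists for that image
    imagePullSecrets:
      - name: ngc-secret      # docker-registry secret from step 4
    model:
      ngcAPISecret: ngc-api   # generic secret from step 5, used to download model weights
    persistence:
      enabled: true           # cache downloaded weights across pod restarts
    resources:
      limits:
        nvidia.com/gpu: 1

Once the release is up, you can sanity-check the OpenAI-compatible endpoint before connecting Spice. The service name below assumes the my-nim release name and the chart's default naming; confirm it with kubectl get svc:

    kubectl port-forward svc/my-nim-nim-llm 8000 &
    curl http://localhost:8000/v1/chat/completions \
        -H 'Content-Type: application/json' \
        -d '{"model": "meta/llama3-8b-instruct", "messages": [{"role": "user", "content": "Say hello."}]}'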

Connect Spice

  1. Add the Spice Helm repository

    helm repo add spiceai https://helm.spiceai.org
    helm repo update
  2. Deploy Spice using the spiceai.yaml values (a sketch of the model configuration it needs follows this list)

    helm install spiceai spiceai/spiceai -f spiceai.yaml
  3. Connect to Spice

    kubectl port-forward deployment/spiceai 8090
  4. Chat with meta/llama3-8b-instruct via NIM.

    spice chat
    Using model: nim
    chat> Tell me a joke about the moon.
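The spiceai.yaml used in step 2 is also not included here. The sketch below shows the model definition it would need so that spice chat can reach NIM as an OpenAI-compatible endpoint; the nim model name and meta/llama3-8b-instruct come from the steps above, while the endpoint hostname and parameter shape are assumptions to check against the Spice documentation and the chart's values:

    models:
      - name: nim                               # model name shown by `spice chat`
        from: openai:meta/llama3-8b-instruct    # NIM exposes an OpenAI-compatible API
        params:
          endpoint: http://my-nim-nim-llm:8000/v1   # assumed in-cluster service for the my-nim release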