# Deploying GPT-in-a-Box NVD Reference Application using GitOps (FluxCD)

WARNING: This repository is a supplementary repository for bootstrapping the reference application explored in the Nutanix GPT-in-a-Box Validated Design (NVD) document, and is intended to be used only as a reference for self-learning or lab development purposes - not for production!

This repository supports the GPT-in-a-Box NVD reference application and includes a GitOps-based reference framework, built on FluxCD, for deploying the applications and their dependent platform services on both the centralized management cluster and the dependent workload clusters running on Kubernetes.

A few core infrastructure components, such as Nutanix NKE and NUS, are required to deploy and integrate all components of the GPT-in-a-Box (GiaB) NVD Reference Application, and these must be made available prior to deployment. Deployment of these dependent components is not covered in detail in this repository.

The steps below assume that you are working either from a local machine with a Linux terminal or from a jumpserver. Whether you're on a jumpserver or running each step in a local Linux/Windows terminal, the minimum tools needed to bootstrap and configure any NKE cluster are listed below:
Tool | Install Link | macOS Example | Linux Example | Purpose | License Link |
---|---|---|---|---|---|
Go-Task | Task GitHub | `brew install go-task/tap/go-task` | `sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d` | A task runner / simpler Make alternative written in Go | MIT License |
Gomplate | Gomplate GitHub | `brew install gomplate` | `curl -o /usr/local/bin/gomplate -sSL https://github.com/hairyhenderson/gomplate/releases/latest/download/gomplate_linux-amd64 && chmod 755 /usr/local/bin/gomplate` | Template renderer which supports various data sources | MIT License |
Kubectl | Kubectl Docs | `brew install kubectl` | `curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && chmod +x kubectl && sudo mv kubectl /usr/local/bin/` | Command-line tool for controlling Kubernetes clusters | Apache License 2.0 |
Helm | Helm Docs | `brew install helm` | `curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \| bash` | Package manager for Kubernetes | Apache License 2.0 |
Age | Age GitHub | `brew install age` | `curl -LO https://github.com/FiloSottile/age/releases/download/v1.0.0/age-v1.0.0-linux-amd64.tar.gz && tar -xzf age-v1.0.0-linux-amd64.tar.gz && sudo mv age/age /usr/local/bin/ && sudo mv age/age-keygen /usr/local/bin/` | A simple, modern, and secure file encryption tool | BSD 3-Clause License |
Sops | SOPS GitHub | `brew install sops` | `wget https://github.com/mozilla/sops/releases/download/v3.7.3/sops-v3.7.3.linux.amd64 && mv sops-v3.7.3.linux.amd64 /usr/local/bin/sops && chmod +x /usr/local/bin/sops` | Encryption tool for managing secrets | Mozilla Public License 2.0 |
Flux | Flux Docs | `brew install fluxcd/tap/flux` | `curl -s https://fluxcd.io/install.sh \| sudo bash` | GitOps tool for deploying applications to Kubernetes | Apache License 2.0 |
Krew Kubectl Plugin Package Manager | Krew Docs | See [Krew Install](https://krew.sigs.k8s.io/docs/user-guide/setup/install/) | Same | Plugin manager for kubectl command-line tool | Apache License 2.0 |
Karbon Plugin | Karbon CLI Docs | `kubectl krew install karbon` (after installing Krew) | Same | Nutanix Karbon plugin for kubectl | Apache License 2.0 |
jq | jq GitHub | `brew install jq` | `sudo apt-get install jq` | Command-line JSON processor | MIT License |
yq | yq GitHub | `brew install yq` | `wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && chmod +x /usr/bin/yq` | Portable command-line YAML, JSON, and XML processor | MIT License |
Gum Charm | Gum GitHub | `brew install gum` | `curl -sSL https://github.com/charmbracelet/gum/releases/latest/download/gum_linux_amd64.tar.gz \| tar xz && sudo mv gum /usr/local/bin/` | Tool for creating glamorous shell scripts | MIT License |
arkade | arkade GitHub | `curl -sLS https://get.arkade.dev \| sudo sh` | Same | Marketplace for downloading your favorite devops CLIs and installing helm charts | MIT License |
GitHub CLI | GitHub CLI Docs | `brew install gh` | `sudo apt install gh` | GitHub's official command line tool | MIT License |
fzf | fzf GitHub | `brew install fzf` | `sudo apt-get install fzf` | A command-line fuzzy finder | MIT License |
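After installing, it's worth a quick sanity check that everything is on your PATH. A minimal sketch (adjust the list to the tools you actually installed):

```bash
# Report any bootstrap tools that are missing from the PATH
for tool in task gomplate kubectl helm age sops flux jq yq gum arkade gh fzf; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
```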
As an alternative, you can leverage the Devbox NixOS Shell to provision the needed tools within an isolated development environment, either locally or on a remote jumpserver/bastion host.

Install `devbox` and accept all defaults:

```bash
curl -fsSL https://get.jetpack.io/devbox | bash
```

Start the devbox shell; if `nix` isn't available, you will be prompted to install it:

```bash
devbox shell
```

By default, Devbox will install the CLI tool packages listed in `devbox.json` when you run `devbox shell` within the code workspace. These packages are available within the Nix package manager and are listed below; the exact `name@version` pins are defined in `devbox.json`:
Tool | Install Link | Purpose | License Link | Devbox Add Command |
---|---|---|---|---|
jq | jq GitHub | Command-line JSON processor | MIT License | `devbox add jq` |
gum | Gum GitHub | Tool for creating glamorous shell scripts | MIT License | `devbox add gum` |
gh | GitHub CLI Docs | GitHub's official command line tool | MIT License | `devbox add gh` |
kubectl | Kubectl Docs | Command-line tool for controlling Kubernetes clusters | Apache License 2.0 | `devbox add kubectl` |
age | Age GitHub | A simple, modern, and secure file encryption tool | BSD 3-Clause License | `devbox add age` |
arkade | arkade GitHub | Marketplace for downloading your favorite devops CLIs and installing helm charts | MIT License | `devbox add arkade` |
yq | yq GitHub | Portable command-line YAML, JSON, and XML processor | MIT License | `devbox add yq` |
sops | SOPS GitHub | Encryption tool for managing secrets | Mozilla Public License 2.0 | `devbox add sops` |
helm | Helm Docs | Package manager for Kubernetes | Apache License 2.0 | `devbox add helm` |
go-task | Task GitHub | A task runner / simpler Make alternative written in Go | MIT License | `devbox add go-task` |
krew | Krew Docs | Plugin manager for kubectl command-line tool | Apache License 2.0 | `devbox add krew` |
kubectx | kubectx GitHub | Tool to switch between Kubernetes contexts | Apache License 2.0 | `devbox add kubectx` |
opentofu | OpenTofu GitHub | Open-source alternative to Terraform | Mozilla Public License 2.0 | `devbox add opentofu` |
fluxcd | Flux Docs | GitOps tool for deploying applications to Kubernetes | Apache License 2.0 | `devbox add fluxcd` |
stern | Stern GitHub | Multi pod and container log tailing for Kubernetes | Apache License 2.0 | `devbox add stern` |
fzf | fzf GitHub | A command-line fuzzy finder | MIT License | `devbox add fzf` |
gomplate | Gomplate GitHub | Template renderer which supports various data sources | MIT License | `devbox add gomplate` |
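If you want to add or inspect packages yourself, the pins live in `devbox.json`. An abbreviated, illustrative sketch of its shape (the repository's actual `devbox.json` is authoritative, and package names/versions may differ):

```json
{
  "packages": [
    "jq@latest",
    "kubectl@latest",
    "sops@latest",
    "fluxcd@latest",
    "go-task@latest"
  ]
}
```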
Follow the procedures below until the management cluster has run to full completion, then rinse and repeat for each subsequent workload-specific cluster (e.g., prod, dev, qa, etc.).
- Set `K8S_CLUSTER_NAME` environment variable and copy `./.env.sample.yaml` to `./.env.${K8S_CLUSTER_NAME}.yaml`:

  ```bash
  export K8S_CLUSTER_NAME=flux-kind-local
  cp ./.env.sample.yaml ./.env.${K8S_CLUSTER_NAME}.yaml
  ```
- Update `.env.${K8S_CLUSTER_NAME}.yaml` with required values. If a parameter is enabled, update the required parameters below each section. For example, if `kube_vip.enabled` is true, then uncomment and configure `kube_vip.ipam_range`:

  ```yaml
  kube_vip:
    enabled: false
    ## min. 2 ips for range (i.e., 10.38.110.22-10.38.110.23)
    ipam_range: required
  ```

  For a breakdown of which components need to be configured for each profile (i.e., `llm-management` or `llm-workloads`) based on environment type (i.e., `non-prod` vs. `prod`), see LLM Management and Workload Profile to Component Mappings. For more detail on which variables are required for each component based on environment type, see Variables.
- Generate and Validate Configurations:

  ```bash
  task bootstrap:generate_cluster_configs
  ```

  Verify the generated cluster configs:

  ```bash
  cat .local/${K8S_CLUSTER_NAME}/.env
  cat clusters/${K8S_CLUSTER_NAME}/platform/cluster-configs.yaml
  ```
- [Optional] Validate Encrypted Secrets:

  ```bash
  task sops:decrypt
  ```
- Select New (or Switch to Existing) Cluster and Download NKE Creds:

  ```bash
  eval $(task nke:switch-shell-env) && \
    task nke:download-creds && \
    kubectl get nodes
  ```
- [Optional] Taint GPU Nodepool with `nvidia.com/gpu.present=true:NoSchedule`.

  NOTE: if undesired workloads are already running on GPU nodepools, drain the nodes using `task kubectl:drain_gpu_nodes`.

  ```bash
  ## taint gpu nodes with label nvidia.com/gpu.present=true
  task kubectl:taint_gpu_nodes

  ## view taint configurations on all nodes
  kubectl get nodes -o='custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect'
  ```
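  Once the taint is in place, only pods that tolerate it can schedule onto those nodes. For reference, a minimal pod-spec fragment showing the toleration a GPU workload would need (an illustrative sketch, not taken from this repository's manifests):

  ```yaml
  # Toleration matching the nvidia.com/gpu.present=true:NoSchedule taint above
  tolerations:
    - key: nvidia.com/gpu.present
      operator: Equal
      value: "true"
      effect: NoSchedule
  ```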
- Run Flux Bootstrapping - `task bootstrap:silent`:

  ```bash
  task bootstrap:silent
  ```

  NOTE: if there are any issues, troubleshoot using `task ts:flux-collect`. You can re-run `task bootstrap:silent` as many times as needed.
Monitor on New Terminal
eval $(task nke:switch-shell-env) && \ task flux:watch
NOTE: if there are any issues, update local git repo, push up changes and run
task flux:reconcile
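  While watching the rollout, the standard Flux CLI status commands are also handy (generic `flux` commands, not specific to this repository):

  ```bash
  # Check reconciliation status of Flux-managed kustomizations
  flux get kustomizations

  # Check Helm release status across all namespaces
  flux get helmreleases -A

  # Surface recent errors from the Flux controllers
  flux logs --level=error -A
  ```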
## LLM Management and Workload Profile to Component Mappings

The table below provides an overview of which components are recommended to be configured based on the different profile types and environments listed under `clusters/_profiles/llm-*`.
Legend:
- ✅: Required for given profile
- ❌: Not applicable or not specified for this profile
- ✅ (non-prod): Is enabled ONLY in non-production environments
- ✅ (prod): Is enabled ONLY in production environments
Component | Parameter | llm-management | llm-workloads |
---|---|---|---|
GPU Operator | `gpu_operator.enabled` | ❌ | ✅ |
GPU Time Slicing | `gpu_operator.time_slicing.enabled` | ❌ | ✅ (non-prod) |
Prism Central | `nutanix.prism_central.enabled` | ✅ | ✅ |
Objects Store | `nutanix.objects.enabled` | ✅ | ✅ |
Kube-vip | `kube_vip.enabled` | ✅ | ✅ |
Nginx Ingress | `nginx_ingress.enabled` | ✅ | ✅ |
Istio | `istio.enabled` | ❌ | ✅ |
Cert Manager | `cert_manager.enabled` | ✅ | ✅ |
AWS Route53 ACME DNS | `cert_manager.aws_route53_acme_dns.enabled` | ✅ (prod) | ✅ (prod) |
Kyverno | `kyverno.enabled` | ✅ | ✅ |
KServe | `kserve.enabled` | ❌ | ✅ |
Knative Serving | `knative_serving.enabled` | ❌ | ✅ |
Knative Istio | `knative_istio.enabled` | ❌ | ✅ |
Milvus | `milvus.enabled` | ✅ | ✅ |
Knative Eventing | `knative_eventing.enabled` | ❌ | ✅ |
Kafka | `kafka.enabled` | ✅ | ✅ |
OpenTelemetry Collector | `opentelemetry_collector.enabled` | ✅ | ✅ |
OpenTelemetry Operator | `opentelemetry_operator.enabled` | ✅ | ✅ |
Uptrace | `uptrace.enabled` | ✅ | ❌ |
JupyterHub | `jupyterhub.enabled` | ❌ | ✅ (non-prod) |
Weave GitOps | `weave_gitops.enabled` | ✅ | ✅ |
GPT NVD Reference App | `gptnvd_reference_app.enabled` | ❌ | ✅ |
NAI LLM Helm Chart | `nai_helm.enabled` | ❌ | ✅ |
## Variables

Below are the various variables and values that can be defined within the `.env.CLUSTER_NAME.yaml` configuration file. The parameters are listed by section: some sections are required for all profiles and environments, while others apply only to the `llm-management` or `llm-workloads` profiles, or to `non-prod` vs. `prod` use cases.

NOTE: While `prod` is part of the `environment` naming convention, it is only intended to demonstrate what a production-like configuration may require from an advanced configuration perspective, such as valid TLS, monitoring, logging, HA, etc.
This section is required for all profiles and environments. It includes configurations for the Kubernetes cluster, Docker Hub registry, and GPU-specific settings.
Parameter | Required | Default Value | Profile/Environment Type | Description |
---|---|---|---|---|
`distribution` | No | nke | All | Kubernetes distribution. Supported options are "nke" and "kind". Example: kind |
`name` | Yes | "" | All | Kubernetes cluster name. Example: my-cluster |
`profile` | Yes | "" | All | Cluster profile type. Can be anything under `clusters/_profiles` (e.g., llm-management, llm-workloads, etc.). Example: llm-management |
`environment` | Yes | "" | All | Environment name based on the selected profile under `clusters/_profiles/<profile>/` (e.g., prod, non-prod, etc.). Example: prod |
`registry.docker_hub.user` | Yes | "" | All | Docker Hub registry username. Example: docker_user |
`registry.docker_hub.password` | Yes | "" | All | Docker Hub registry password. Example: docker_password |
`gpu_operator.enabled` | No | false | llm-workloads | Enable NVIDIA GPU operator. Example: true |
`gpu_operator.version` | No | v23.9.0 | llm-workloads | Version of the NVIDIA GPU operator. Example: v24.0.0 |
`gpu_operator.cuda_toolkit_version` | No | v1.14.3-centos7 | llm-workloads | CUDA toolkit version for the GPU operator. Example: v1.15.0-ubuntu |
`gpu_operator.time_slicing.enabled` | No | false | llm-workloads (non-prod) | Enable GPU time slicing. Example: true |
`gpu_operator.time_slicing.replica_count` | No | 2 | llm-workloads (non-prod) | Number of replicas for GPU time slicing. Example: 3 |
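Taken together, a hypothetical cluster/GPU section of `.env.${K8S_CLUSTER_NAME}.yaml` could look like the following (values are illustrative only; consult `.env.sample.yaml` for the authoritative structure and defaults):

```yaml
## Illustrative values only -- see .env.sample.yaml for the real structure
distribution: nke
name: my-cluster
profile: llm-workloads
environment: non-prod
registry:
  docker_hub:
    user: docker_user
    password: docker_password
gpu_operator:
  enabled: true
  version: v23.9.0
  cuda_toolkit_version: v1.14.3-centos7
  time_slicing:
    enabled: true
    replica_count: 2
```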
This section is required for all profiles and environments. It includes Flux-specific configurations for the GitHub repository.
Parameter | Required | Default Value | Profile/Environment Type | Description |
---|---|---|---|---|
`github.repo_url` | Yes | "" | All | GitHub repository URL for Flux. Example: https://github.com/user/repo |
`github.repo_user` | Yes | "" | All | GitHub username for Flux. Example: github_user |
`github.repo_api_token` | Yes | "" | All | GitHub API token for Flux. Example: github_token |
This section is required for all profiles and environments. It includes global Nutanix configurations, including Prism Central credentials and Nutanix Objects Store configurations.
Parameter | Required | Default Value | Profile/Environment Type | Description |
---|---|---|---|---|
`nutanix.prism_central.enabled` | No | true | All | Enable Nutanix Prism Central. Example: false |
`nutanix.prism_central.endpoint` | Yes | "" | All | Nutanix Prism Central endpoint. Example: prism_endpoint |
`nutanix.prism_central.user` | Yes | "" | All | Nutanix Prism Central username. Example: prism_user |
`nutanix.prism_central.password` | Yes | "" | All | Nutanix Prism Central password. Example: prism_password |
`nutanix.objects.enabled` | No | false | llm-management, llm-workloads | Enable Nutanix Objects Store. Example: true |
`nutanix.objects.host` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Nutanix Objects store host. Example: objects_host |
`nutanix.objects.port` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Nutanix Objects store port. Example: 9000 |
`nutanix.objects.region` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Nutanix Objects store region. Example: us-west-1 |
`nutanix.objects.use_ssl` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Use SSL for Nutanix Objects store. Example: true |
`nutanix.objects.access_key` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Nutanix Objects store access key. Example: access_key |
`nutanix.objects.secret_key` | No (Required - If objects section enabled) | "" | llm-management, llm-workloads | Nutanix Objects store secret key. Example: secret_key |
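As with the cluster section, a hypothetical `nutanix` block in the env file might look like this (illustrative values; the real key layout is defined by `.env.sample.yaml`):

```yaml
nutanix:
  prism_central:
    enabled: true
    endpoint: prism.example.com   # illustrative endpoint
    user: prism_user
    password: prism_password
  objects:
    enabled: true
    host: objects.example.com     # illustrative Objects store host
    port: 9000
    region: us-west-1
    use_ssl: true
    access_key: access_key
    secret_key: secret_key
```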
This section is required for all profiles and environments. It includes configurations for various services such as kube-vip, nginx-ingress, istio, and others.
Parameter | Required | Default Value | Profile/Environment Type | Description |
---|---|---|---|---|
`kube_vip.enabled` | No | false | All | Enable kube-vip. Example: true |
`kube_vip.ipam_range` | Yes | "" | All | IP range for kube-vip IPAM pool. Minimum of 2 IPs should be provided. Example: 172.20.0.22-172.20.0.23 |
`nginx_ingress.enabled` | No | false | All | Enable nginx-ingress. Example: true |
`nginx_ingress.version` | No | 4.8.3 | All | Version of nginx-ingress. Example: 4.9.0 |
`nginx_ingress.vip` | Yes | "" | All | Virtual IP Address (VIP) dedicated for nginx-ingress controller. Example: 172.20.0.20 |
`nginx_ingress.wildcard_ingress_subdomain` | Yes | "" | All | NGINX Wildcard Ingress Subdomain for all default ingress objects. Example: flux-kind-local.172.20.0.20.nip.io |
`nginx_ingress.management_cluster_ingress_subdomain` | Yes | "" | All | Wildcard Ingress Subdomain for management cluster. Example: management.flux-kind-local.172.20.0.20.nip.io |
`istio.enabled` | No | false | llm-workloads | Enable Istio. Example: true |
`istio.version` | No | 1.17.2 | llm-workloads | Version of Istio. Example: 1.18.0 |
`istio.vip` | No (Required - If istio section enabled) | "" | llm-workloads | Virtual IP Address (VIP) dedicated for Istio ingress gateway. Example: 172.20.0.21 |
`istio.wildcard_ingress_subdomain` | No (Required - If istio section enabled) | "" | llm-workloads | Istio Ingress Gateway Wildcard Subdomain for knative/kserve LLM inference endpoints. Example: llm.flux-kind-local.172.20.0.21.nip.io |
`cert_manager.enabled` | No | true | All | Enable cert-manager. Example: false |
`cert_manager.version` | No | v1.13.5 | All | Version of cert-manager. Example: v1.14.0 |
`cert_manager.aws_route53_acme_dns.enabled` | No | false | All (prod) | Enable AWS Route53 ACME DNS for Let's Encrypt. Example: true |
`cert_manager.aws_route53_acme_dns.email` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | Email for Let's Encrypt ACME DNS challenge. Example: user@example.com |
`cert_manager.aws_route53_acme_dns.zone` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | DNS zone for Let's Encrypt ACME DNS challenge. Example: example.com |
`cert_manager.aws_route53_acme_dns.hosted_zone_id` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | AWS Route53 hosted zone ID for Let's Encrypt ACME DNS challenge. Example: Z1234567890 |
`cert_manager.aws_route53_acme_dns.region` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | AWS region for Let's Encrypt ACME DNS challenge. Example: us-west-2 |
`cert_manager.aws_route53_acme_dns.key_id` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | AWS access key ID for Let's Encrypt ACME DNS challenge. Example: AWS_KEY_ID |
`cert_manager.aws_route53_acme_dns.key_secret` | No (Required - If aws_route53_acme_dns section enabled) | "" | All (prod) | AWS secret access key for Let's Encrypt ACME DNS challenge. Example: AWS_KEY_SECRET |
`kyverno.enabled` | No | true | All | Enable Kyverno. Example: false |
`kyverno.version` | No | 3.1.4 | All | Version of Kyverno. Example: 3.2.0 |
`kserve.enabled` | No | false | llm-workloads | Enable KServe. Example: true |
`kserve.version` | No | v0.11.2 | llm-workloads | Version of KServe. Example: v0.12.0 |
`knative_serving.enabled` | No | false | llm-workloads | Enable Knative Serving. Example: true |
`knative_serving.version` | No | knative-v1.10.1 | llm-workloads | Version of Knative Serving. Example: knative-v1.11.0 |
`knative_istio.enabled` | No | false | llm-workloads | Enable Knative Istio. Example: true |
`knative_istio.version` | No | knative-v1.10.0 | llm-workloads | Version of Knative Istio. Example: knative-v1.11.0 |
`milvus.enabled` | No | false | All | Enable Milvus. Example: true |
`milvus.version` | No | 4.1.13 | All | Version of Milvus. Example: 4.2.0 |
`milvus.milvus_bucket_name` | No | milvus | All | Milvus vector database bucket name. Example: milvus-bucket |
`knative_eventing.enabled` | No | false | llm-workloads | Enable Knative Eventing. Example: true |
`knative_eventing.version` | No | knative-v1.10.1 | llm-workloads | Version of Knative Eventing. Example: knative-v1.11.0 |
`kafka.enabled` | No | false | All | Enable Kafka. Example: true |
`kafka.version` | No | 26.8.5 | All | Version of Kafka. Example: 27.0.0 |
`opentelemetry_collector.enabled` | No | false | All | Enable OpenTelemetry Collector. Example: true |
`opentelemetry_collector.version` | No | 0.80.1 | All | Version of OpenTelemetry Collector. Example: 0.81.0 |
`opentelemetry_operator.enabled` | No | false | All | Enable OpenTelemetry Operator. Example: true |
`opentelemetry_operator.version` | No | 0.47.0 | All | Version of OpenTelemetry Operator. Example: 0.48.0 |
`uptrace.enabled` | No | false | llm-management | Enable Uptrace. Example: true |
`uptrace.version` | No | 1.5.7 | llm-management | Version of Uptrace. Example: 1.6.0 |
`jupyterhub.enabled` | No | false | llm-workloads (non-prod) | Enable JupyterHub. Example: true |
`jupyterhub.version` | No | 3.1.0 | llm-workloads (non-prod) | Version of JupyterHub. Example: 3.2.0 |
`jupyterhub.password` | No | "" | llm-workloads (non-prod) | JupyterHub default password for admin and allowed_users. Example: jupyterhub_password |
`jupyterhub.overrides` | No | {} | llm-workloads (non-prod) | JupyterHub custom overrides YAML. Similar to helm-chart values key-pair values. Used to override values not handled by default (e.g., adding custom allowed_users). Example below |
`weave_gitops.enabled` | Yes | true | All | Enable Weave GitOps. Example: false |
`weave_gitops.version` | Yes | 4.0.36 | All | Version of Weave GitOps. Example: 4.1.0 |
`weave_gitops.password` | Yes | "" | All | Weave GitOps password for admin and allowed_users. Example: weave_gitops_password |
Example of `jupyterhub.overrides` in the JupyterHub section:

```yaml
## Jupyterhub is deployed on non-prod workload clusters in NVD Reference
jupyterhub:
  enabled: true
  version: 3.1.0
  ## default admin account password
  default_password: Nutanix.123
  ## this will merge or override only the inline values defined within platform/jupyterhub/_operators/jupyterhub.yaml,
  ## but could be overridden by add-ons/patches, so use at your own risk
  ## example below merges hub config with additional users:
  overrides: |-
    hub:
      config:
        Authenticator:
          allowed_users:
            - wolfgang
            - jesse
```
This section is required only for llm-workloads profiles and environments. It includes configurations for the GPT NVD Reference Application and NAI LLM Helm Chart.
Parameter | Required | Default Value | Profile/Environment Type | Description |
---|---|---|---|---|
`gptnvd_reference_app.enabled` | No | false | llm-workloads | Enable GPT NVD Reference Application. Example: true |
`gptnvd_reference_app.version` | No | 0.2.7 | llm-workloads | Version of GPT NVD Reference Application. Example: 0.3.0 |
`gptnvd_reference_app.documents_bucket_name` | Yes | documents01 | llm-workloads | Document bucket name for GPT NVD Reference Application. Example: my-documents |
`nai_helm.enabled` | No | false | llm-workloads | Enable NAI LLM Helm Chart. Example: true |
`nai_helm.version` | No | 0.1.1 | llm-workloads | Version of NAI LLM Helm Chart. Example: 0.2.0 |
`nai_helm.model` | No | llama2_7b_chat | llm-workloads | LLM model name to be used. Example: llama-2-13b-chat |
`nai_helm.revision` | No | 94b07a6e30c3292b8265ed32ffdeccfdadf434a8 | llm-workloads | Revision of the LLM model. Example: c2f3ec81aac798ae26dcc57799a994dfbf521496 |
`nai_helm.maxTokens` | No | 4000 | llm-workloads | Maximum number of tokens for LLM inference. Example: 5000 |
`nai_helm.repPenalty` | No | 1.2 | llm-workloads | Repetition penalty for LLM inference. Example: 1.5 |
`nai_helm.temperature` | No | 0.2 | llm-workloads | Temperature setting for LLM inference. Example: 0.3 |
`nai_helm.topP` | No | 0.9 | llm-workloads | Top-p (nucleus sampling) setting for LLM inference. Example: 0.95 |
`nai_helm.useExistingNFS` | No | false | llm-workloads | Use existing NFS for LLM models. Example: true |
`nai_helm.nfs_export` | No (Required - If useExistingNFS is true) | "" | llm-workloads | NFS share where LLM models are stored. Example: /llm-model-store |
`nai_helm.nfs_server` | No (Required - If useExistingNFS is true) | "" | llm-workloads | NFS server FQDN or IP address of Nutanix Files instance. Example: nfs_server |
`nai_helm.huggingFaceToken` | No (Required - If useExistingNFS is false) | "" | llm-workloads | Hugging Face token required when useExistingNFS is false. Used to download the model when the LLM is initialized. Example: hugging_face_token |
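A hypothetical application section for a workload cluster might look like the following (illustrative values only; `.env.sample.yaml` remains the source of truth):

```yaml
gptnvd_reference_app:
  enabled: true
  version: 0.2.7
  documents_bucket_name: documents01
nai_helm:
  enabled: true
  version: 0.1.1
  model: llama2_7b_chat
  useExistingNFS: true          # serve models from an existing NFS export
  nfs_export: /llm-model-store  # illustrative export path
  nfs_server: nfs.example.com   # illustrative Nutanix Files FQDN
```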
```
.
├── configs                          # Directory for env configs used by local scripts
│   ├── _common
│   │   └── .env                     # Global env configs
│   └── ...
├── scripts                          # Common path for local scripts
├── clusters
│   ├── _profiles                    # Directory for the varying different profiles
│   │   ├── _base                    # Base for all cluster profiles (things installed in all variants)
│   │   │   └── kustomization.yaml   # Kustomize ref to one or many platform _base/<service> and _components/feature
│   │   ├── llm-management           # Management cluster profile (defines management specific applications and platform variants)
│   │   └── llm-workloads            # Workloads cluster profile (defines workload specific applications and platform variants)
│   ├── nke-management-cluster       # An example cluster instance generated from `_templates/management-cluster`
│   │   ├── flux-system              # Generated by flux bootstrap
│   │   └── platform                 # Generated from _templates
│   │       ├── kustomization.yaml   # Maps to `management` profile above and injects secrets/configs into the cluster
│   │       ├── cluster-secrets.sops.yaml
│   │       └── cluster-configs.yaml
│   └── nke-workload-cluster-00      # An example cluster instance generated from `_templates/[prod|non-prod]-workload-cluster`
└── platform                         # Contains catalogue of all the platform services
    ├── cert-manager
    ├── ingress-nginx                # Example service
    │   ├── _base                    # Base implementation of this service
    │   │   ├── kustomization.yaml   # Kustomize ref to ingress-nginx.yaml
    │   │   └── ingress-nginx.yaml   # Helm chart release details for ingress-nginx
    │   └── nodeport                 # Feature to expose nginx in a NodePort instead of in a LoadBalancer
    │       ├── kustomization.yaml   # Component ref to _base and ingress-nginx-patch
    │       └── ingress-nginx-patch.yaml  # Patch that overrides the ingress-nginx helm chart
    └── ...
```
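The `platform/kustomization.yaml` generated for each cluster is what ties a cluster instance to its profile. A sketch of what such a file might contain (illustrative only; the real file is generated from `_templates`):

```yaml
# clusters/<cluster-name>/platform/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../_profiles/llm-management   # pull in the profile's platform services
  - cluster-secrets.sops.yaml        # SOPS-encrypted cluster secrets
  - cluster-configs.yaml             # cluster-specific configuration values
```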