From 3b4f31b13c5ad10e3ec6ced0beac464014db7138 Mon Sep 17 00:00:00 2001
From: Zongheng Yang
Date: Mon, 3 Feb 2025 21:31:25 -0800
Subject: [PATCH] [Docs] Minor polishing on Multiple Kubernetes docs. (#4642)

---
 .../reference/kubernetes/multi-kubernetes.rst | 71 ++++++++++---------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/docs/source/reference/kubernetes/multi-kubernetes.rst b/docs/source/reference/kubernetes/multi-kubernetes.rst
index 135fa8747d3..7ca4533ec74 100644
--- a/docs/source/reference/kubernetes/multi-kubernetes.rst
+++ b/docs/source/reference/kubernetes/multi-kubernetes.rst
@@ -8,15 +8,15 @@ SkyPilot allows you to manage dev pods, jobs and services across multiple Kubern

 You may have multiple Kubernetes clusters for different:

-* **Use cases**, e.g., a production cluster and a development/testing cluster.
-* **Regions or clouds**, e.g., US and EU regions; or AWS and Lambda clouds.
-* **Accelerators**, e.g., NVIDIA H100 cluster and a Google TPU cluster.
-* **Configurations**, e.g., a small cluster for a single node and a large cluster for multiple nodes.
-* **Kubernetes versions**, e.g., to upgrade a cluster from Kubernetes 1.20 to 1.21, you may create a new Kubernetes cluster to avoid downtime or unexpected errors.
+* **Use cases**: e.g., a production cluster and a development/testing cluster.
+* **Regions or clouds**: e.g., US and EU regions; or AWS and Lambda clouds.
+* **Accelerators**: e.g., NVIDIA H100 cluster and a Google TPU cluster.
+* **Configurations**: e.g., a small cluster for a single node and a large cluster for multiple nodes.
+* **Kubernetes versions**: e.g., to upgrade a cluster from Kubernetes 1.20 to 1.21, you may create a new Kubernetes cluster to avoid downtime or unexpected errors.

 .. image:: /images/multi-kubernetes.svg
-    :width: 80%
+    :width: 95%
     :align: center

 .. original image: https://docs.google.com/presentation/d/1_NzqS_ccihsQKfbOTewPaH8D496zaHMuh-fvPsPf9y0/edit#slide=id.p
@@ -27,7 +27,7 @@ Configuration
 Step 1: Set Up Credentials
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-To work with multiple Kubernetes clusters, their credentials must be set up as individual `contexts `_ in your local ``~/.kube/config`` file.
+To work with multiple Kubernetes clusters, their credentials must be set up as individual `contexts `_ in your local ``~/.kube/config`` file.

 For deploying new clusters and getting credentials, see :ref:`kubernetes-deployment`.

@@ -38,7 +38,7 @@ For example, a ``~/.kube/config`` file may look like this:
     apiVersion: v1
     clusters:
     - cluster:
-        certificate-authority-data:
+        certificate-authority-data: ...
         server: https://xx.xx.xx.xx:45819
       name: my-h100-cluster
@@ -63,14 +63,19 @@ For example, a ``~/.kube/config`` file may look like this:

 In this example, we have two Kubernetes clusters: ``my-h100-cluster`` and ``my-tpu-cluster``, and each Kubernetes cluster has a context for it.

-Step 2: Setup SkyPilot to Access Multiple Kubernetes Clusters
+Step 2: Set up SkyPilot to Access Multiple Kubernetes Clusters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Unlike clouds, SkyPilot does not failover through different Kubernetes clusters (regions) by default because each Kubernetes clusters can have a different purpose.
+Unlike clouds, SkyPilot does not failover through different Kubernetes clusters
+(regions) by default because each Kubernetes cluster can have a different
+purpose.

-By default, SkyPilot only uses the context set as the ``current-context`` in the kubeconfig. You can get the current context with ``kubectl config current-context``.
+By default, SkyPilot only uses the context set in the ``current-context`` in the
+kubeconfig. You can get the current context with ``kubectl config
+current-context``.

-To allow SkyPilot to access multiple Kubernetes clusters, you can set the ``kubernetes.allowed_contexts`` in the SkyPilot config.
+To allow SkyPilot to access multiple Kubernetes clusters, you can set the
+``kubernetes.allowed_contexts`` in the SkyPilot :ref:`global config `, ``~/.sky/config.yaml``.

 .. code-block:: yaml

@@ -79,11 +84,11 @@ To allow SkyPilot to access multiple Kubernetes clusters, you can set the ``kube
         - my-h100-cluster
         - my-tpu-cluster

-To check the enabled Kubernetes clusters, you can run ``sky check kubernetes``.
+To check the enabled Kubernetes clusters, you can run ``sky check k8s``.

 .. code-block:: console

-    $ sky check kubernetes
+    $ sky check k8s

     🎉 Enabled clouds 🎉
     ✔ Kubernetes
@@ -95,52 +100,52 @@ To check the enabled Kubernetes clusters, you can run ``sky check kubernetes``.
 Failover across Multiple Kubernetes Clusters
 --------------------------------------------

-With the ``kubernetes.allowed_contexts`` global config, SkyPilot failover through the Kubernetes clusters in the ``allowed_contexts`` in the same
-order as they are specified.
+With the ``kubernetes.allowed_contexts`` config set, SkyPilot will failover
+through the Kubernetes clusters in the same order as they are specified in the field.

 .. code-block:: console

-    $ sky launch --gpus H100 --cloud kubernetes echo 'Hello World'
+    $ sky launch --gpus H100 --cloud k8s echo 'Hello World'

     Considered resources (1 node):
     ------------------------------------------------------------------------------------------------------------
-    CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
+    CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
     ------------------------------------------------------------------------------------------------------------
-    Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-gke 0.00 ✔
-    Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-eks 0.00
+    Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-gke 0.00 ✔
+    Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-eks 0.00
     ------------------------------------------------------------------------------------------------------------

-Point to a Kubernetes Cluster and Launch
------------------------------------------
+Launching in a Specific Kubernetes Cluster
+------------------------------------------

-SkyPilot borrows the ``region`` concept from clouds to denote a Kubernetes context. You can point to a Kubernetes cluster
+SkyPilot uses the ``region`` field to denote a Kubernetes context. You can point to a Kubernetes cluster
 by specifying the ``--region`` with the context name for that cluster.

 .. code-block:: console
+
+    $ # Launch in a specific Kubernetes cluster.
+    $ sky launch --cloud k8s --region my-tpu-cluster echo 'Hello World'
+
     $ # Check the GPUs available in a Kubernetes cluster
-    $ sky show-gpus --cloud kubernetes --region my-h100-cluster
+    $ sky show-gpus --cloud k8s --region my-h100-cluster
     Kubernetes GPUs (Context: my-h100-cluster)
-    GPU QTY_PER_NODE TOTAL_GPUS TOTAL_FREE_GPUS
-    H100 1, 2, 3, 4, 5, 6, 7, 8 8 8
+    GPU QTY_PER_NODE TOTAL_GPUS TOTAL_FREE_GPUS
+    H100 1, 2, 3, 4, 5, 6, 7, 8 8 8

     Kubernetes per node GPU availability
-    NODE_NAME GPU_NAME TOTAL_GPUS FREE_GPUS
+    NODE_NAME GPU_NAME TOTAL_GPUS FREE_GPUS
     my-h100-cluster-hbzn H100 8 8
     my-h100-cluster-w5x7 None 0 0

 When launching a SkyPilot cluster or task, you can also specify the context name with ``--region`` to launch the cluster or task in.

-.. code-block:: console
-
-    $ sky launch --cloud kubernetes --region my-tpu-cluster echo 'Hello World'
-
-Dynamically Update Kubernetes Clusters to Use
+Dynamically Updating Clusters to Use
 ----------------------------------------------

-You can have configure SkyPilot to dynamically fetch Kubernetes cluster configs and enforce restrictions on which clusters are used. Refer to :ref:`dynamic-kubernetes-contexts-update-policy` for more.
+You can configure SkyPilot to dynamically fetch Kubernetes cluster configs and enforce restrictions on which clusters are used. Refer to :ref:`dynamic-kubernetes-contexts-update-policy` for more.