[Docs] Minor polishing on Multiple Kubernetes docs. (#4642)
concretevitamin authored Feb 4, 2025
1 parent 29fa533 commit 3b4f31b
Showing 1 changed file with 38 additions and 33 deletions.
71 changes: 38 additions & 33 deletions docs/source/reference/kubernetes/multi-kubernetes.rst
@@ -8,15 +8,15 @@ SkyPilot allows you to manage dev pods, jobs and services across multiple Kubern

You may have multiple Kubernetes clusters for different:

* **Use cases**, e.g., a production cluster and a development/testing cluster.
* **Regions or clouds**, e.g., US and EU regions; or AWS and Lambda clouds.
* **Accelerators**, e.g., NVIDIA H100 cluster and a Google TPU cluster.
* **Configurations**, e.g., a small cluster for a single node and a large cluster for multiple nodes.
* **Kubernetes versions**, e.g., to upgrade a cluster from Kubernetes 1.20 to 1.21, you may create a new Kubernetes cluster to avoid downtime or unexpected errors.
* **Use cases**: e.g., a production cluster and a development/testing cluster.
* **Regions or clouds**: e.g., US and EU regions; or AWS and Lambda clouds.
* **Accelerators**: e.g., NVIDIA H100 cluster and a Google TPU cluster.
* **Configurations**: e.g., a small cluster for a single node and a large cluster for multiple nodes.
* **Kubernetes versions**: e.g., to upgrade a cluster from Kubernetes 1.20 to 1.21, you may create a new Kubernetes cluster to avoid downtime or unexpected errors.


.. image:: /images/multi-kubernetes.svg
:width: 80%
:width: 95%
:align: center

.. original image: https://docs.google.com/presentation/d/1_NzqS_ccihsQKfbOTewPaH8D496zaHMuh-fvPsPf9y0/edit#slide=id.p
@@ -27,7 +27,7 @@ Configuration
Step 1: Set Up Credentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~

To work with multiple Kubernetes clusters, their credentials must be set up as individual `contexts <https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/>`_ in your local ``~/.kube/config`` file.
To work with multiple Kubernetes clusters, their credentials must be set up as individual `contexts <https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/>`_ in your local ``~/.kube/config`` file.

For deploying new clusters and getting credentials, see :ref:`kubernetes-deployment`.

@@ -38,7 +38,7 @@ For example, a ``~/.kube/config`` file may look like this:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data:
certificate-authority-data:
...
server: https://xx.xx.xx.xx:45819
name: my-h100-cluster
@@ -63,14 +63,19 @@ For example, a ``~/.kube/config`` file may look like this:
In this example, we have two Kubernetes clusters: ``my-h100-cluster`` and ``my-tpu-cluster``, and each Kubernetes cluster has a context for it.

Step 2: Setup SkyPilot to Access Multiple Kubernetes Clusters
Step 2: Set up SkyPilot to Access Multiple Kubernetes Clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unlike clouds, SkyPilot does not failover through different Kubernetes clusters (regions) by default because each Kubernetes clusters can have a different purpose.
Unlike clouds, SkyPilot does not failover through different Kubernetes clusters
(regions) by default because each Kubernetes cluster can have a different
purpose.

By default, SkyPilot only uses the context set as the ``current-context`` in the kubeconfig. You can get the current context with ``kubectl config current-context``.
By default, SkyPilot only uses the context set in the ``current-context`` in the
kubeconfig. You can get the current context with ``kubectl config
current-context``.

To allow SkyPilot to access multiple Kubernetes clusters, you can set the ``kubernetes.allowed_contexts`` in the SkyPilot config.
To allow SkyPilot to access multiple Kubernetes clusters, you can set the
``kubernetes.allowed_contexts`` in the SkyPilot :ref:`global config <config-yaml>`, ``~/.sky/config.yaml``.

.. code-block:: yaml
@@ -79,11 +84,11 @@ To allow SkyPilot to access multiple Kubernetes clusters, you can set the ``kube
- my-h100-cluster
- my-tpu-cluster
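
Before listing names under ``allowed_contexts``, it can help to confirm that each one exists in the local kubeconfig. The following is a minimal sketch, assuming the available names come from the output of ``kubectl config get-contexts -o name`` (one context name per line). The helper ``missing_contexts`` is hypothetical and is not part of SkyPilot.

.. code-block:: python

    def missing_contexts(kubectl_output: str, allowed: list[str]) -> list[str]:
        """Return the allowed context names that are absent from the kubeconfig.

        ``kubectl_output`` is assumed to be the text printed by
        ``kubectl config get-contexts -o name``: one context name per line.
        """
        available = {
            line.strip() for line in kubectl_output.splitlines() if line.strip()
        }
        # Preserve the order of ``allowed`` so the report matches the config.
        return [name for name in allowed if name not in available]


    # Sample output for the two example contexts used in this document.
    sample_output = "my-h100-cluster\nmy-tpu-cluster\n"
    print(missing_contexts(sample_output, ["my-h100-cluster", "my-tpu-cluster"]))
    print(missing_contexts(sample_output, ["my-h100-cluster", "my-cpu-cluster"]))

An empty result means every name in the list has a matching context. Any name returned would need to be added to ``~/.kube/config`` or removed from the list.
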
To check the enabled Kubernetes clusters, you can run ``sky check kubernetes``.
To check the enabled Kubernetes clusters, you can run ``sky check k8s``.

.. code-block:: console
$ sky check kubernetes
$ sky check k8s
🎉 Enabled clouds 🎉
✔ Kubernetes
Expand All @@ -95,52 +100,52 @@ To check the enabled Kubernetes clusters, you can run ``sky check kubernetes``.
Failover across Multiple Kubernetes Clusters
--------------------------------------------

With the ``kubernetes.allowed_contexts`` global config, SkyPilot failover through the Kubernetes clusters in the ``allowed_contexts`` in the same
order as they are specified.
With the ``kubernetes.allowed_contexts`` config set, SkyPilot will failover
through the Kubernetes clusters in the same order as they are specified in the field.


.. code-block:: console
$ sky launch --gpus H100 --cloud kubernetes echo 'Hello World'
$ sky launch --gpus H100 --cloud k8s echo 'Hello World'
Considered resources (1 node):
------------------------------------------------------------------------------------------------------------
CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
------------------------------------------------------------------------------------------------------------
Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-gke 0.00 ✔
Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-eks 0.00
Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-gke 0.00 ✔
Kubernetes 2CPU--8GB--1H100 2 8 H100:1 my-h100-cluster-eks 0.00
------------------------------------------------------------------------------------------------------------
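
The ordering rule described in this section can be illustrated with a short sketch. This is not SkyPilot's implementation; ``pick_context`` and the ``can_schedule`` callback are hypothetical names used only to show "try each context in the order listed, and stop at the first one that can run the task".

.. code-block:: python

    from typing import Callable, Optional, Sequence


    def pick_context(
        allowed_contexts: Sequence[str],
        can_schedule: Callable[[str], bool],
    ) -> Optional[str]:
        """Return the first context, in listed order, that can run the task.

        ``can_schedule`` is a stand-in for whatever capacity check applies
        (for example, whether the requested GPUs are free in that cluster).
        """
        for context in allowed_contexts:
            if can_schedule(context):
                return context
        # No context in the list could run the task.
        return None


    # Context names taken from the example output above.
    contexts = ["my-h100-cluster-gke", "my-h100-cluster-eks"]

    # If the first cluster has capacity, it is chosen.
    print(pick_context(contexts, lambda ctx: True))

    # If the first cluster is full, the next one in the list is tried.
    print(pick_context(contexts, lambda ctx: ctx != "my-h100-cluster-gke"))

Because the list order decides which cluster is tried first, placing the preferred cluster first in ``allowed_contexts`` makes it the default target.
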
Point to a Kubernetes Cluster and Launch
-----------------------------------------
Launching in a Specific Kubernetes Cluster
------------------------------------------

SkyPilot borrows the ``region`` concept from clouds to denote a Kubernetes context. You can point to a Kubernetes cluster
SkyPilot uses the ``region`` field to denote a Kubernetes context. You can point to a Kubernetes cluster
by specifying the ``--region`` with the context name for that cluster.

.. code-block:: console
$ # Launch in a specific Kubernetes cluster.
$ sky launch --cloud k8s --region my-tpu-cluster echo 'Hello World'
$ # Check the GPUs available in a Kubernetes cluster
$ sky show-gpus --cloud kubernetes --region my-h100-cluster
$ sky show-gpus --cloud k8s --region my-h100-cluster
Kubernetes GPUs (Context: my-h100-cluster)
GPU QTY_PER_NODE TOTAL_GPUS TOTAL_FREE_GPUS
H100 1, 2, 3, 4, 5, 6, 7, 8 8 8
GPU QTY_PER_NODE TOTAL_GPUS TOTAL_FREE_GPUS
H100 1, 2, 3, 4, 5, 6, 7, 8 8 8
Kubernetes per node GPU availability
NODE_NAME GPU_NAME TOTAL_GPUS FREE_GPUS
NODE_NAME GPU_NAME TOTAL_GPUS FREE_GPUS
my-h100-cluster-hbzn H100 8 8
my-h100-cluster-w5x7 None 0 0
When launching a SkyPilot cluster or task, you can also specify the context name with ``--region`` to launch the cluster or task in.

.. code-block:: console
$ sky launch --cloud kubernetes --region my-tpu-cluster echo 'Hello World'

Dynamically Update Kubernetes Clusters to Use
Dynamically Updating Clusters to Use
----------------------------------------------

You can have configure SkyPilot to dynamically fetch Kubernetes cluster configs and enforce restrictions on which clusters are used. Refer to :ref:`dynamic-kubernetes-contexts-update-policy` for more.
You can configure SkyPilot to dynamically fetch Kubernetes cluster configs and enforce restrictions on which clusters are used. Refer to :ref:`dynamic-kubernetes-contexts-update-policy` for more.
