DOC-875 Add section on disabling automatic node maintenance and OS upgrades (#941)

Co-authored-by: Joyce Fee <[email protected]>
JakeSCahill and Feediver1 authored Jan 17, 2025
1 parent 97a0e3a commit b264be4
Showing 2 changed files with 49 additions and 15 deletions.
@@ -96,6 +96,8 @@ Managed Kubernetes services, such as Google Kubernetes Engine (GKE) and Amazon E

You remain responsible for deploying and maintaining Redpanda instances on worker nodes.

IMPORTANT: Deploy Kubernetes clusters with *unmanaged (manual) node updates*. Managed (automatic) node updates can lead to service downtime, data loss, or quorum instability. Switching from managed to unmanaged updates after deployment may require downtime, so plan for unmanaged node updates from the start. See xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-updates[Kubernetes Cluster Requirements and Recommendations].

=== Bare-metal Kubernetes environments

Bare-metal Kubernetes environments give you complete control over both the control plane and the worker nodes, which can be advantageous when you want the following:
@@ -113,14 +115,15 @@ This documentation follows conventions to help users easily identify Kubernetes

== Next steps

Whether you're deploying locally or in the cloud, choose one of the following guides to get started:
- Get started
** xref:./local-guide.adoc[Local Deployment Guide] (kind and minikube)
** xref:./aks-guide.adoc[Azure Kubernetes Service Guide] (AKS)
** xref:./eks-guide.adoc[Elastic Kubernetes Service Guide] (EKS)
** xref:./gke-guide.adoc[Google Kubernetes Engine Guide] (GKE)

* xref:./local-guide.adoc[Local Deployment Guide] (kind and minikube)
* xref:./aks-guide.adoc[Azure Kubernetes Service Guide] (AKS)
* xref:./eks-guide.adoc[Elastic Kubernetes Service Guide] (EKS)
* xref:./gke-guide.adoc[Google Kubernetes Engine Guide] (GKE)
- xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc[Kubernetes Cluster Requirements and Recommendations]

Or, explore our xref:./k-production-workflow.adoc[production workflow] to learn about requirements and best practices.
- xref:./k-production-workflow.adoc[Production deployment workflow]

include::shared:partial$suggested-reading.adoc[]

49 changes: 40 additions & 9 deletions modules/deploy/partials/requirements.adoc
@@ -31,17 +31,17 @@ https://helm.sh/docs/intro/install/[Install Helm^].
endif::[]

[[number-of-workers]]
== Number of {node}s
== Number of nodes

Provision one physical node or virtual machine (VM) for each Redpanda broker that you plan to deploy in your Redpanda cluster.
Each Redpanda broker requires its own dedicated {node} for the following reasons:
Each Redpanda broker requires its own dedicated node for the following reasons:

- *Resource isolation*: Redpanda brokers are designed to make full use of available system resources, including CPU and memory. By dedicating a {node} to each broker, you ensure that these resources aren't shared with other applications or processes, avoiding potential performance bottlenecks or contention.
- *External networking*: External clients should connect directly to the broker that owns the partition they're interested in. This means that each broker must be individually addressable. As clients must connect to the specific broker that is the leader of the partition, they need a mechanism to directly address each broker in the cluster. Assigning each broker to its own dedicated {node} makes this direct addressing feasible, since each {node} will have a unique address. See <<External networking>>.
- *Resource isolation*: Redpanda brokers are designed to make full use of available system resources, including CPU and memory. By dedicating a node to each broker, you ensure that these resources aren't shared with other applications or processes, avoiding potential performance bottlenecks or contention.
- *External networking*: External clients should connect directly to the broker that owns the partition they're interested in. This means that each broker must be individually addressable. As clients must connect to the specific broker that is the leader of the partition, they need a mechanism to directly address each broker in the cluster. Assigning each broker to its own dedicated node makes this direct addressing feasible, since each node will have a unique address. See <<External networking>>.
- *Fault tolerance*: Ensuring each broker operates on a separate node enhances fault tolerance. If one node experiences issues, it won't directly impact the other brokers.
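
The one-broker-per-node rule can be checked mechanically. The following is a minimal sketch, not part of Redpanda or its Helm chart: the function name `shared_nodes` and the broker and node names are hypothetical, and the placement map is assumed to come from your own inventory.

```python
from collections import Counter


def shared_nodes(placement: dict[str, str]) -> dict[str, int]:
    """Return every node that hosts more than one broker, with its broker count.

    `placement` maps a broker name to the node it runs on (hypothetical input).
    An empty result means each broker has its own dedicated node.
    """
    counts = Counter(placement.values())
    return {node: n for node, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical placement: two brokers share node-a, which violates the rule.
    placement = {"broker-0": "node-a", "broker-1": "node-a", "broker-2": "node-b"}
    print(shared_nodes(placement))
```

An empty dictionary indicates that the placement satisfies the dedicated-node requirement.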

ifdef::env-kubernetes[]
NOTE: The Redpanda Helm chart configures xref:reference:k-redpanda-helm-spec.adoc#statefulset-podantiaffinity[`podAntiAffinity` rules] to make sure that each Redpanda broker runs on its own {node}.
NOTE: The Redpanda Helm chart configures xref:reference:k-redpanda-helm-spec.adoc#statefulset-podantiaffinity[`podAntiAffinity` rules] to make sure that each Redpanda broker runs on its own node.


*Recommendations*: xref:./kubernetes-deploy.adoc#pod-replicas[Deploy at least three Pod replicas].
@@ -51,11 +51,42 @@ ifndef::env-kubernetes[]
*Recommendations*: Deploy at least three Redpanda brokers.
endif::[]

[[node-updates]]
== Node maintenance and operating system upgrades

Ensure that node and operating system (OS) upgrades are manually managed when running Redpanda in production. Manual control avoids unplanned reboots or replacements that disrupt Redpanda brokers, causing service downtime, data loss, or quorum instability.

=== Limitations of automatic updates

Redpanda is stateful. Redpanda brokers manage partition data and leadership, making them sensitive to disruptions. Proper handling during maintenance is required to:

- Avoid data loss, especially for nodes with ephemeral or local storage.
- Ensure smooth leadership transitions by decommissioning brokers before removing a node.
- Minimize service downtime by upgrading nodes one at a time during planned maintenance windows.

However, automatic update mechanisms provided by cloud platforms may not meet Redpanda's stateful requirements. Common issues include:

- Hard timeouts for graceful shutdowns that may not allow Redpanda brokers enough time to complete decommissioning or leadership transitions.
- Replacements or reboots without ensuring data has been safely migrated or replicated, risking data loss.
- Parallel upgrades across multiple nodes, which can disrupt quorum or reduce cluster availability.
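
To illustrate why parallel upgrades can disrupt quorum, the following sketch works through the majority arithmetic used by Raft-style replication. It is a simplified model, not Redpanda code: the function names are hypothetical, and a replication factor of 3 is assumed for illustration.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, unavailable: int) -> bool:
    """True if the replicas that are still available can form a majority."""
    return replicas - unavailable >= majority(replicas)


if __name__ == "__main__":
    replication_factor = 3  # assumed value for illustration
    for down in range(replication_factor + 1):
        ok = has_quorum(replication_factor, down)
        print(f"{down} of {replication_factor} replicas down: quorum={ok}")
```

Under these assumptions, a partition with three replicas tolerates one unavailable node but not two, which is why upgrading nodes one at a time is safer than upgrading several in parallel.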

*Recommendations*:

- Disable automatic node maintenance or upgrades.
ifdef::env-kubernetes[]
To prevent managed Kubernetes services from automatically rebooting or upgrading nodes:
** **Azure AKS**: Set the OS upgrade channel to `None`. See the https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image[Azure documentation^].
** **Google GKE**: Disable GKE auto-upgrades for node pools. See the https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades[GCP documentation^].
** **Amazon EKS**: Avoid enabling EKS node auto-upgrades. See the https://docs.aws.amazon.com/eks/latest/userguide/worker.html[AWS documentation^].
- xref:upgrade:k-upgrade-kubernetes.adoc[Manually manage node upgrades].
endif::[]


== CPU and memory

*Requirements*:

- Two physical, not virtual, cores for each {node}.
- Two physical, not virtual, cores for each node.

- x86_64 (Westmere or newer) and AWS Graviton family processors are supported.

@@ -65,7 +96,7 @@ endif::[]

*Recommendations*:

- Four physical cores for each {node} are strongly recommended.
- Four physical cores for each node are strongly recommended.

ifdef::env-kubernetes[]
- xref:./kubernetes-deploy.adoc#resources[Set resource requests and limits for memory and CPU].
@@ -106,7 +137,7 @@ endif::[]

== External networking

- For external access, each {node} in your cluster must have a static, externally accessible IP address.
- For external access, each node in your cluster must have a static, externally accessible IP address.

- Minimum 10 GigE (10 Gigabit Ethernet) connection to ensure:

@@ -120,7 +151,7 @@ endif::[]

== Tuning

Before deploying Redpanda to production, each {node} that runs Redpanda must be tuned to optimize the Linux kernel for Redpanda processes.
Before deploying Redpanda to production, each node that runs Redpanda must be tuned to optimize the Linux kernel for Redpanda processes.

ifdef::env-kubernetes[]
See xref:deploy:deployment-option/self-hosted/kubernetes/k-tune-workers.adoc[].