Estimated Duration: 60 minutes
Requires the AKS cluster created in the basic AKS cluster lab.
You are given the following requirements:
- Initial testing should use a manual upgrade process
- The application pod count should never go below 1 during an upgrade
- Day 1 (simulated): Due to a critical OS-level CVE, you've been asked to upgrade the node pool's NODE IMAGE ONLY
- Day 2 (simulated): Due to a critical Kubernetes-level CVE, you've been asked to upgrade the control plane and the node pool Kubernetes version to the next incremental version (major or minor)
- Day 3 (simulated): To take advantage of some new Kubernetes features, you've been asked to upgrade the user pool Kubernetes version to the next incremental version (major or minor)
You were asked to complete the following tasks:
- Increase the deployment replica count to 2
- Deploy the necessary config to ensure the application pod count never dips below 1 pod
- Check the available upgrade versions for Kubernetes and Node Image
- Upgrade the system pool node image
- Upgrade the AKS control plane and node pool Kubernetes version
- Upgrade the node pool Kubernetes version
- Bonus Tasks: Enable Automatic Upgrades to the 'patch' channel and set a Planned Maintenance Window (preview) for Saturdays at 1am
In a terminal, set the variables required for this lab (if not already set), substituting your own initials:
INITIALS=abc
CLUSTER_NAME=aks-$INITIALS
RG=aks-$INITIALS-rg
If not already connected, connect to the cluster from your local client machine.
az aks get-credentials --name $CLUSTER_NAME -g $RG
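To confirm the connection, list the cluster nodes:
kubectl get nodes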
If not already deployed, then proceed to deploy the aks-helloworld application.
Run the following commands to create a namespace and deploy the hello world application:
kubectl create namespace helloworld
kubectl apply -f manifests/aks-helloworld-basic.yaml -n helloworld

Run the following command to verify the deployment and service have been created:
kubectl get all -n helloworld

Scale the replica count to 2:
kubectl scale deployment aks-helloworld --replicas=2 -n helloworld
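Verify that the deployment now reports 2 ready replicas:
kubectl get deployment aks-helloworld -n helloworld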
Kubernetes provides Pod Disruption Budgets (PDBs) as a mechanism to maintain a minimum pod count during voluntary disruptions such as node drains.
To ensure a minimum of one pod is always available, deploy the following pdb:
kubectl apply -f manifests/pdb.yaml -n helloworld
Confirm the pdb has been deployed:
kubectl get pdb -n helloworld
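The contents of manifests/pdb.yaml are not reproduced in this lab; for reference, a minimal PodDisruptionBudget for this scenario might look like the sketch below (the metadata name and the app: aks-helloworld selector label are assumptions and must match the labels actually used by the sample deployment):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aks-helloworld-pdb        # hypothetical name; the repo manifest may differ
spec:
  minAvailable: 1                 # block voluntary disruptions that would drop below 1 pod
  selector:
    matchLabels:
      app: aks-helloworld         # assumed label; must match the deployment's pod labels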
Get the node pool name:
NODEPOOL_NAME=$(az aks nodepool list -g $RG --cluster-name $CLUSTER_NAME --query '[].name' -o tsv)
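On the basic cluster there is a single node pool, so this returns one name. If you have since added a user pool, the [].name query returns every pool name; in that case you can select the system pool explicitly, for example:
NODEPOOL_NAME=$(az aks nodepool list -g $RG --cluster-name $CLUSTER_NAME --query "[?mode=='System'].name | [0]" -o tsv)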
Get the node pool image version:
az aks nodepool show -g $RG --cluster-name $CLUSTER_NAME --nodepool-name $NODEPOOL_NAME -o tsv --query nodeImageVersion
AKSUbuntu-2204gen2containerd-202410.27.0
Get the latest available node image version for the node pool:
az aks nodepool get-upgrades -g $RG --cluster-name $CLUSTER_NAME --nodepool-name $NODEPOOL_NAME -o tsv --query latestNodeImageVersion
AKSUbuntu-2204gen2containerd-202411.12.0
In the above check, you may have found that you're already running the latest node image version. If so, no action is needed. If not, you can upgrade as follows (this will take 5-10 minutes):
az aks nodepool upgrade --resource-group $RG --cluster-name $CLUSTER_NAME \
--name $NODEPOOL_NAME --node-image-only
In a separate terminal, you can watch the nodes as they get upgraded (this will take a few minutes):
kubectl get nodes -o wide -w
Additionally, it is recommended that you check the events for errors:
kubectl events
You should see nodes being drained and then upgraded:
Normal Drain Node/aks-nodepool1-33528281-vmss000009 Draining node: aks-nodepool1-33528281-vmss000009
Normal Upgrade Node/aks-nodepool1-33528281-vmss000008 Successfully upgraded node: aks-nodepool1-33528281-vmss000008
Normal Upgrade Node/aks-nodepool1-33528281-vmss000008 Successfully reimaged node: aks-nodepool1-33528281-vmss000008
Normal NodeNotSchedulable Node/aks-nodepool1-33528281-vmss000009 Node aks-nodepool1-33528281-vmss000009 status is now: NodeNotSchedulable
Normal Upgrade Node/aks-nodepool1-33528281-vmss000009 Deleting node aks-nodepool1-33528281-vmss000009 from API server
Normal RemovingNode Node/aks-nodepool1-33528281-vmss000009 Node aks-nodepool1-33528281-vmss000009 event: Removing Node aks-nodepool1-33528281-vmss000009 from Controller
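If you only want to surface problems, you can also filter the events to warnings across all namespaces (a standard kubectl filter; the Normal drain and upgrade events shown above would not appear in this view):
kubectl get events -A --field-selector type=Warning --watch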
Once the nodes have been re-imaged, check the image version to confirm they have been upgraded to the latest version:
az aks nodepool show -g $RG --cluster-name $CLUSTER_NAME --nodepool-name $NODEPOOL_NAME -o tsv --query nodeImageVersion
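AKS also exposes the image version as a node label (kubernetes.azure.com/node-image-version); if that label is present on your nodes, you can cross-check per node with kubectl:
kubectl get nodes -L kubernetes.azure.com/node-image-version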
Check the Kubernetes versions available for upgrade:
az aks get-upgrades -g $RG -n $CLUSTER_NAME
At the time this lab was written, the available versions were:
"kubernetesVersion": "1.29.9",
"name": null,
"osType": "Linux",
"upgrades": [
{
"isPreview": null,
"kubernetesVersion": "1.30.5"
},
{
"isPreview": null,
"kubernetesVersion": "1.30.4"
},
{
"isPreview": null,
"kubernetesVersion": "1.30.3"
},
{
"isPreview": null,
"kubernetesVersion": "1.30.2"
},
{
"isPreview": null,
"kubernetesVersion": "1.30.1"
},
{
"isPreview": null,
"kubernetesVersion": "1.30.0"
}
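If you just want a compact list of the target version numbers, you can query the control plane profile of the same output (the controlPlaneProfile.upgrades path assumes the current shape of the az aks get-upgrades response):
az aks get-upgrades -g $RG -n $CLUSTER_NAME --query "controlPlaneProfile.upgrades[].kubernetesVersion" -o tsv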
To upgrade the Kubernetes version, you specify the target with --kubernetes-version instead of using --node-image-only. First, upgrade the control plane to the next minor version available, e.g. from 1.29.9 to 1.30.0:
az aks upgrade -g $RG -n $CLUSTER_NAME \
--control-plane-only --kubernetes-version 1.30.0
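The control plane upgrade takes several minutes. From another terminal you can poll the cluster's provisioning state, which typically reports Upgrading while the operation runs and Succeeded once it completes:
az aks show -g $RG -n $CLUSTER_NAME --query provisioningState -o tsv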
Once completed, check the control plane version (the server version reported is the control plane version):
kubectl version
Client Version: v1.29.1
Server Version: v1.30.0
Start the node pool Kubernetes version upgrade:
az aks nodepool upgrade --resource-group $RG --cluster-name $CLUSTER_NAME --name $NODEPOOL_NAME --kubernetes-version 1.30.0
Use the same commands as before to monitor progress in a separate window:
kubectl get nodes -o wide
kubectl events
You should see one node upgraded at a time:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-nodepool1-33528281-vmss000008 Ready <none> 21m v1.29.9 10.224.0.6 <none> Ubuntu 22.04.5 LTS 5.15.0-1074-azure containerd://1.7.23-1
aks-nodepool1-33528281-vmss00000b Ready <none> 86s v1.30.0 10.224.0.5 <none> Ubuntu 22.04.5 LTS 5.15.0-1074-azure containerd://1.7.23-1
aks-nodepool1-33528281-vmss00000c NotReady <none> 94s v1.30.0 10.224.0.8 <none> Ubuntu 22.04.5 LTS 5.15.0-1074-azure containerd://1.7.23-1
In the above, notice all the single-instance terminations and think about the impact they could have on your running applications. It would eventually make sense to increase the replica counts of those single-replica workloads and implement PodDisruptionBudgets for them as well, but that's out of scope for this exercise.
While the above process is fairly straightforward and could be automated through a ticketing system without much effort, for lower environments (dev/test) you may consider letting AKS manage the version upgrades for you. You can also specify the upgrade window.
Check the current auto upgrade profile (the upgrade channel should be None; the output may also show the node OS upgrade channel, e.g. NodeImage):
az aks show -g $RG -n $CLUSTER_NAME -o tsv --query autoUpgradeProfile
NodeImage None
Enable auto upgrades to the 'patch' channel:
az aks update --resource-group $RG --name $CLUSTER_NAME --auto-upgrade-channel patch
Check the auto upgrade profile again:
az aks show -g $RG -n $CLUSTER_NAME -o tsv --query autoUpgradeProfile
patch
Create the planned maintenance configuration for Saturdays at 1am:
az aks maintenanceconfiguration add -g $RG --cluster-name $CLUSTER_NAME \
--name default --weekday Saturday --start-hour 1
Show the current maintenance window configuration:
az aks maintenanceconfiguration show -g $RG --cluster-name $CLUSTER_NAME \
--name default -o yaml
name: default
notAllowedTime: null
systemData: null
timeInWeek:
- day: Saturday
  hourSlots:
  - 1
type: null
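You can also list every maintenance configuration defined on the cluster:
az aks maintenanceconfiguration list -g $RG --cluster-name $CLUSTER_NAME -o table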
If you are done with all the labs, clean up by deleting the resource group; otherwise, keep it to save time in subsequent labs.
az group delete --name $RG --yes --no-wait