From bfd82c93e8b0807a8b88979073475b10820653ed Mon Sep 17 00:00:00 2001
From: Ryan Zhang
Date: Wed, 15 Nov 2023 14:30:38 -0800
Subject: [PATCH] fix: adjust rollout document. (#616)

Co-authored-by: Ryan Zhang
Co-authored-by: Britania Rodriguez Reyes <145056127+britaniar@users.noreply.github.com>
---
 docs/howtos/crp.md | 113 +++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 60 deletions(-)

diff --git a/docs/howtos/crp.md b/docs/howtos/crp.md
index 2730d290f..6ce5384dd 100644
--- a/docs/howtos/crp.md
+++ b/docs/howtos/crp.md
@@ -358,83 +358,76 @@ After a `ClusterResourcePlacement` is created, you may want to
 
 * Add, update, or remove the resources that have been selected by the `ClusterResourcePlacement` in the hub cluster
-* Add, update, or remove resource selectors in the `ClusterResourcePlacement`
-* Add or update the scheduling policy in the `ClusterResourcePlacement`
+* Update the resource selectors in the `ClusterResourcePlacement`
+* Update the scheduling policy in the `ClusterResourcePlacement`
 
 These changes may trigger the following outcomes:
 
-* New resources may need to placed on all picked clusters
+* New resources may need to be placed on all picked clusters
 * Resources already placed on a pick cluster may get updated or deleted
 * Some clusters picked previously are now unpicked, and resources must be removed from such clusters
 * Some clusters are newly picked, and resources must be added to them
 
-Most of these outcomes may lead to service interruptions. Your apps running on the member clusters
-may become unavailable temporarily, while Fleet sends out updated resources; clusters that are
-now unpicked lose all the placed resources, and traffic sent to these clusters will be lost;
-if there are too many newly picked clusters and Fleet places resources on them at the same time,
-your backend may get overloaded. The exact pattern of interruption may vary,
-depending on the set of resources you place using Fleet.
+Most outcomes can lead to service interruptions. Apps running on member clusters may temporarily become
+unavailable as Fleet dispatches updated resources. Clusters that are no longer selected will lose all placed resources,
+resulting in lost traffic. If too many new clusters are selected and Fleet places resources on them simultaneously,
+your backend may become overloaded. The exact interruption pattern may vary depending on the resources you place using Fleet.
 
-To help minimize the interruption, Fleet provides rollout strategy configuration to help you
-transition between changes as smoothly as possible. Currently, Fleet supports only one rollout
-strategy, rolling update; with this strategy, Fleet will apply changes, including the addition or
-removal of picked clusters and resource refreshes, in an incremental manner with a number of phaes
-at a pace most appropriate for you. This is the default option and will apply to all
-changes you initiated.
+To minimize interruption, Fleet allows users to configure the rollout strategy, similar to native Kubernetes deployment,
+to transition between changes as smoothly as possible. Currently, Fleet supports only one rollout strategy: rolling update.
+This strategy ensures changes, including the addition or removal of selected clusters and resource refreshes,
+are applied incrementally in a phased manner at a pace suitable for you. This is the default option and applies to all changes you initiate.
 
 This rollout strategy can be configured with the following parameters:
 
-* `maxUnavailable` controls that, for the selected set of resources, how many clusters may become
-unavailable during a change. It can be set as an absolute number or a percentage. Default is 25%,
-and you should not use zero for this value.
+* `maxUnavailable` determines how many clusters may become unavailable during a change for the selected set of resources.
+It can be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.
+
+    - Setting this parameter to a lower value will result in less interruption during a change but will lead to slower rollouts.
 
-    **The less value you set this parameter with, the less interruption you will experience during
-    a change**; however, this would lead to slower rollouts.
+    - Fleet considers a cluster as unavailable if resources have not been successfully applied to the cluster.
 
-    Note that Fleet considers a cluster as unavailable if resources have not been successfully
-    applied to the cluster.
-
-    How Fleet interprets this value
-
-    Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable`
-    number of clusters available, where N is:
-
-    * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
-    * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
-    * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
-
-    If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
-    well.
-
-* `maxSurge` controls how many newly picked clusters will receive resource placements. It can
-also be set as an absolute number or a percentage. Default is 25%, and you should not use zero for
-this value.
-
-    **The less value you set this parameter with, the less new resource placements Fleet will run
-    at the same time**; however, this would lead to slower rollouts.
-
-    How Fleet interprets this value
-
-    Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge`
-    number of clusters available, where N is:
-
-    * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
-    * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
-    * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+    - How Fleet interprets this value
+      Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable`
+      number of clusters available, where N is:
+
+      * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
+      * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
+      * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+
+      If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
+      well.
+
-    If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
-    well.
+* `maxSurge` determines the number of additional clusters, beyond the required number, that will receive resource placements.
+It can also be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.
-
+    - Setting this parameter to a lower value will result in fewer resource placements on additional
+      clusters by Fleet, which may slow down the rollout process.
 
-* `unavailablePeriodSeconds` controls the frequeny of rollout phases. Default is 60 seconds.
+    - How Fleet interprets this value
+      Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge`
+      number of clusters available, where N is:
 
-    **The less value you set this parameter with, the quicker rollout will become**. However, using
-    a value that is too little may lead to unexpected service interruptions.
+      * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
+      * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
+      * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+
+      If you use a percentage for the `maxSurge` parameter, it is calculated against N as well.
+
+* `unavailablePeriodSeconds` allows users to inform Fleet when the resources are deemed "ready".
+  The default value is 60 seconds.
+
+    - Fleet only considers newly applied resources on a cluster as "ready" once `unavailablePeriodSeconds` seconds
+      have passed **after** the resources have been **successfully** applied to that cluster.
+    - Setting a lower value for this parameter will result in faster rollouts. However, we **strongly**
+      recommend setting it to a value large enough for all initialization/preparation tasks to complete within
+      that time frame. This ensures that the resources are typically ready after the `unavailablePeriodSeconds` period has passed.
+    - We are currently designing a generic "ready gate" for resources being applied to clusters. Please feel free to raise
+      issues or provide feedback if you have any thoughts on this.
 
 > Note
 >