fix: adjust rollout document. (#616)

Co-authored-by: Ryan Zhang <[email protected]>
Co-authored-by: Britania Rodriguez Reyes <[email protected]>
Azure · Nov 15, 2023 · bfd82c9
1 parent 59c8f07
commit bfd82c9
Showing 1 changed file with 53 additions and 60 deletions.
diff --git a/docs/howtos/crp.md b/docs/howtos/crp.md
@@ -358,83 +358,76 @@ After a `ClusterResourcePlacement` is created, you may want to
 
 * Add, update, or remove the resources that have been selected by the
 `ClusterResourcePlacement` in the hub cluster
-* Add, update, or remove resource selectors in the `ClusterResourcePlacement`
-* Add or update the scheduling policy in the `ClusterResourcePlacement`
+* Update the resource selectors in the `ClusterResourcePlacement`
+* Update the scheduling policy in the `ClusterResourcePlacement`
 
 These changes may trigger the following outcomes:
 
-* New resources may need to placed on all picked clusters
+* New resources may need to be placed on all picked clusters
 * Resources already placed on a picked cluster may get updated or deleted
 * Some clusters picked previously are now unpicked, and resources must be removed from such clusters
 * Some clusters are newly picked, and resources must be added to them
 
-Most of these outcomes may lead to service interruptions. Your apps running on the member clusters
-may become unavailable temporarily, while Fleet sends out updated resources; clusters that are 
-now unpicked lose all the placed resources, and traffic sent to these clusters will be lost;
-if there are too many newly picked clusters and Fleet places resources on them at the same time,
-your backend may get overloaded. The exact pattern of interruption may vary, 
-depending on the set of resources you place using Fleet.
+Most outcomes can lead to service interruptions. Apps running on member clusters may temporarily become 
+unavailable as Fleet dispatches updated resources. Clusters that are no longer selected will lose all placed resources,
+resulting in lost traffic. If too many new clusters are selected and Fleet places resources on them simultaneously, 
+your backend may become overloaded. The exact interruption pattern may vary depending on the resources you place using Fleet.
 
-To help minimize the interruption, Fleet provides rollout strategy configuration to help you
-transition between changes as smoothly as possible. Currently, Fleet supports only one rollout
-strategy, rolling update; with this strategy, Fleet will apply changes, including the addition or
-removal of picked clusters and resource refreshes, in an incremental manner with a number of phaes
-at a pace most appropriate for you. This is the default option and will apply to all
-changes you initiated.
+To minimize interruption, Fleet lets you configure a rollout strategy, similar to that of a native Kubernetes Deployment,
+to transition between changes as smoothly as possible. Currently, Fleet supports only one rollout strategy: rolling update.
+This strategy ensures changes, including the addition or removal of selected clusters and resource refreshes, 
+are applied incrementally in a phased manner at a pace suitable for you. This is the default option and applies to all changes you initiate.
 
 This rollout strategy can be configured with the following parameters:
 
-* `maxUnavailable` controls that, for the selected set of resources, how many clusters may become
-unavailable during a change. It can be set as an absolute number or a percentage. Default is 25%,
-and you should not use zero for this value.
+* `maxUnavailable` determines how many clusters may become unavailable during a change for the selected set of resources. 
+It can be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.
+
+    - Setting this parameter to a lower value will result in less interruption during a change but will lead to slower rollouts.
 
-    **The less value you set this parameter with, the less interruption you will experience during
-    a change**; however, this would lead to slower rollouts.
+    - Fleet considers a cluster as unavailable if resources have not been successfully applied to the cluster.
 
-    Note that Fleet considers a cluster as unavailable if resources have not been successfully
-    applied to the cluster.
-
-    <details><summary>How Fleet interprets this value</summary>
-    <p></p>
-
-    Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable`
-    number of clusters available, where N is:
-
-    * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
-    * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
-    * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
-
-    If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
-    well.
-
-    </details>
-
-* `maxSurge` controls how many newly picked clusters will receive resource placements. It can
-also be set as an absolute number or a percentage. Default is 25%, and you should not use zero for
-this value.
-
-    **The less value you set this parameter with, the less new resource placements Fleet will run
-    at the same time**; however, this would lead to slower rollouts.
-
-    <details><summary>How Fleet interprets this value</summary>
-    <p></p>
-
-    Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge`
-    number of clusters available, where N is:
-
-    * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
-    * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
-    * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+    - <details><summary>How Fleet interprets this value</summary>
+      Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable`
+      number of clusters available, where N is:
+
+      * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
+      * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
+      * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+
+      If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
+      well.
+
+      </details>
 
-    If you use a percentage for the `maxUnavailable` parameter, it is calculated against N as
-    well.
+* `maxSurge` determines the number of additional clusters, beyond the required number, that will receive resource placements.
+It can also be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.
 
-    </details>
+    - Setting this parameter to a lower value will result in fewer resource placements on additional 
+        clusters by Fleet, which may slow down the rollout process.
 
-* `unavailablePeriodSeconds` controls the frequeny of rollout phases. Default is 60 seconds.
+    - <details><summary>How Fleet interprets this value</summary>
+        Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge`
+        number of clusters available, where N is:
 
-    **The less value you set this parameter with, the quicker rollout will become**. However, using
-    a value that is too little may lead to unexpected service interruptions.
+        * for scheduling policies of the `PickN` placement type, the `numberOfClusters` value given;
+        * for scheduling policies of the `PickFixed` placement type, the number of cluster names given;
+        * for scheduling policies of the `PickAll` placement type, the number of clusters Fleet picks.
+
+        If you use a percentage for the `maxSurge` parameter, it is calculated against N as well.
+
+        </details>
+
+* `unavailablePeriodSeconds` lets users tell Fleet how long to wait before newly applied resources are deemed "ready".
+     The default value is 60 seconds.
+
+    - Fleet only considers newly applied resources on a cluster as "ready" once `unavailablePeriodSeconds` seconds 
+       have passed **after** the resources have been **successfully** applied to that cluster.
+    - Setting a lower value for this parameter will result in faster rollouts. However, we **strongly** 
+       recommend setting it to a value long enough for all initialization/preparation tasks to complete within
+       that time frame. This ensures that the resources are typically ready once `unavailablePeriodSeconds` seconds have passed.
+    - We are currently designing a generic "ready gate" for resources being applied to clusters. Please feel free to raise 
+       issues or provide feedback if you have any thoughts on this.
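
As a reading aid, the "at least N - `maxUnavailable`" and "at most N + `maxSurge`" bounds described above can be worked through in a small Python sketch. This is not Fleet code: the helper names `resolve` and `rollout_bounds` are hypothetical, and how Fleet rounds a percentage is not stated in this document, so the rounding direction is left as an explicit parameter.

```python
def resolve(value, n, round_up=True):
    """Resolve an absolute number or a percentage string such as "25%" against n.

    The rounding direction for percentages is an assumption, not confirmed
    by the document, so it is exposed as a parameter.
    """
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return (n * pct + 99) // 100 if round_up else (n * pct) // 100
    return int(value)


def rollout_bounds(n, max_unavailable="25%", max_surge="25%", round_up=True):
    """Return (minimum available clusters, maximum clusters) for a target count n.

    n is the target number of clusters as described in the document:
    `numberOfClusters` for PickN, the number of cluster names for PickFixed,
    or the number of clusters Fleet picks for PickAll.
    """
    lower = max(n - resolve(max_unavailable, n, round_up), 0)
    upper = n + resolve(max_surge, n, round_up)
    return lower, upper


if __name__ == "__main__":
    # Defaults (25% each) against a target of 10 clusters.
    print(rollout_bounds(10))
    # Absolute values: at most 2 unavailable, at most 1 extra cluster.
    print(rollout_bounds(10, max_unavailable=2, max_surge=1))
```

With a target of 10 clusters and the 25% defaults, the result depends on the rounding assumption: rounding up gives bounds of 7 and 13, while rounding down gives 8 and 12.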
 
 > Note
 >