fix: adjust rollout document. #616

Merged 6 commits on Nov 15, 2023
46 changes: 19 additions & 27 deletions docs/howtos/crp.md
@@ -368,31 +368,24 @@ These changes may trigger the following outcomes:
* Some clusters picked previously are now unpicked, and resources must be removed from such clusters
* Some clusters are newly picked, and resources must be added to them

Most of these outcomes may lead to service interruptions. Your apps running on the member clusters
may become unavailable temporarily, while Fleet sends out updated resources; clusters that are
now unpicked lose all the placed resources, and traffic sent to these clusters will be lost;
if there are too many newly picked clusters and Fleet places resources on them at the same time,
your backend may get overloaded. The exact pattern of interruption may vary,
depending on the set of resources you place using Fleet.

To help minimize the interruption, Fleet allows users to configure the rollout strategy,
similar to native Kubernetes deployment, to help transition between changes as smoothly as possible.
Currently, Fleet supports only one rollout strategy, rolling update. This strategy means changes,
including the addition or removal of picked clusters and resource refreshes,
will be applied in an incremental manner with a number of phases at a pace most appropriate for you.
This is the default option and will apply to all changes you initiated.
Most outcomes can lead to service interruptions. Apps running on member clusters may temporarily become
[Review comment from a contributor: "Most outcomes listed above can lead to service interruptions"]
unavailable as Fleet dispatches updated resources. Clusters that are no longer selected will lose all placed resources,
resulting in lost traffic. If too many new clusters are selected and Fleet places resources on them simultaneously,
your backend may become overloaded. The exact interruption pattern may vary depending on the resources you place using Fleet.

To minimize interruptions, Fleet allows users to configure the rollout strategy, similar to a native Kubernetes Deployment,
to transition between changes as smoothly as possible. Currently, Fleet supports only one rollout strategy: rolling update.
This strategy ensures changes, including the addition or removal of selected clusters and resource refreshes,
are applied incrementally in a phased manner at a pace suitable for you. This is the default option and applies to all changes you initiate.

This rollout strategy can be configured with the following parameters:

* `maxUnavailable` controls that, for the selected set of resources, how many clusters may become
unavailable during a change. It can be set as an absolute number or a percentage. Default is 25%,
and you should not use zero for this value.
* `maxUnavailable` determines how many clusters may become unavailable during a change for the selected set of resources.
It can be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

- The less value you set this parameter with, the less interruption you will experience during
a change; however, this would lead to slower rollouts.
- Setting this parameter to a lower value will result in less interruption during a change but will lead to slower rollouts.

- Note that Fleet considers a cluster as unavailable if resources have not been successfully
applied to the cluster.
- Fleet considers a cluster as unavailable if resources have not been successfully applied to the cluster.

- <details><summary>How Fleet interprets this value</summary>
Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable`
@@ -407,12 +400,11 @@ and you should not use zero for this value.

</details>

* `maxSurge` controls how many newly picked clusters will receive resource placements. It can
also be set as an absolute number or a percentage. Default is 25%, and you should not use zero for
this value.
* `maxSurge` determines the number of additional clusters, beyond the required number, that will receive resource placements.
It can also be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

- The less value you set this parameter with, the less new resource placements Fleet will run
at the same time; however, this would lead to slower rollouts.
- Setting this parameter to a lower value will result in fewer resource placements on additional
clusters by Fleet, which may slow down the rollout process.

- <details><summary>How Fleet interprets this value</summary>
Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge`
@@ -426,12 +418,12 @@ this value.

</details>

* `unavailablePeriodSeconds` is introduced to allow users to inform the fleet when the resources are deemed "ready".
* `unavailablePeriodSeconds` allows users to tell Fleet when the resources are deemed "ready".
The default value is 60 seconds.

- Fleet only considers newly applied resources on a cluster as "ready" once `unavailablePeriodSeconds` seconds
have passed **after** the resources have been **successfully** applied to that cluster.
- The lower value you set for this parameter with, the faster rollout will occur. However, we **strongly**
- Setting a lower value for this parameter will result in faster rollouts. However, we **strongly**
recommend that users set it to a value large enough for all the initialization/preparation tasks to complete within
that time frame. This ensures that the resources are typically ready after `unavailablePeriodSeconds` seconds have passed.
- We are currently designing a generic "ready gate" for resources being applied to clusters. Please feel free to raise
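
The three rollout parameters described above can be illustrated with a small Python sketch. It is not Fleet's implementation: the function names are hypothetical, `N` is assumed to be the target number of clusters, and the percentage rounding (down for `maxUnavailable`, up for `maxSurge`) is an assumption borrowed from Kubernetes Deployments, not something this document states.

```python
from datetime import datetime, timedelta


def resolve(value, total, round_up):
    """Resolve an absolute number or a percentage string such as "25%".

    The rounding direction is an assumption, not documented Fleet behavior.
    """
    if isinstance(value, str) and value.endswith("%"):
        scaled = total * int(value[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(target, max_unavailable="25%", max_surge="25%"):
    """Return (minimum available clusters, maximum clusters with placements).

    Mirrors the "at least N - maxUnavailable" and "at most N + maxSurge"
    rules, with N assumed to be the target cluster count.
    """
    min_available = target - resolve(max_unavailable, target, round_up=False)
    max_placed = target + resolve(max_surge, target, round_up=True)
    return min_available, max_placed


def is_ready(applied_at, now, unavailable_period_seconds=60):
    """Resources count as ready once the waiting period has passed
    after they were successfully applied."""
    return now - applied_at >= timedelta(seconds=unavailable_period_seconds)


if __name__ == "__main__":
    print(rollout_bounds(10))
    applied = datetime(2023, 11, 15, 12, 0, 0)
    print(is_ready(applied, applied + timedelta(seconds=59)))
```

Under these assumptions, a target of 10 clusters with the default 25% values keeps at least 8 clusters available and allows placements on at most 13 clusters at any time; lowering either value narrows that window and slows the rollout, as noted above.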