address comments

Azure · May 22, 2024 · b1ca784
1 parent 36dce95
commit b1ca784

Showing 6 changed files with 26 additions and 19 deletions.
diff --git a/docs/troubleshooting/clusterResourcePlacementApplied.md b/docs/troubleshooting/clusterResourcePlacementApplied.md
@@ -1,9 +1,11 @@
 # How can I debug when my CRP ClusterResourcePlacementApplied condition is set to false?
+> Note: In addition, it may be helpful to look into the logs for the [apply work controller](https://github.com/Azure/fleet/blob/main/pkg/controllers/work/apply_controller.go) to get more information on why the resources are not available
 
 ### Common scenarios:
-- When the CRP is unable to propagate resources to a selected cluster due to the resource already existing on the cluster and not being managed by the fleet controller.
+- When the CRP is unable to propagate resources to a selected cluster due to the resource already existing on the cluster and not being managed by the fleet controller. 
+To remedy this, set `AllowCoOwnership` within the CRP's `ApplyStrategy` to allow the resource to be managed by the fleet controller.
 - When the CRP is unable to propagate a resource to a selected cluster because another CRP already manages the resource for that cluster with a different apply strategy.
-- When the CRP is unable to propagate resource due to failing to apply manifest due to syntax errors or invalid resource configurations.
+- When the CRP is unable to propagate a resource because the manifest fails to apply due to syntax errors (which can happen when a resource is being propagated through an envelope object) or invalid resource configurations.
 
 ### Investigation steps:
 

diff --git a/docs/troubleshooting/clusterResourcePlacementAvailable.md b/docs/troubleshooting/clusterResourcePlacementAvailable.md
@@ -1,7 +1,7 @@
 # How can I debug when my CRP ClusterResourcePlacementAvailable condition is set to false?
-The ClusterResourcePlacementAvailable condition is false when the cluster lacks the necessary resources or capabilities to accommodate new deployments or allocations.
-> Note: In addition, it may be helpful to look into the logs for the apply work controller to get more information on why the resources are not available
->
+The ClusterResourcePlacementAvailable condition is false when some of the resources are not available yet. Details of some of the failures are recorded in the `FailedResourcePlacement` array.
+> Note: In addition, it may be helpful to look into the logs for the [apply work controller](https://github.com/Azure/fleet/blob/main/pkg/controllers/work/apply_controller.go) to get more information on why the resources are not available
+
 ### Common scenarios:
 - When the CRP is unable to propagate resources to a selected cluster due to the member cluster not having enough resource availability.
 - When the CRP is unable to propagate a resource to a selected cluster due to the deployment having a bad image name.
@@ -198,8 +198,9 @@ Looking at the status `Available` condition for `kind-cluster-1`, we see that th
 Therefore, there might be something wrong with the deployment manifest.
 
 #### Resolution:
-In this scenario, a viable solution is to rectify the deployment manifest by verifying that all fields are accurately specified.
-After fixing the resource manifest and updating it, it's essential to delete the CRP and then reapply or recreate it. This step ensures that the changes made to the resource manifest are properly reflected in the CRP configuration.
+In this scenario, start by inspecting the deployment in the member cluster, as this may clearly indicate that the root cause of the issue is a bad image name.
+Once the issue has been identified, correct the deployment manifest and update it.
+After the resource manifest is updated, the CRP will automatically propagate the corrected resource to the member cluster.
 
 For all other scenarios, it's crucial to confirm that the propagated resource is configured correctly.
 Additionally, ensure that the selected cluster possesses sufficient available capacity to accommodate the new resources.
diff --git a/docs/troubleshooting/clusterResourcePlacementOverridden.md b/docs/troubleshooting/clusterResourcePlacementOverridden.md
@@ -1,7 +1,9 @@
 # How can I debug when my CRP status is ClusterResourcePlacementOverridden condition status is set to false?
 
 The status of the `ClusterResourcePlacementOverridden` condition is set to `false` when there is an Override API related issue.
-> Note: In addition, it may be helpful to look into the logs for the overrider controller to get more information on why the override did not succeed.
+> Note: In addition, it may be helpful to look into the logs for the overrider controller (includes 
+> controller for [ClusterResourceOverride](https://github.com/Azure/fleet/blob/main/pkg/controllers/overrider/clusterresource_controller.go) and 
+> [ResourceOverride](https://github.com/Azure/fleet/blob/main/pkg/controllers/overrider/resource_controller.go)) to get more information on why the override did not succeed.
 
 ## Common scenarios:
 
@@ -56,7 +58,7 @@ spec:
         value: new-value
 ```
 The `ClusterResourceOverride` is created to override the `ClusterRole` `secret-reader` by adding a new label `new-label`
-and value `new-value` for the clusters with the label `env: canary`.
+with a value `new-value` for the clusters with the label `env: canary`.
 
 ### CRP Spec:
 ```
@@ -142,8 +144,8 @@ The CRP attempted to override a propagated resource utilizing an applicable `Clu
 However, as the `ClusterResourcePlacementOverridden` condition remains false, looking at the placement status for the cluster
 where the condition `Overridden` failed will offer insights into the exact cause of the failure.
 The accompanying message highlights that the override failed due to the absence of the path `/metadata/labels/new-label` and its corresponding value.
-Based on the previous example of the cluster role `secret-reader`, it's evident that no labels were initially present.
-Therefore, the specified path for adding a new label is incorrect.
+Based on the previous example of the cluster role `secret-reader`, it's evident that the path `/metadata/labels` does not exist, meaning that `labels` does not exist. 
+Therefore, a new label cannot be added.
 
 ### Resolution:
 The solution here is to correct the path and value in the `ClusterResourceOverride` to successfully override the `ClusterRole` `secret-reader` as shown below:
@@ -154,3 +156,4 @@ jsonPatchOverrides:
     value: 
       newlabel: new-value
 ```
+This will successfully add the new label `newlabel` with the value `new-value` to the `ClusterRole` `secret-reader`, as we are creating the `labels` field and adding a new value `newlabel: new-value` to it.
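The failure and the fix described above can be illustrated with a minimal sketch of the "add" operation, assuming `jsonPatchOverrides` follow RFC 6902 semantics, where the parent of the target path must already exist. The helper `json_patch_add` is hypothetical and handles objects only; it is not Fleet's implementation.

```python
import copy


def json_patch_add(doc, path, value):
    """Hypothetical helper: apply one RFC 6902 style "add" to nested dicts.

    Only object (dict) members are handled; arrays are out of scope here.
    Returns a patched copy and leaves the input document untouched.
    """
    if not path.startswith("/"):
        raise ValueError("path must be a JSON Pointer starting with '/'")
    # Split the pointer into reference tokens and undo ~1 and ~0 escaping.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]
    result = copy.deepcopy(doc)
    parent = result
    # Walk to the parent of the target; every segment on the way must exist.
    for token in tokens[:-1]:
        if not isinstance(parent, dict) or token not in parent:
            raise KeyError(f"cannot add at {path!r}: {token!r} does not exist")
        parent = parent[token]
    if not isinstance(parent, dict):
        raise KeyError(f"cannot add at {path!r}: parent is not an object")
    parent[tokens[-1]] = value
    return result


# A ClusterRole without any labels, as in the example above.
cluster_role = {"kind": "ClusterRole", "metadata": {"name": "secret-reader"}}

try:
    # Fails: "/metadata/labels" is missing, so a member cannot be added to it.
    json_patch_add(cluster_role, "/metadata/labels/new-label", "new-value")
except KeyError as err:
    print("rejected:", err)

# Succeeds: the whole "labels" object is created in one operation.
patched = json_patch_add(cluster_role, "/metadata/labels", {"newlabel": "new-value"})
print(patched["metadata"]["labels"])
```

The second call works because its parent, `/metadata`, already exists, which mirrors why the corrected override targets `/metadata/labels` rather than a key beneath it.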
diff --git a/docs/troubleshooting/clusterResourcePlacementRolloutStarted.md b/docs/troubleshooting/clusterResourcePlacementRolloutStarted.md
@@ -1,7 +1,7 @@
 # How can I debug when my CRP status is ClusterResourcePlacementRolloutStarted condition status is set to false?
 
 The `ClusterResourcePlacementRolloutStarted` condition status is set to `false` under the following circumstances: the selected resources have not been rolled out in all scheduled clusters yet.
-> Note: In addition, it may be helpful to look into the logs for the rollout controller to get more information on why the rollout did not start.
+> Note: In addition, it may be helpful to look into the logs for the [rollout controller](https://github.com/Azure/fleet/blob/main/pkg/controllers/rollout/controller.go) to get more information on why the rollout did not start.
 
 ## Common scenarios:
 

diff --git a/docs/troubleshooting/clusterResourcePlacementScheduled.md b/docs/troubleshooting/clusterResourcePlacementScheduled.md
@@ -1,6 +1,6 @@
 # How can I debug when my CRP status is ClusterResourcePlacementScheduled condition status is set to false?
 The `ClusterResourcePlacementScheduled` condition is set to `false` when the scheduler cannot find all the clusters needed as specified by the scheduling policy.
-> Note: In addition, it may be helpful to look into the logs for the scheduler controller to get more information on why the scheduling failed.
+> Note: In addition, it may be helpful to look into the logs for the [scheduler](https://github.com/Azure/fleet/blob/main/pkg/scheduler/scheduler.go) to get more information on why the scheduling failed.
 
 ### Common scenarios:
 
@@ -10,7 +10,7 @@ Instances where this condition may arise:
 - When the placement policy is set to `PickN`, and N clusters are specified, but there are fewer than N clusters that have joined the fleet or satisfy the placement policy.
 - When the CRP resource selector selects a reserved namespace.
 
->>Note: When the placement policy is set to `PickAll`, the `ClusterResourcePlacementScheduled` condition is always set to `true`.
+>Note: When the placement policy is set to `PickAll`, the `ClusterResourcePlacementScheduled` condition is always set to `true`.
 
 ### Example Scenario:
 

diff --git a/docs/troubleshooting/clusterResourcePlacementWorkSynchronized.md b/docs/troubleshooting/clusterResourcePlacementWorkSynchronized.md
@@ -1,7 +1,7 @@
 # How can I debug when my CRP status is ClusterResourcePlacementWorkSynchronized condition status is set to false?
 
 The `ClusterResourcePlacementWorkSynchronized` condition is false when the CRP has been recently updated but the associated work objects have not yet been synchronized with the changes.
-> Note: In addition, it may be helpful to look into the logs for the work generator controller to get more information on why the work synchronization failed.
+> Note: In addition, it may be helpful to look into the logs for the [work generator controller](https://github.com/Azure/fleet/blob/main/pkg/controllers/workgenerator/controller.go) to get more information on why the work synchronization failed.
 
 ## Common Scenarios:
 - If used, the `ClusterResourceOverride` or `ResourceOverride` is created with an invalid value for the resource.
@@ -108,7 +108,8 @@ that the work object `crp1-work` is prohibited from generating new content withi
 as it's currently undergoing termination.
 
 ### Resolution:
-- To address this specific issue, recreate the Custom Resource Placement (CRP) with a newly selected cluster.
-- Alternatively, delete the CRP and any work on the namespace, then wait for the namespace to regenerate
-
-In other scenarios, you might opt to wait for the work to finish propagating. If the issue persists, consider deleting the CRP and recreating it.
+To address the issue at hand, there are several potential solutions:
+- One option is to modify the Cluster Resource Placement (CRP) with a newly selected cluster. 
+- Another option is to delete the CRP to remove work through garbage collection.
+- It's also worth noting that the namespace can only regenerate if the cluster is re-joined, so another potential solution is to re-join the member cluster. 
+- In other scenarios, you might opt to wait for the work to finish propagating.