Restrict Node Label Matching for Supported Topology Keys in NodeTopologyMap #2965

Merged

Conversation

@OdedViner (Contributor) commented Jan 12, 2025

The OCS operator used strings.Contains to match topology keys, leading to incorrect matches (e.g., topology.kubernetes.io/zone-principal being matched instead of topology.kubernetes.io/zone). This caused invalid updates to nodeTopologies in the StorageCluster spec, breaking the stretch cluster configuration and resulting in Rook operator failure.

Fix:
Updated the GetKeyValues method to explicitly match supported topology keys (rack, hostname, zone) using a predefined map. Unsupported or unmatched keys now return empty results.

Impact:

  • Prevents unintended matches.
  • Ensures correct StorageCluster and CephCluster configurations.
  • Improves topology handling reliability.
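A minimal, self-contained sketch of this approach (the package name, map contents, simplified types, and method signature are approximations of the description above, not the exact merged code):

package topology

// Simplified stand-in for NodeTopologyMap: label key -> values seen across the storage nodes.
type NodeTopologyMap struct {
    Labels map[string][]string
}

// Assumed mapping from supported failure domains to the only label key each may match.
var supportedTopologyKeys = map[string]string{
    "rack": "topology.rook.io/rack",
    "host": "kubernetes.io/hostname",
    "zone": "topology.kubernetes.io/zone",
}

// GetKeyValues returns the node label matching the supported topologyKey and all values
// recorded for that label. Unsupported or unmatched keys return empty results, so a label
// such as topology.kubernetes.io/zone-principal can no longer satisfy a lookup for "zone".
func (m *NodeTopologyMap) GetKeyValues(topologyKey string) (string, []string) {
    wantLabel, ok := supportedTopologyKeys[topologyKey]
    if !ok {
        return "", nil
    }
    for label, labelValues := range m.Labels {
        if label == wantLabel { // exact match instead of strings.Contains
            return label, labelValues
        }
    }
    return "", nil
}

With this shape, GetKeyValues("zone") only ever returns the topology.kubernetes.io/zone label and its values (e.g., data-1, data-2), and a lookup for any unsupported key returns ("", nil).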

Comment on lines 83 to 82
// GetKeyValues returns a node label matching the topologyKey and all values for StrechCluster
// for that label across all storage nodes
Member:

Please update the comment to match what this function actually does.

Contributor Author:
done

for label, labelValues := range m.Labels {
    if label == corev1.LabelZoneFailureDomainStable || label == labelZoneFailureDomainWithoutBeta {
        topologyKey = label
        values = labelValues
        break
    }
}
@Nikhil-Ladha (Member) commented Jan 13, 2025:

Does this mean that for a stretch cluster we can't have a different failure domain, like rack or host?

Member:

We should update the GetKeyValues function to do an equality check against the valid labels we define in validTopologyLabelKeys and return the key and values accordingly.

Member:

@malayparida2000 @iamniting what do you guys think?

Contributor:

Agree with @Nikhil-Ladha

Contributor:

Stretch always uses zones; just confirming that in this thread, in addition to the comments from others that confirmed the same.

Contributor Author:

done

@malayparida2000 (Contributor) left a comment:

@OdedViner Can you add some details on what the problem with the existing code is, and what the exact issue is? It's difficult to understand the motivation for the change without knowing the root cause you are trying to fix.

"strings"
)

const labelZoneFailureDomainWithoutBeta = "failure-domain.kubernetes.io/zone"
Contributor:

Wasn't this already defined somewhere in the codebase?

Contributor Author:

The variable is defined in another package within the project, but it is unexported (it starts with a lowercase letter), so it cannot be accessed outside its package.
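(For context, this is Go's standard visibility rule; the package name below is hypothetical and used only to illustrate why the constant is redefined locally.)

// In its defining package (hypothetical name), the constant is unexported because its
// name starts with a lowercase letter:
//
//     package defaults
//
//     const labelZoneFailureDomainWithoutBeta = "failure-domain.kubernetes.io/zone"
//
// Referencing defaults.labelZoneFailureDomainWithoutBeta from another package does not
// compile, so the consuming package redefines the same string locally:
const labelZoneFailureDomainWithoutBeta = "failure-domain.kubernetes.io/zone"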

Contributor:

I went through our docs as far back as OCS 4.7, and I can confirm two things:

  1. A stretch cluster always uses the zone failure domain.
  2. For the zone failure domain, the topology.kubernetes.io/zone label is always used. I don't see any need to search for any other labels such as failure-domain.kubernetes.io/zone.

Contributor:

Agreed with Malay on all these points. failure-domain.kubernetes.io/zone is an old, deprecated topology label that was removed from Kubernetes many releases ago; we should not use it anymore.

@OdedViner (Contributor Author):

> @OdedViner Can you add some details on what the problem with the existing code is, and what the exact issue is? It's difficult to understand the motivation for the change without knowing the root cause you are trying to fix.

Root Cause:
When an incorrect label is matched, the nodeTopologies in the StorageCluster spec are updated with incorrect zone information. This directly impacts the stretch cluster settings in the CephCluster spec by adding invalid zone information, causing the Rook operator to fail.

Impact:
1. The storage cluster misidentifies node topology information, leading to an incorrect zone configuration.
2. In a stretch cluster setup, this results in the wrong zone labels being applied, disrupting the intended fault domain configuration.
3. Ultimately, the Rook operator fails to reconcile the CephCluster, breaking the deployment process.

Proposed Solution: To address this issue, I introduced a new function, GetKeyValuesStrechCluster, which checks only the relevant, valid topology keys (corev1.LabelZoneFailureDomainStable and labelZoneFailureDomainWithoutBeta). This ensures that only the intended labels are considered, preventing incorrect matches and maintaining the integrity of the stretch cluster configuration.
The function isolates the logic required for stretch cluster scenarios, leaving the existing GetKeyValues behavior unchanged for other use cases. This keeps the fix targeted and minimizes the risk of regressions elsewhere.
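As a rough sketch, reusing the simplified NodeTopologyMap stand-in from the earlier sketch (the receiver and signature are assumptions, mirroring the diff hunk quoted above; corev1.LabelZoneFailureDomainStable resolves to topology.kubernetes.io/zone):

import corev1 "k8s.io/api/core/v1"

const labelZoneFailureDomainWithoutBeta = "failure-domain.kubernetes.io/zone"

// GetKeyValuesStrechCluster considers only the two zone label keys that are valid for a
// stretch cluster, so a label such as topology.kubernetes.io/zone-principal can no longer
// be picked up by a substring match.
func (m *NodeTopologyMap) GetKeyValuesStrechCluster() (topologyKey string, values []string) {
    for label, labelValues := range m.Labels {
        if label == corev1.LabelZoneFailureDomainStable || label == labelZoneFailureDomainWithoutBeta {
            return label, labelValues
        }
    }
    return "", nil
}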

Why This Fix Is Necessary:
The root cause of the issue lies in the overly broad match logic (strings.Contains), which needs to be refined for stretch cluster use cases.
Without this fix, incorrect labels (e.g., topology.kubernetes.io/zone-principal) will continue to disrupt the stretch cluster settings.
The new function ensures correctness by limiting the keys checked to those explicitly relevant for stretch cluster configurations.
Let me know if more clarification or details are needed!

@malayparida2000 @Nikhil-Ladha @iamniting @sp98
Is the stretch cluster failure domain always set to "zone"?

@sp98 (Contributor) commented Jan 13, 2025

Yes. It's always zone.

@malayparida2000 (Contributor) commented Jan 13, 2025

In the whole of ODF we support only 3 kinds of failure domains: rack, hostname, and zone.
  • rack: topology.rook.io/rack
  • hostname/host: kubernetes.io/hostname
  • zone: topology.kubernetes.io/zone
So I think we can just refactor the code base and stop caring about any other labels here. That would let us eliminate the contains logic and directly match the label key against these 3 labels.
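A minimal illustration of that direct match (the helper name below is made up for this sketch):

// isSupportedTopologyLabel reports whether a node label key is one of the three
// failure-domain labels supported by ODF; anything else is ignored outright rather
// than being substring-matched.
func isSupportedTopologyLabel(label string) bool {
    switch label {
    case "topology.rook.io/rack", "kubernetes.io/hostname", "topology.kubernetes.io/zone":
        return true
    default:
        return false
    }
}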

@OdedViner force-pushed the strech_cluster_zones branch 3 times, most recently from 99f9f7d to 0796ed4 on January 14, 2025 17:14
@malayparida2000 (Contributor):

@OdedViner This fix selectively addresses only the stretch cluster case, while the mismatch of labels can still happen on a normal cluster if the customer uses an improperly formatted label. I would suggest fixing the issue properly for all cases.
Ref-#2965 (comment)
We should refactor how the topology map is built, how the failure domain is determined, and how key values are fetched, so that we finally fix this issue as a whole.

@OdedViner force-pushed the strech_cluster_zones branch from 0796ed4 to 8f85424 on January 15, 2025 15:09
@OdedViner changed the title from "Adjust Node Label Conditions Based on Full Label Name on strechcluster" to "Introduce GetKeyValuesStretchCluster for Accurate Label Matching in StretchCluster" on Jan 16, 2025
@OdedViner force-pushed the strech_cluster_zones branch 10 times, most recently from cb85115 to 789e697 on January 18, 2025 19:00
@OdedViner changed the title from "Introduce GetKeyValuesStretchCluster for Accurate Label Matching in StretchCluster" to "Restrict Node Label Matching for Supported Topology Keys in NodeTopologyMap" on Jan 19, 2025
Restrict topology key matching to explicitly supported labels
to prevent incorrect updates and ensure reliable cluster
configuration.

Signed-off-by: Oded Viner <[email protected]>
@OdedViner force-pushed the strech_cluster_zones branch from 789e697 to d60ecd7 on January 19, 2025 09:12
@malayparida2000 (Contributor):

The fix looks good now. @OdedViner, can you please test this with a cluster and let us know the results here?

@OdedViner (Contributor Author):

> The fix looks good now. @OdedViner, can you please test this with a cluster and let us know the results here?

The private image seems to have resolved the issue.

Procedure:
1. Deploy an arbiter cluster:
ocs-ci path:
deployment/vsphere/upi_1az_rhcos_vsan_lso_vmdk_3m_6w_arbiter.yaml [4.18.0-109.stable image]

2.Add "topology.kubernetes.io/zone-principal=true" label to all worker nodes

oc label nodes compute-0 topology.kubernetes.io/zone-principal=true
oc label nodes compute-1 topology.kubernetes.io/zone-principal=true
oc label nodes compute-2 topology.kubernetes.io/zone-principal=true
oc label nodes compute-3 topology.kubernetes.io/zone-principal=true
oc label nodes compute-4 topology.kubernetes.io/zone-principal=true
oc label nodes compute-5 topology.kubernetes.io/zone-principal=true
3. Restart the ocs-operator pod.

4. Check the StorageCluster status:

Message:                  Error while reconciling: CephCluster.ceph.rook.io "ocs-storagecluster-cephcluster" is invalid: spec.mon: Invalid value: "object": stretchCluster zones must be equal to 3

Failure Domain Key: topology.kubernetes.io/zone-principal

$ oc describe storagecluster
Status:
  Conditions:
    Last Heartbeat Time:      2025-01-21T11:54:01Z
    Last Transition Time:     2025-01-21T11:54:01Z
    Message:                  Version check successful
    Reason:                   VersionMatched
    Status:                   False
    Type:                     VersionMismatch
    Last Heartbeat Time:      2025-01-21T16:04:41Z
    Last Transition Time:     2025-01-21T15:29:45Z
    Message:                  Error while reconciling: CephCluster.ceph.rook.io "ocs-storagecluster-cephcluster" is invalid: spec.mon: Invalid value: "object": stretchCluster zones must be equal to 3
    Reason:                   ReconcileFailed
    Status:                   False
    Type:                     ReconcileComplete
    Last Heartbeat Time:      2025-01-21T15:29:06Z
    Last Transition Time:     2025-01-21T15:09:32Z
    Message:                  Reconcile completed successfully
    Reason:                   ReconcileCompleted
    Status:                   True
    Type:                     Available
    Last Heartbeat Time:      2025-01-21T15:29:06Z
    Last Transition Time:     2025-01-21T15:19:39Z
    Message:                  Reconcile completed successfully
    Reason:                   ReconcileCompleted
    Status:                   False
    Type:                     Progressing
    Last Heartbeat Time:      2025-01-21T15:29:06Z
    Last Transition Time:     2025-01-21T15:09:32Z
    Message:                  Reconcile completed successfully
    Reason:                   ReconcileCompleted
    Status:                   False
    Type:                     Degraded
    Last Heartbeat Time:      2025-01-21T15:29:06Z
    Last Transition Time:     2025-01-21T15:19:39Z
    Message:                  Reconcile completed successfully
    Reason:                   ReconcileCompleted
    Status:                   True
    Type:                     Upgradeable
  Current Mon Count:          5
  Default Ceph Device Class:  ssd
  Failure Domain:             zone
  Failure Domain Key:         topology.kubernetes.io/zone-principal

5. Create a private image based on the release-4.18 branch:

$ export REGISTRY_NAMESPACE=oviner
$ export IMAGE_TAG=nitin-test
$ make ocs-operator
$ podman push quay.io/$REGISTRY_NAMESPACE/ocs-operator:$IMAGE_TAG

image location:
quay.io/oviner/ocs-operator:nitin-test

6. Change the CSV image:

$ oc edit csv ocs-operator.v4.18.0-109.stable

                image: quay.io/oviner/ocs-operator:nitin-test
                imagePullPolicy: Always
                name: ocs-operator
7. Manually remove some entries from the StorageCluster CR status:
oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[
  { "op": "remove", "path": "/status/failureDomain" }
]'
oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[
  { "op": "remove", "path": "/status/failureDomainValues" }
]'
oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[
  { "op": "remove", "path": "/status/failureDomainKey" }
]'
oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[
  { "op": "remove", "path": "/status/nodeTopologies" }
]'
8. Restart the ocs-operator pod:
$ oc delete pod ocs-operator-74c9fd8477-kp88x 
pod "ocs-operator-74c9fd8477-kp88x" deleted

9. Check the StorageCluster status:

Current Mon Count:          5
  Default Ceph Device Class:  ssd
  Failure Domain:             zone
  Failure Domain Key:         topology.kubernetes.io/zone
  Failure Domain Values:
    data-1
    data-2
  Images:
    Ceph:
      Actual Image:   registry.redhat.io/rhceph/rhceph-8-rhel9@sha256:665278442f848fe2ac12be483b8778e672f35325daf6b90977ac12c3165bcabc
      Desired Image:  registry.redhat.io/rhceph/rhceph-8-rhel9@sha256:665278442f848fe2ac12be483b8778e672f35325daf6b90977ac12c3165bcabc
    Noobaa Core:
      Actual Image:   registry.redhat.io/odf4/mcg-core-rhel9@sha256:35f7f2c455823b04f77f664d0acf0a5759d6c6601e9ae0ad11bb00b97ed8cf1c
      Desired Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:35f7f2c455823b04f77f664d0acf0a5759d6c6601e9ae0ad11bb00b97ed8cf1c
    Noobaa DB:
      Actual Image:   registry.redhat.io/rhel9/postgresql-15@sha256:44a08b83a6c50714b52f4cf1c3476bc16b66faec21dd9a9bc07d1be5f97b8150
      Desired Image:  registry.redhat.io/rhel9/postgresql-15@sha256:44a08b83a6c50714b52f4cf1c3476bc16b66faec21dd9a9bc07d1be5f97b8150
  Kms Server Connection:
  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        compute-0
        compute-1
        compute-2
        compute-3
        compute-4
        compute-5
      topology.kubernetes.io/zone:
        data-1
        data-2
      topology.kubernetes.io/zone-principal:
        true
  Phase:  Ready
$ oc describe csv ocs-operator.v4.18.0-109.stable | grep -i oviner -C 3
                  Value From:
                    Field Ref:
                      Field Path:   metadata.namespace
                Image:              quay.io/oviner/ocs-operator:nitin-test
                Image Pull Policy:  Always
                Name:               ocs-operator
                Readiness Probe:

@malayparida2000 (Contributor) left a comment:

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jan 22, 2025
openshift-ci bot commented Jan 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamniting, malayparida2000, OdedViner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jan 22, 2025
@malayparida2000 (Contributor):

/retest

@openshift-merge-bot bot merged commit 1085e40 into red-hat-storage:main on Jan 24, 2025
11 checks passed
@malayparida2000 (Contributor):

/cherry-pick release-4.18

@openshift-cherrypick-robot

@malayparida2000: new pull request created: #2993

In response to this:

> /cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
