-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce GetKeyValuesStretchCluster for Accurate Label Matching in StretchCluster #2965
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: OdedViner The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
9f8e66d
to
2bf7271
Compare
api/v1/topologymap.go
Outdated
// GetKeyValues returns a node label matching the topologyKey and all values for StrechCluster | ||
// for that label across all storage nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comment as per the working of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
api/v1/topologymap.go
Outdated
for label, labelValues := range m.Labels { | ||
if label == corev1.LabelZoneFailureDomainStable || label == labelZoneFailureDomainWithoutBeta { | ||
topologyKey = label | ||
values = labelValues | ||
break | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means for stretch cluster we can't have a different failure domain? Like, rack
, host
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the GetKeysValues
function to do an equality check with valid labels that we define in the validTopologyLabelKeys
and return the key, values accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@malayparida2000 @iamniting what do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @Nikhil-Ladha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stretch always uses zones, just to confirm in this thread in addition to comments from others that confirmed the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OdedViner Can you add some details on what's the problem with the existing code? And what's the exact issue. It's difficult to understand the motive for the change without knowing the root cause you are trying to fix.
api/v1/topologymap.go
Outdated
"strings" | ||
) | ||
|
||
const labelZoneFailureDomainWithoutBeta = "failure-domain.kubernetes.io/zone" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't this already defined somewhere in the codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is defined in another package within the project, but it is a private variable (starts with a lowercase letter). Since it cannot be accessed outside its package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw our docs & went as far back as OCS 4.7. And I can confirm 2 things.
- Stretch cluster always uses failure domain zone
- For zone failure domain always
topology.kubernetes.io/zone
label is used. I don't see any need to search for any other labels such asfailure-domain.kubernetes.io/zone
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with Malay on all these points. failure-domain.kubernetes.io/zone
is an old deprecated label for topology that was removed from K8s many releases ago, we should not use it anymore.
Root Cause: Impact: Proposed Solution: To address this issue, I introduced a new function, GetKeyValuesStrechCluster, which specifically checks for relevant and valid topology keys (corev1.LabelZoneFailureDomainStable and labelZoneFailureDomainWithoutBeta). This ensures only the intended labels are considered, preventing incorrect matches and maintaining the integrity of the stretch cluster configuration. Why This Fix Is Necessary: @malayparida2000 @Nikhil-Ladha @iamniting @sp98 |
|
Yes. It's always zone. |
In the whole of ODF we have only 3 supported kind of failure domains rack, hostname, and zone. |
99f9f7d
to
0796ed4
Compare
@OdedViner This fix selectively addresses the case of stretch cluster only, while the mis match of labels can still happen to a normal cluster if the customer uses an improperly formatted label. I would suggest fixing the issue properly for all cases. |
0796ed4
to
8f85424
Compare
03d786c
to
4d9e1bc
Compare
Signed-off-by: Oded Viner <[email protected]>
4d9e1bc
to
cb85115
Compare
This PR addresses FBUGS-160a and introduces the GetKeyValuesStretchCluster function to resolve an issue with the contains string method. The new function ensures a separation of logic without impacting the existing GetKeyValues function. However, this PR did not resolve the issue: #2841