Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix List() calls that didn't fill in revision #9599

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

fasaxc
Copy link
Member

@fasaxc fasaxc commented Dec 13, 2024

Description

The watchercache requires the KVList to contain the collection resource version for correctness. Without it, it can miss updates from the datastore.

  • Fix the resources that were missing the revision entirely.
  • Retire code in the Kubernetes datastore driver that would turn List() calls into Get() calls when only one resource was needed. This code was incorrect because Get() doesn't return the "collection resource version", which is what a List() call should return (and must return for a subsequent Watch() to function correctly).
  • Instead of Get() use a List() with a FieldSelector that matches on the name of the resource. This returns the same resource but it wraps it in a List object that carries the needed collection version.
  • As a special case, maintain the Get() logic for WorkloadEndpoint behind an opt-in Context flag (with a nice long name to make users of the hacky flag feel bad :-P). The CNI plugin relies on that logic to avoid needing list RBAC permissions for Pods.

Related issues/PRs

CORE-10904

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Fix that libcalico-go would not always fill in the revision when listing certain resources (or single instances of certain resources).  This could result in missed watch events in components such as Typha.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@fasaxc fasaxc added cherry-pick-candidate docs-not-required Docs not required for this change labels Dec 13, 2024
@fasaxc fasaxc requested a review from a team as a code owner December 13, 2024 18:04
@marvin-tigera marvin-tigera added this to the Calico v3.30.0 milestone Dec 13, 2024
@marvin-tigera marvin-tigera added the release-note-required Change has user-facing impact (no matter how small) label Dec 13, 2024
caseydavenport
caseydavenport previously approved these changes Dec 13, 2024
The watchercache requires the KVList to contain the revision
for correctness.  Without it, it can miss updates from the server.

Fix the resources that were missing the revision and panic if the
revision is missing to surface the problem in future.
Replace Gets() with lists that filter on name.
Looks like k8s might return that in some cases (customK8sResourceClient
checks for it).
CNI plugin relies on its List() being turned into a Get()
for a single Pod.  It doesn't have RBAC permissions for list
and it only uses List() because one pod can result in >1 WEP.
@fasaxc fasaxc dismissed caseydavenport’s stale review December 20, 2024 10:38

Big changes: refactored to use list with fieldSelector

// We know that, in KDD, even though there may be >1 endpoint, we're only
// looking up one backing Pod. Send it a hint that we really want it to
// do a Get instead of a list. CNI plugin only has RBAC permissions to do
// a get on Pod resources. The default used to be to do a Get if possible,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNI plugin only has RBAC permissions

Any reason to not just switch to a List() with the field selector specified, and give the CNI plugin the necessary permissions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly just thinking that it was quite a big loosening to give CNI plugin list permissions when it could make do with get before. That, and the fact that I'd need to adjust manifests and operator to loosen permissions, which felt like extra work to go in the wrong direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I just don't see a huge security distinction between "Get any pod in the cluster" vs "List any pod in the cluster"

But I'm happy to leave it as is.

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-candidate docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants