Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle objects from cache of type DeletedFinalStateUnknown #353

Conversation

nilsgstrabo
Copy link
Contributor

In rare situations, the object received from cache is of type DeletedFinalStateUnknown, which causes kubernetes-replicator to panic (see trace)

In this PR MustGetObject is modified to also check if obj is of type cache.DeletedFinalStateUnknown and calls itself with the Obj field, which should contain the real object.

I1012 05:51:47.296479       1 trace.go:236] Trace[2025117757]: "DeltaFIFO Pop Process" ID:myappns/mysecret,Depth:10184,Reason:slow event handlers blocking the queue (12-Oct-2024 05:51:46.898) (total time: 398ms):
Trace[2025117757]: [398.233324ms] [398.233324ms] END
E1012 05:51:47.296948       1 runtime.go:79] Observed a panic: Unknown type: cache.DeletedFinalStateUnknown (Unknown type: cache.DeletedFinalStateUnknown)
goroutine 7 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1209fa0, 0x40121d2eb8})
        /home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x7c
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x4000368e00?})
        /home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x78
panic({0x1209fa0?, 0x40121d2eb8?})
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/panic.go:770 +0x124
github.com/mittwald/kubernetes-replicator/replicate/common.MustGetObject({0x1245240, 0x40100e6800})
        /home/runner/work/kubernetes-replicator/kubernetes-replicator/replicate/common/strings.go:57 +0x10c
github.com/mittwald/kubernetes-replicator/replicate/common.MustGetKey({0x1245240?, 0x40100e6800?})
        /home/runner/work/kubernetes-replicator/kubernetes-replicator/replicate/common/strings.go:40 +0x2c
github.com/mittwald/kubernetes-replicator/replicate/common.(*GenericReplicator).ResourceDeleted(0x40001a10a0, {0x1245240, 0x40100e6800})

@nilsgstrabo
Copy link
Contributor Author

@martin-helmich Could you have a look at the changes in this PR?
The last release 2.10.2, which included the fix for concurrent map access, reduced number of crashes from 20 per day to 1 per week for us. The changes in this this PR should fix the remaining cause of crashes. I have tested it in one of our AKS clusters, and it works as expected.

Copy link
Member

@martin-helmich martin-helmich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. LGTM -- thanks for the fix 👍

@martin-helmich martin-helmich enabled auto-merge (squash) November 5, 2024 12:26
@martin-helmich
Copy link
Member

Still wondering about that failing test case. 🤔 On a quick glance, I'd think it's unrelated to the changes in this PR. I've rebased to current master to see if it wasn't a fluke.

@martin-helmich martin-helmich merged commit 0861231 into mittwald:master Nov 6, 2024
3 checks passed
@nilsgstrabo
Copy link
Contributor Author

Thanks 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants