bug: Too large resource version #9622

Open

mfreeman451 opened this issue Dec 19, 2024 · 1 comment

@mfreeman451
I first noticed this in calico-typha:

2024-12-19 02:10:33.661 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2024-12-19 02:10:33.855 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Timeout: Too large resource version: 12001236, current: 60171
2024-12-19 02:10:33.855 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2024-12-19 02:10:33.860 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Timeout: Too large resource version: 12001216, current: 60171
2024-12-19 02:10:33.861 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
2024-12-19 02:10:33.862 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies"
2024-12-19 02:10:33.865 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:33.877 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"
2024-12-19 02:10:33.881 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:34.056 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Timeout: Too large resource version: 12001236, current: 60171
2024-12-19 02:10:34.058 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice"
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"

And then I looked at calico-kube-controllers:

2024-12-19 02:13:54.796 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/ipam/v2/assignment/" error=Timeout: Too large resource version: 11581173, current: 61236
2024-12-19 02:13:54.893 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2024-12-19 02:13:54.893 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2024-12-19 02:13:54.898 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Timeout: Too large resource version: 12001215, current: 61236
2024-12-19 02:13:54.898 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations" error=Timeout: Too large resource version: 12001215, current: 61236

This was on a k3s cluster that recently had its embedded etcd enabled (k3s uses sqlite internally unless you pass a flag at startup telling it to use embedded etcd). Around the same time, I had installed a new, separate etcd instance, which I ended up removing completely after noticing other problems in the cluster; in particular, I believe Calico couldn't route any packets. After I removed the new etcd and restarted the Calico processes, the cluster seemed to fully recover.

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

Client Version: v3.29.1
Git commit: ddfc3b1
Cluster Version: v3.28.2
Cluster Type: typha,kdd,k8s,operator,bgp

k3s on Ubuntu 22.02 nodes

@fasaxc
Member

fasaxc commented Dec 19, 2024

We should probably panic on "Too large resource version". It's obviously a very rare corner case (switching your backend etcd in/out), but we'd like to handle it somehow.
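For anyone digging into this, here is a minimal sketch (not Calico's actual watchercache code) of how the error could be detected so it can be handled somehow, whether by panicking or by forcing a fresh list. The helper name isTooLargeResourceVersion is made up for this sketch; apierrors.HasStatusCause, apierrors.IsTimeout, and metav1.CauseTypeResourceVersionTooLarge come from k8s.io/apimachinery:

```go
package resync

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isTooLargeResourceVersion reports whether err is the apiserver's
// "Too large resource version" error. The apiserver returns it as a
// Timeout status with a ResourceVersionTooLarge cause; the string
// match is a fallback for responses that omit the structured cause.
func isTooLargeResourceVersion(err error) bool {
	if apierrors.HasStatusCause(err, metav1.CauseTypeResourceVersionTooLarge) {
		return true
	}
	return apierrors.IsTimeout(err) && strings.Contains(err.Error(), "Too large resource version")
}
```

On detection, the watcher cache could drop its stored resource version and list again from the server's current revision (or restart the process, per the comment above) rather than retrying the stale revision indefinitely.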
