bug: Too large resource version #9622

Open

mfreeman451 opened this issue Dec 19, 2024 · 1 comment

@mfreeman451
I first noticed this in calico-typha:

2024-12-19 02:10:33.661 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2024-12-19 02:10:33.855 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Timeout: Too large resource version: 12001236, current: 60171
2024-12-19 02:10:33.855 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2024-12-19 02:10:33.860 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Timeout: Too large resource version: 12001216, current: 60171
2024-12-19 02:10:33.861 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
2024-12-19 02:10:33.862 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies"
2024-12-19 02:10:33.865 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:33.877 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"
2024-12-19 02:10:33.881 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:34.056 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Timeout: Too large resource version: 12001236, current: 60171
2024-12-19 02:10:34.058 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice"
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations" error=Timeout: Too large resource version: 12001215, current: 60171
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"
2024-12-19 02:10:34.061 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"

And then I looked at calico-kube-controllers:

2024-12-19 02:13:54.796 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/ipam/v2/assignment/" error=Timeout: Too large resource version: 11581173, current: 61236
2024-12-19 02:13:54.893 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2024-12-19 02:13:54.893 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2024-12-19 02:13:54.898 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Timeout: Too large resource version: 12001215, current: 61236
2024-12-19 02:13:54.898 [INFO][1] watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations" error=Timeout: Too large resource version: 12001215, current: 61236

This was on a k3s cluster that recently had its embedded etcd enabled (k3s uses sqlite internally unless you pass a flag at startup telling it to use embedded etcd). Around the same time, I had installed a new, separate etcd instance, which I ended up removing completely after noticing other problems in the cluster; in particular, I believe Calico couldn't route any packets. After I removed the new etcd and restarted the Calico processes, the cluster seemed to fully recover.

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

Client Version: v3.29.1
Git commit: ddfc3b1
Cluster Version: v3.28.2
Cluster Type: typha,kdd,k8s,operator,bgp

k3s on Ubuntu 22.02 nodes

@fasaxc
Member

fasaxc commented Dec 19, 2024

We should probably panic on "Too large resource version". It's obviously a very rare corner case (switching your backend etcd in/out), but we'd like to handle it somehow.
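For anyone digging into this, here is a minimal sketch (not Calico's actual watchercache code) of how the error could be detected so it can be handled somehow, whether by panicking or by forcing a fresh list. The helper name isTooLargeResourceVersion is made up for this sketch; apierrors.HasStatusCause, apierrors.IsTimeout, and metav1.CauseTypeResourceVersionTooLarge come from k8s.io/apimachinery:

```go
package resync

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isTooLargeResourceVersion reports whether err is the apiserver's
// "Too large resource version" error. The apiserver returns it as a
// Timeout status with a ResourceVersionTooLarge cause; the string
// match is a fallback for responses that omit the structured cause.
func isTooLargeResourceVersion(err error) bool {
	if apierrors.HasStatusCause(err, metav1.CauseTypeResourceVersionTooLarge) {
		return true
	}
	return apierrors.IsTimeout(err) && strings.Contains(err.Error(), "Too large resource version")
}
```

On detection, the watcher cache could drop its stored resource version and list again from the server's current revision (or restart the process, per the comment above) rather than retrying the stale revision indefinitely.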
