csi-node-driver-registrar consumed ~44% cpu time to read container registry TimeZone information in Windows #229
Comments
The CSI node driver registrar primarily executes when the CSI node plugin is initializing and registering with Kubelet. Did you observe the above happening beyond the CSI node DaemonSet pod's initialization? Were there any interesting bits (especially errors or retries) in the logs of the CSI node driver registrar container from the CSI node plugin pod?
It's within the bounds of Pod initialization. I am curious why this process needs to read time zone registry keys. This operation seems to take quite a bit of CPU time. Thank you for the quick response.
This is quite unexpected in the first place (assuming Pod initialization above is referring to a general stateful workload pod that mounts PVs backed by the CSI plugin) as the driver registrar does not have a role to play beyond CSI Node registration. The logs may reveal if you have a situation where the plugin is failing to register or restarting for some reason.
Just a guess (logs/stack traces needed to confirm) but it could be this sequence:
Adding on profiling, you can use
node-driver-registrar/cmd/csi-node-driver-registrar/node_register.go Lines 116 to 124 in 6f7211c
I think the demo process (9684) was still running. These instances were the kubelet-registration-probe operations that were happening every 10 seconds, with a command like: /csi-node-driver-registrar.exe --kubelet-registration-path=C:\var\lib\kubelet\plugins\disk.csi.azure.com\csi.sock --mode=kubelet-registration-probe
If you're running a recent cluster version (1.25+) I'd suggest removing
Let me ask the team whether they can move to 1.25. Thanks. I wonder whether the demo and the probe use the same code base. If they do, we still need to figure out the purpose of the time zone registry reads. Based on the call stack, csi-node-driver-registrar.exe calls RegOpenKeyExW directly, but as you mentioned, I also can't find RegOpenKeyEx in this repo. I am guessing this may come from the Go runtime.
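On the question of where the registry reads could come from: one candidate, not confirmed by the trace in this thread, is the Go standard library's `time` package, which resolves the local time zone lazily on first use and does that setup in platform-specific code. Below is a minimal sketch for checking whether that first-use cost is noticeable on a given node. `timeLocalZoneLookup` is a hypothetical helper name for illustration, not something from this repo.

```go
package main

import (
	"fmt"
	"time"
)

// timeLocalZoneLookup is a hypothetical helper (not part of this repo).
// It measures how long the first and a subsequent local time zone lookup
// take. The time package initializes the local zone lazily, so any
// platform-specific setup cost lands on the first lookup in the process.
func timeLocalZoneLookup() (first, second time.Duration) {
	start := time.Now()
	_, _ = time.Now().Zone() // first use: triggers local zone initialization
	first = time.Since(start)

	start = time.Now()
	_, _ = time.Now().Zone() // later use: zone data is already cached
	second = time.Since(start)
	return first, second
}

func main() {
	first, second := timeLocalZoneLookup()
	fmt.Printf("first lookup:  %v\n", first)
	fmt.Printf("second lookup: %v\n", second)
}
```

If the first lookup is much slower than the second on a Windows node, that would be consistent with the guess above, and since each probe invocation starts a fresh process it would pay that cost every time. It would not by itself prove that the registrar hits this path; a profile or stack trace is still needed for that.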
@Howard-Haiyang-Hao we don't need to wait for 1.25; we could change the current Azure Disk DaemonSet config directly by removing
@andyzhangx, let me work with you offline to see if the workaround solves the issue. Thanks!
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
/remove-lifecycle stale
I was trying to narrow down a perf issue and noticed that csi-node-driver-registrar.exe consumed 44% CPU time within 359 ms. The majority of this time was spent reading container registry time zone information.
I worked with @andyzhangx and he recommended tracking this issue here.
I would like to know:
Best regards,
Howard Hao