PVC Creation stuck with v3.13.0 #5073
Comments
@appcoders did you try to create an rbd image from the rbd provisioner pod using the same ceph user specified in the secret? If that works, please share the output and also the csi-rbdplugin container logs from the provisioner pod.
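For reference, such a manual check from the provisioner pod could look roughly like this (a minimal sketch; the pool name, user, key, and monitor address are placeholders, not values taken from the attached manifests):

```sh
# Exec into the csi-rbdplugin container of the provisioner deployment and try a manual image create.
kubectl -n ceph-csi-rbd exec -it deploy/csi-rbdplugin-provisioner -c csi-rbdplugin -- \
  rbd create replicapool/manual-test --size 1G \
  --id kubernetes --key '<key from csi-rbd-secret>' -m 192.168.72.2
```

If this hangs as well, the problem is independent of the CSI code path.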
Hi @appcoders, I haven't had time to look at all the manifests, but I do see something potentially strange in the logs: you are using a .1 address as a monitor. Do you have a ceph public network larger than /24, or is your gateway at a non-standard address? If not, and your network is 192.168.72.0/24 with gateway 192.168.72.1, you might need to review your IP assignment. But again, this is just a very quick look at this issue. Hope this helps.
Hi @alex-ioma,
Gotcha. In that case this is not an issue. And 10GB is a must for this kind of setup - very similar to what I have.
Hi @Madhu-1 Good point. I created an image directly on the ceph host with rbd, using the user from the secret:
The command exits and the image is created.
which lists the image fine, regardless of which port/protocol is used. Now, trying to create an image with rbd from the pod, it stalls forever.
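(For context, the direct test and the from-the-pod attempt both boil down to commands of roughly this shape; a sketch with placeholder pool, image, user, and monitor values, not the exact commands used above.)

```sh
# On the ceph host: create and then list an image with the CSI user from the secret.
rbd create replicapool/test-img --size 1G --id kubernetes -m 192.168.72.2
rbd ls replicapool --id kubernetes -m 192.168.72.2

# The same create command, run from inside the csi-rbdplugin container, is what hangs.
```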
I captured network traffic on host 192.168.72.2; the first part is from the pod, the second part from another ceph host. So a network/firewall issue can be ruled out? I don't understand what the problem is at all. How can I debug this further?
@Madhu-1 So I set debug_mon to 10/10 and got this output for the access from the debiantest VM during rbd create:
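For reference, monitor debug logging at that level can be raised and later reverted with something like this (a sketch):

```sh
# Raise mon debug logging to 10/10 on all monitors.
ceph tell 'mon.*' config set debug_mon 10/10

# Revert to the default afterwards.
ceph tell 'mon.*' config set debug_mon 1/5
```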
@Madhu-1
also stalls. Log from the debiantest node:
The same command run from a Proxmox host that has no ceph osds/mgr/mds works fine. Its log:
@appcoders can you run the rbd create with the --debug-rbd flag?
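Something along these lines should show where librbd gets stuck (a sketch; pool, image, user, and monitor are placeholders):

```sh
# Re-run the hanging create with verbose librbd debug output.
rbd create replicapool/debug-test --size 1G \
  --id kubernetes -m 192.168.72.2 --debug-rbd 20
```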
The problem is likely with connecting to one (or more) of the OSDs, not with the monitor. It's definitely too early to rule out a network/firewall issue. In addition to …
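One way to check OSD reachability from the affected client is along these lines (a sketch; the address and ports are examples only):

```sh
# List the OSDs and the addresses/ports (msgr v1 and v2) that clients must be able to reach.
ceph osd dump | grep '^osd\.'

# From the client that hangs, test each OSD address/port, for example:
nc -zv 192.168.72.8 6808
nc -zv 192.168.72.8 6809
```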
Thanks for your kind support. I am very grateful, especially for the last tip. It is a network problem after all. We are clarifying this with the hoster; there is probably a problem with the on-site cabling or the switch configuration. Logging was the crucial piece of the puzzle for me:
192.168.72.54:0/2070396725 >> [v2:192.168.72.8:6808/2877638,v1:192.168.72.8:6809/2877638] conn(0x55bc92d80430 msgr2=0x55bc92d828c0 unknown :-1 s=STATE_CONNECTING_RE l=1).tick see no progress in more than 10000000 us during connecting to v2:192.168.72.8:6808/2877638, fault.
It is one host that is not reachable from the VM. With this knowledge of how to get detailed logging, I hope that others will also be able to track down such errors more quickly. So I will close this issue as I think the problem will vanish soon :-)
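For reference, messenger-level output like the connection line quoted above can usually be produced by adding debug flags to the client command (a sketch; pool, image, user, and monitor are placeholders):

```sh
# --debug-ms 1 prints per-connection messenger events, which makes unreachable OSDs visible.
rbd create replicapool/debug-test --size 1G \
  --id kubernetes -m 192.168.72.2 --debug-ms 1 --debug-rbd 20
```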
So now everything works fine after the network configuration has been fixed. Thanks again.
I have been trying to get ceph-csi running for 3 days now. I am using k3s v1.31.4+k3s1 and ceph-csi v3.13.0.
I'm following https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/
I replaced the canary image tag with v3.13.0.
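(A sketch of that substitution, assuming the YAMLs from the guide are in the current directory:)

```sh
# Replace the canary image tag with the v3.13.0 release tag in the plugin manifests.
sed -i 's/:canary/:v3.13.0/g' csi-rbdplugin.yaml csi-rbdplugin-provisioner.yaml
```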
All pods come up, and I can reach the cluster without issues from the pod.
There are no stale rbd commands on the nodes/pods, and no unusual output or errors in dmesg on the nodes.
It looks like simply nothing happens after "setting image options":
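(Logs like the attached log.txt can be collected from the provisioner pod with something like this; a sketch, assuming the deployment name from the attached csi-rbdplugin-provisioner.yaml:)

```sh
# Fetch the csi-rbdplugin container log from the provisioner pod.
kubectl -n ceph-csi-rbd logs deploy/csi-rbdplugin-provisioner -c csi-rbdplugin --tail=200
```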
The install is done this way in the ceph-csi-rbd namespace:
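(Roughly, following the rbd-kubernetes guide with the attached manifests, the install looks like this sketch:)

```sh
kubectl create namespace ceph-csi-rbd
kubectl -n ceph-csi-rbd apply -f csi-provisioner-rbac.yaml -f csi-nodeplugin-rbac.yaml
kubectl -n ceph-csi-rbd apply -f ceph-config-map.yaml -f csi-config-map.yaml -f csi-kms-config-map.yaml
kubectl -n ceph-csi-rbd apply -f csi-rbd-secret.yaml
kubectl -n ceph-csi-rbd apply -f csi-rbdplugin-provisioner.yaml -f csi-rbdplugin.yaml
kubectl apply -f csidriver.yaml -f csi-rbd-sc.yaml
```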
And after everything is up:
All YAMLs and log files are attached:
log.txt
csidriver.yaml.txt
ceph-config-map.yaml.txt
csi-config-map.yaml.txt
csi-kms-config-map.yaml.txt
csi-rbd-secret.yaml.txt
csi-nodeplugin-rbac.yaml.txt
csi-provisioner-rbac.yaml.txt
csi-rbd-sc.yaml.txt
csi-rbdplugin-provisioner.yaml.txt
csi-rbdplugin.yaml.txt
raw-block-pvc.yaml.txt
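The stuck step corresponds to creating the attached raw-block PVC and watching its status and events, e.g. (a sketch; the PVC name is assumed to be raw-block-pvc, as in the upstream example):

```sh
kubectl apply -f raw-block-pvc.yaml
kubectl get pvc raw-block-pvc
kubectl describe pvc raw-block-pvc
```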