rbd remap on network failure #4712
Comments
@clwluvw This is not possible with cephcsi, as cephcsi is not Kubernetes-specific (even though we have some Kubernetes-specific logic, which we are planning to get rid of soon). This needs to be done by some external operator.
In order to re-map an RBD image, all users (applications) of the filesystem need to be restarted as well. It is cleaner for an application to report a failure and cause a restart of the container once such a problem happens. Ceph-CSI has a health-checker that is currently only used with CephFS (see #4200). We had plans to extend that to RBD as well, but this has not been done. It would be possible to report that the volume is unhealthy to kubelet (in the …)
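For illustration, a minimal, hedged sketch of what that kind of reporting could look like: the CSI spec lets a node plugin attach a `VolumeCondition` to its `NodeGetVolumeStats` response, which Kubernetes can surface as volume health when the corresponding (alpha) volume-health feature is enabled. The `checkMountHealthy` helper, the probe-file approach, and the binary layout are assumptions made for this sketch, not ceph-csi's actual implementation.

```go
// Sketch only: shows how a CSI node plugin can mark a volume unhealthy via
// the VolumeCondition field of NodeGetVolumeStats. checkMountHealthy and the
// probe-file approach are illustrative assumptions, not ceph-csi code.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// checkMountHealthy tries to create and remove a scratch file under the
// volume path; a read-only remount (the typical symptom after a lost RBD
// watcher) shows up here as a write error such as EROFS.
func checkMountHealthy(path string) (healthy bool, msg string) {
	if _, err := os.Stat(path); err != nil {
		return false, "stat failed: " + err.Error()
	}
	probe := filepath.Join(path, ".csi-health-probe")
	f, err := os.Create(probe)
	if err != nil {
		return false, "volume is not writable: " + err.Error()
	}
	f.Close()
	os.Remove(probe)
	return true, ""
}

// nodeGetVolumeStats shows where the condition would be attached; a real
// driver also reports capacity and inode usage for the volume path.
func nodeGetVolumeStats(_ context.Context, volumePath string) (*csi.NodeGetVolumeStatsResponse, error) {
	healthy, msg := checkMountHealthy(volumePath)
	return &csi.NodeGetVolumeStatsResponse{
		VolumeCondition: &csi.VolumeCondition{
			Abnormal: !healthy,
			Message:  msg,
		},
	}, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: volhealth <volume-path>")
		os.Exit(1)
	}
	resp, _ := nodeGetVolumeStats(context.Background(), os.Args[1])
	fmt.Printf("abnormal=%v message=%q\n", resp.VolumeCondition.Abnormal, resp.VolumeCondition.Message)
}
```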
I guess this would be the best option, but it needs to be extended on the Kubernetes side to restart the pod in case of volume failure.
@clwluvw Yes, that is correct. For now, I think you can add a check in the hook to ensure that the PVC is writable and, if not, restart the pod. This might work for RWO volumes but not for RWX, as all the pods would need to be scaled down to 0 and scaled back up.
@Madhu-1 What do you mean by the hook? Do you mean the readinessProbe?
I mean to use …
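The exact mechanism is cut off in the comment above; one plausible reading of the earlier suggestion is a small writability check run by the pod itself, for example as an exec liveness probe, so that kubelet restarts the container when the RBD mount has flipped to read-only. A hypothetical sketch (the mount path and probe file name are assumptions):

```go
// volcheck: exits non-zero when the mounted volume is no longer writable,
// so an exec liveness probe pointing at this binary makes kubelet restart
// the container. Purely illustrative; the default path is an assumption.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	mountPath := "/data" // assumed mount path of the RBD-backed PVC
	if len(os.Args) > 1 {
		mountPath = os.Args[1]
	}

	probe := filepath.Join(mountPath, ".liveness-probe")
	f, err := os.OpenFile(probe, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		// A read-only remount after a lost watcher typically surfaces
		// as EROFS here.
		fmt.Fprintf(os.Stderr, "volume %s is not writable: %v\n", mountPath, err)
		os.Exit(1)
	}
	defer f.Close()

	if _, err := f.WriteString("ok\n"); err != nil {
		fmt.Fprintf(os.Stderr, "write to %s failed: %v\n", probe, err)
		os.Exit(1)
	}
	os.Remove(probe)
}
```

Wired up as the command of an exec livenessProbe, a failing check would trigger a container restart; as noted above, that only helps for RWO volumes.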
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
Describe the feature you'd like to have
After a node experiences a network outage, all RBD images mapped on the node lose their watcher and get remounted read-only.
Perhaps cephcsi could watch PVCs and make sure they are remapped if the watcher is gone or the filesystem is mounted read-only when it shouldn't be.
What is the value to the end user? (why is it a priority?)
Recover volumes after a network failure without affecting the pods or anything else.
How will we know we have a good solution? (acceptance criteria)
After recovering from a network failure, mapped RBD images should be remapped and should not remain read-only when they are not supposed to be.
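For completeness, this is the kind of check an external operator (as suggested in the first comment) could run to detect the "watcher is gone" condition described in the feature request. A hedged sketch that shells out to the `rbd status --format json` CLI; the pool/image names are hypothetical and the JSON field names follow the usual CLI output, not a stable API.

```go
// Sketch of how an external controller might detect that a mapped RBD image
// has lost its watcher after a network outage. It shells out to the rbd CLI;
// the JSON shape is an assumption based on typical `rbd status` output.
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type rbdStatus struct {
	Watchers []struct {
		Address string `json:"address"`
	} `json:"watchers"`
}

// hasWatchers returns true if the image still has at least one watcher.
func hasWatchers(pool, image string) (bool, error) {
	out, err := exec.Command("rbd", "status", pool+"/"+image, "--format", "json").Output()
	if err != nil {
		return false, fmt.Errorf("rbd status failed: %w", err)
	}
	var st rbdStatus
	if err := json.Unmarshal(out, &st); err != nil {
		return false, err
	}
	return len(st.Watchers) > 0, nil
}

func main() {
	// Hypothetical pool/image; a real operator would derive these from the
	// PV's volume handle.
	ok, err := hasWatchers("replicapool", "csi-vol-example")
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	if !ok {
		fmt.Println("no watchers: image likely needs a remap and its consumers a restart")
	}
}
```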