ISCSI Session Healing can make bad situations worse #961

speedyguy17 · 2025-01-10T18:43:18Z

iSCSI session healing takes the following steps:

Detect all iSCSI sessions that are not in the "logged in" state.
Wait for a timeout.
Log them out and back in.
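The three steps above can be sketched roughly as follows. This is a minimal model for discussion, not Trident's actual code: the `Session` type, the `plan_healing` function, and the `LOGGED_IN` constant are all hypothetical names, and the timeout value is whatever the caller passes in.

```python
from dataclasses import dataclass
from typing import List, Optional

LOGGED_IN = "LOGGED_IN"  # hypothetical label for a healthy session


@dataclass
class Session:
    target: str                          # e.g. the target IQN
    state: str                           # e.g. "LOGGED_IN" or "FREE"
    stale_since: Optional[float] = None  # when the session was first seen stale


def plan_healing(sessions: List[Session], now: float, timeout: float) -> List[str]:
    """Return the targets that would be logged out and back in on this pass."""
    to_heal = []
    for s in sessions:
        if s.state == LOGGED_IN:
            s.stale_since = None         # healthy again, reset the timer
            continue
        if s.stale_since is None:
            s.stale_since = now          # step 1: first detection, start waiting
        elif now - s.stale_since >= timeout:
            to_heal.append(s.target)     # steps 2 and 3: timeout expired
    return to_heal
```

Note that in this model the decision depends only on session state and elapsed time. Nothing checks whether a mounted filesystem still depends on the session, which is the gap this issue is about.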

As a side effect, any ext4 filesystem mounted on top of a device owned by that session goes read-only, and any pod consuming those PVs becomes irrecoverable.

Consider an (unfortunately) extended network outage:

All iSCSI session states become "FREE".
Trident detects these sessions as stale (not LOGGED IN).
After the session recovery timeout, Trident sets the action for the sessions to LogoutLoginRescan.
Trident issues iscsiadm -m ..... -u on the sessions.
Upon logout of the sessions, Linux tears down each of the /dev/sdXX block devices.
Upon teardown of the last sdXX backing a given volume, multipath returns EIO to any outstanding IO on the /dev/dm-XX device.
When ext4 receives EIO for a jbd2 IO, it intentionally and irrecoverably marks the filesystem as read-only.
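To make the last three steps concrete, here is a toy model of the chain as described in this issue. It is only an illustration: `MultipathDevice` and `Ext4Filesystem` are hypothetical classes, not kernel or multipath-tools interfaces, and the model assumes the multipath device fails IO (rather than queueing it) once its last path is gone.

```python
import errno


class MultipathDevice:
    """Toy /dev/dm-XX device backed by a set of /dev/sdXX paths."""

    def __init__(self, paths):
        self.paths = set(paths)

    def remove_path(self, path):
        self.paths.discard(path)         # session logout tears the path down

    def submit_io(self):
        if not self.paths:
            # Assumption: no queueing, so IO fails once no path remains.
            raise OSError(errno.EIO, "no paths remaining")
        return "ok"


class Ext4Filesystem:
    """Toy filesystem that goes read-only on a failed journal write."""

    def __init__(self, device):
        self.device = device
        self.read_only = False

    def commit_journal(self):
        if self.read_only:
            return False                 # stays read-only until remounted
        try:
            self.device.submit_io()
        except OSError as exc:
            if exc.errno == errno.EIO:
                self.read_only = True    # sticky, even if paths come back
            return False
        return True
```

In this model, adding a path back to the device after the failure does not clear `read_only`, which mirrors the point above: the re-login restores the block devices but not the filesystem state.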

At this point, the Pod sees an RO PV that cannot be recovered without a remount of the file system as RW, and a restart of the pod. The session healing has turned a recoverable network outage into an irrecoverable degradation of the file system.
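For anyone trying to confirm they hit this, a small sketch for spotting affected mounts on a node: it parses text in the /proc/mounts format and lists ext4 mount points whose options include `ro`. The function name is made up for this example, and it assumes the kernel reports the read-only state in the mount options after the error.

```python
def read_only_ext4_mounts(mounts_text):
    """Return mount points of ext4 filesystems currently mounted read-only.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                     # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ext4" and "ro" in options.split(","):
            result.append(mount_point)
    return result
```

On a node this could be called as `read_only_ext4_mounts(open("/proc/mounts").read())`. A volume that is intentionally mounted read-only will also show up, so the result still needs to be compared against the expected access mode of each PV.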

speedyguy17 added the bug label Jan 10, 2025

sjpeeris added the tracked label Jan 11, 2025
