Unable to disconnect from NVMeoF subsystem #2603
This message says the nvme subsystem was informed from userspace to release all resources for the mentioned controller. I suspect that the transport driver is not performing its cleanup tasks. Is this with nvme-tcp?
Yes, it is with nvme-tcp.
Ah sorry, stupid me, you already mentioned that it's nvme-tcp. Anyway, I've tried to replicate this with the current Linux head (6.13-rc1):

# nvme connect -t tcp -a 192.168.154.145 -s 4420 -n nqn.io-1 --hostnqn nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af3
[block traffic between host and controller]
# nvme disconnect-all
# nvme list-subsys
[no output]

and the kernel log doesn't show anything suspicious.
Is the above command sequence what you are doing? If not, please provide yours. Which kernel are you using?
I'm fairly certain I've just seen this happen as well: an active NVMe-oF TCP connection disrupted by having the remote node disconnect, waiting through the retries until controller removal, then being left with a subsystem entry in … EDIT: Actually, scratch that connect comment. Even if you properly disconnect it after that manual connect, from that point onward it does not seem to properly clear out at all anymore?
Okay, I haven't tested the path where the retry counter hits the limit and the controller gets auto-removed. Let's see...
I've tried to reproduce this with current HEAD and also with 6.12, but no luck. I also can't see how the remove-ctrl path could leak the subsystem reference (which seems to be the problem here). Anyway, it's a kernel issue and not really an nvme-cli bug. I suggest you report this to the nvme mailing list. I could also post the question on the mailing list, but since I can't reproduce it, it's likely going nowhere if I do so. Sorry.
@igaw
Is there a way to reproduce it? Hmm, so the connection is in the deleting (no IO) state. That might help identify the problem. Maybe we are waiting on a request to complete but we never end it...
I wish I could provide more info, but "have the remote become unavailable and wait past its retries" seems to be all it takes. Not aware of anything special at the moment. Currently on kernel 6.12.10 and it has happened again. Looking at the … Could the …
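The "deleting (no IO)" state mentioned above can be inspected directly via sysfs. Below is a minimal sketch, assuming the standard `/sys/class/nvme` layout; the `NVME_SYSFS` override is purely an illustration aid so the logic can be exercised without real hardware:

```shell
#!/bin/sh
# Sketch: scan NVMe controllers and report any stuck in a "deleting"
# state. NVME_SYSFS is overridable only for demonstration; the
# kernel's actual path is /sys/class/nvme.
NVME_SYSFS="${NVME_SYSFS:-/sys/class/nvme}"

find_stuck_controllers() {
    for ctrl in "$NVME_SYSFS"/nvme*; do
        # Skip entries without a readable state attribute
        [ -r "$ctrl/state" ] || continue
        state=$(cat "$ctrl/state")
        case "$state" in
            deleting*) echo "${ctrl##*/}: $state" ;;
        esac
    done
}

find_stuck_controllers
```

Running this on an affected host should show which controller the kernel considers to be mid-teardown.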
If using an NVMe initiator to connect to an NVMe target via NVMe-TCP, and the NVMe target node goes offline, resulting in the Paths in the subsystem displayed by nvme list-subsys being empty, how can I delete this subsystem? Or, what parameters should be provided during the connection to handle this situation? I've tried nvme disconnect, but it doesn't work. dmesg …
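When `nvme disconnect` fails, one last-resort option (an assumption on my part, not something confirmed in this thread) is the per-controller sysfs attribute `delete_controller`, which asks the kernel to tear the controller down directly. A sketch; `NVME_SYSFS` is overridable only so the logic can be shown without hardware, and on a real system the write requires root:

```shell
#!/bin/sh
# Sketch: last-resort removal of a single controller via sysfs.
# Writing 1 to delete_controller asks the kernel to remove it.
NVME_SYSFS="${NVME_SYSFS:-/sys/class/nvme}"

delete_controller() {
    attr="$NVME_SYSFS/$1/delete_controller"
    if [ -w "$attr" ]; then
        echo 1 > "$attr"
        echo "requested removal of $1"
    else
        echo "cannot remove $1: $attr not writable" >&2
        return 1
    fi
}
```

Usage would be e.g. `delete_controller nvme0` after identifying the stale controller. Whether this clears the leaked subsystem entry in this particular bug is untested.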