
Volume without a target node after 1 node down in a 3 node cluster #1724

Open
veenadong opened this issue Aug 20, 2024 · 22 comments

@veenadong

On 2.7.0, after taking 1 node down from a 3-node cluster:

core@glop-nm-126-mem2:~$ kubectl mayastor get volume cc36aef4-0ac9-459b-9e56-c571d5ba2c80 -o yaml
spec:
  num_replicas: 2
  size: 32212254720
  status: Created
  uuid: cc36aef4-0ac9-459b-9e56-c571d5ba2c80
  topology:
    node_topology: !labelled
      exclusion: {}
      inclusion: {}
      affinitykey: []
    pool_topology: !labelled
      exclusion: {}
      inclusion:
        openebs.io/created-by: operator-diskpool
      affinitykey: []
  policy:
    self_heal: true
  thin: true
  num_snapshots: 0
state:
  size: 32212254720
  status: Online
  uuid: cc36aef4-0ac9-459b-9e56-c571d5ba2c80
  replica_topology:
    c51a4d5d-c3b7-48cf-8674-07e7f51321fe:
      node: glop-nm-126-mem3.glcpdev.cloud.hpe.com
      pool: glop-nm-126-mem3.glcpdev.cloud.hpe.com-disk
      state: Online
      usage:
        capacity: 32212254720
        allocated: 926941184
        allocated_snapshots: 0
        allocated_all_snapshots: 0
    85e49541-ff6b-404c-b0aa-eb0e747b1a48:
      node: glop-nm-126-mem1.glcpdev.cloud.hpe.com
      pool: glop-nm-126-mem1.glcpdev.cloud.hpe.com-disk
      state: Unknown
  usage:
    capacity: 32212254720
    allocated: 926941184
    allocated_replica: 926941184
    allocated_snapshots: 0
    allocated_all_snapshots: 0
    total_allocated: 926941184
    total_allocated_replicas: 926941184
    total_allocated_snapshots: 0

Pods are not able to attach the volume:

  Warning  FailedMount  4m12s (x52 over 94m)  kubelet  MountVolume.MountDevice failed for volume "pvc-9a3c606b-9ca2-4438-a3ad-1a07138b6b95" : rpc error: code = Internal desc = Failed to stage volume 9a3c606b-9ca2-4438-a3ad-1a07138b6b95: attach failed: IO error: Input/output error (os error 5), args: hostnqn=nqn.2019-05.io.openebs:node-name:glop-nm-126-mem2.glcpdev.cloud.hpe.com,hostid=42164807-edfc-e94a-3af3-29184e3733b2,nqn=nqn.2019-05.io.openebs:9a3c606b-9ca2-4438-a3ad-1a07138b6b95,transport=tcp,traddr=10.245.244.129,trsvcid=8420,reconnect_delay=10,ctrl_loss_tmo=1980,nr_io_queues=2

Attached is the system dump (note: log collection failed using the plugin, so the logs were captured using a different method).
mayastor.log.gz
mayastor-2024-08-20--18-05-02-UTC.tar.gz

@dcaputo-harmoni

dcaputo-harmoni commented Sep 30, 2024

I am seeing this same error (attach failed: IO error: Input/output error (os error 5)) after taking one node down and bringing it back up. When running kubectl-mayastor get volumes, the volumes whose target was the downed node are listed with a target node of <none> and accessibility of <none>, but a status of Online.

Just to provide some further details here, it looks like for the volumes that went down as a result of the node going down, the frontend/host_acl node differs from the target node, whereas for the volumes that remained working, this was the same.

{
  "uuid": "8388d455-d250-4706-a16b-55dfa6ef8327",
  "size": 8589934592,
  "labels": null,
  "num_replicas": 2,
  "status": {
    "Created": "Online"
  },
  "policy": {
    "self_heal": true
  },
  "topology": {
    "node": {
      "Labelled": {
        "exclusion": {},
        "inclusion": {}
      }
    },
    "pool": {
      "Labelled": {
        "exclusion": {},
        "inclusion": {
          "openebs.io/created-by": "operator-diskpool"
        }
      }
    }
  },
  "last_nexus_id": null,
  "operation": null,
  "thin": true,
  "target": {
    "node": "aks-storage-93614762-vmss000002",
    "nexus": "9e0bbe18-9b2a-4aaf-ac71-10edf18a044d",
    "protocol": "nvmf",
    "active": true,
    "config": {
      "controllerIdRange": {
        "start": 5,
        "end": 6
      },
      "reservationKey": 12425731461037558000,
      "reservationType": "ExclusiveAccess",
      "preemptPolicy": "Holder"
    },
    "frontend": {
      "host_acl": [
        {
          "node_name": "aks-storage-93614762-vmss000001",
          "node_nqn": "nqn.2019-05.io.openebs:node-name:aks-storage-93614762-vmss000001"
        }
      ]
    }
  },
  "publish_context": {},
  "affinity_group": null
}

@tiagolobocastro
Contributor

Seems we had missed the first one @veenadong, sorry about that.

@dcaputo-harmoni could you please share the volume attachments which reference this volume (if any) and also a support bundle?

@dcaputo-harmoni

@tiagolobocastro Unfortunately I had to kill and restore the cluster right after this happened, and didn't get a chance to export the data you are looking for before I did. If it happens again I'll provide these details, thanks. I was running mayastor 2.7.0 and just upgraded to 2.7.1 when it rebuilt, and I know there are some stability improvements in there so am wondering if that might help.

@tiagolobocastro
Contributor

No problem, keep an eye on it and should it happen again please let us know.
Without logs it's hard to say whether any stability fix would help here. It could be a data-plane bug, or some simple miscommunication between the control-plane and CSI causing the volume to not be published.

@tiagolobocastro
Contributor

tiagolobocastro commented Oct 3, 2024

I think this bug, #1747 (or a variation of it), explains what happens here: CSI and the control-plane get out of sync, the volume ends up not staying published, and csi-node keeps trying to connect to a target which is not there.
A few things we can improve here:

  1. fix whatever caused CSI and the controller to get out of sync
  2. if csi-node can't connect to the volume target, it should check whether the subsystem is listening and report this information (a manual version of that check is sketched below). If it stays this way it surely means the volume target is not being created.
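
For reference, a manual version of that subsystem check can be run from the app node with nvme-cli, using the address/port values taken from the FailedMount error above:

nvme discover -t tcp -a 10.245.244.129 -s 8420

If the target exists and is shared, this should list the volume's NVMe subsystem in the discovery log; if the target was never created, the discovery will fail to connect.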

@Abhinandan-Purkait @dsharma-dc any other thoughts here?

@dsharma-dc
Contributor

This looks like the sequence of events that happened.

old node = glop-nm-126-mem1.glcpdev.cloud.hpe.com
new node = glop-nm-126-mem3.glcpdev.cloud.hpe.com

17:04:58 - Unpublish of the volume triggered as a result of node shutdown - failed (503 Service Unavailable)
17:05:06 - Publish volume triggered as the app moves to the new node
17:05:08 - Unpublish of the volume attempted again - failed (503 Service Unavailable)
17:05:19 - Unpublish still hasn't succeeded as the old node is down, but the spec is cleared of the old target
17:05:21 - New target got created on the new node as the publish proceeded.

 [pod/mayastor-agent-core-88bc8d8b9-k7dcs/agent-core] 2024-08-20T17:05:21.330513Z INFO core::controller::resources::operations_helper: complete_create, val: Nexus { node: NodeId("glop-nm-126-mem3.glcpdev.cloud.hpe.com"), name: "cc36aef4-0ac9-459b-9e56-c571d5ba2c80", uuid: NexusId(c970c537-ec9f-46f7-bf6b-7997bad0e421, "c970c537-ec9f-46f7-bf6b-7997bad0e421"), size: 32212254720, status: Online, children: [Child { uri: ChildUri("bdev:///c51a4d5d-c3b7-48cf-8674-07e7f51321fe?uuid=c51a4d5d-c3b7-48cf-8674-07e7f51321fe"), state: Online, rebuild_progress: None, state_reason: Unknown, faulted_at: None, has_io_log: Some(false) }], device_uri: "", rebuilds: 0, share: None, allowed_hosts: [] }

[pod/mayastor-agent-core-88bc8d8b9-k7dcs/agent-core] at control-plane/agents/src/bin/core/controller/resources/operations_helper.rs:168
[pod/mayastor-agent-core-88bc8d8b9-k7dcs/agent-core] in core::volume::service::publish_volume with request: PublishVolume { uuid: VolumeId(cc36aef4-0ac9-459b-9e56-c571d5ba2c80, "cc36aef4-0ac9-459b-9e56-c571d5ba2c80"), target_node: Some(NodeId("glop-nm-126-mem3.glcpdev.cloud.hpe.com")), share: Some(Nvmf), publish_context: {"ioTimeout": "30"}, frontend_nodes: ["glop-nm-126-mem3.glcpdev.cloud.hpe.com"] }, volume.uuid: cc36aef4-0ac9-459b-9e56-c571d5ba2c80

17:05:22 - A retry of the failing unpublish happened and deleted the new target, as the spec now referenced the new one.
17:05:33 - nvme connect as part of volume staging fails since the target has been deleted.

@cwiggs

cwiggs commented Dec 28, 2024

Seems we had missed the first one @veenadong, sorry about that.

@dcaputo-harmoni could you please share the volume attachments which reference this volume (if any) and also a support bundle?

I'm running into this issue and am generating the support bundle now. I'm not sure whether the bundle contains sensitive info, so I'd prefer not to post it publicly; is there somewhere private I can send it?

@cwiggs

cwiggs commented Dec 28, 2024

To expand on this: I noticed that one of my three k3s worker nodes was down, so I restarted it. That is when I started seeing the OpenEBS issue, although I can't say for sure it's related.

I can also upload the "volume attachments", but I'm not 100% sure what those are. The PV that is failing to attach was created dynamically via a PVC, and the PVC is then mounted by a deployment; I can upload any of those manifests if that helps.
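
Assuming the "volume attachments" being asked for are the Kubernetes VolumeAttachment objects referencing the PV, they can be dumped with plain kubectl (the attachment name below is a placeholder taken from the list output):

kubectl get volumeattachments
kubectl get volumeattachment <attachment-name> -o yaml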

@cwiggs

cwiggs commented Dec 28, 2024

Looks like I was able to work around this by scaling the replica down to 0 and then back to 1. Volume mounts successfully now.

I'd still like to send the support bundle, let me know where I can send it.

@tiagolobocastro
Contributor

You can send it to [email protected]

@cwiggs

cwiggs commented Dec 30, 2024

You can send it to [email protected]

I sent the tar file via an email from my cwiggs.com domain. Let me know if there is anything else that will help with the issue.

Thanks!

@mardep123

Hi!
I'm also running into the same issue. I tried to make some changes to the cluster and therefore restarted one node at a time. I waited for the node to go from degraded to online. It worked on most of the nodes, but one of them did not go to online, but rather got target <none>, accessibility <none>. Not sure if it is related; next time it happens I can generate a support bundle.

@tiagolobocastro
Contributor

You can send it to [email protected]

I sent the tar file via an email from my cwiggs.com domain. Let me know if there is anything else that will help with the issue.

Thanks!

We haven't received the email.

Not sure if it is related, next time it happens I can generate a support bundle.

That would help, thank you

@martinfjohansen

Looks like I was able to work around this by scaling the replica down to 0 and then back to 1. Volume mounts successfully now.

@cwiggs How did you do this? What commands did you run?

@tiagolobocastro Is there any way to get the volumes back when/if this happens?

@cwiggs

cwiggs commented Jan 9, 2025

We haven't received the email.

I keep getting a response from googlegroups.com that they weren't able to deliver the email since it has an attachment. I just sent it a 3rd time using Google Drive and so far it seems it went through.

@cwiggs How did you do this? What commands did you run?

The workload I have is a deployment; I use k9s to scale it to 0 with the s (scale) key and then back up to 1 the same way. I believe there is also a kubectl scale command you can use.

@tiagolobocastro Is there any way to get the volumes back when/if this happens?

IME the volume isn't gone, it just isn't able to attach to the pod properly. It seems to me that something isn't properly detaching the volume from the previous pod when you restart the deployment, but if you scale to 0 and back up it unmounts/mounts properly.
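
For reference, the equivalent kubectl commands would be something like the following (deployment name and namespace are placeholders):

kubectl scale deployment <my-app> --replicas=0 -n <namespace>
# wait for the pod to terminate so the volume gets unstaged/unpublished
kubectl scale deployment <my-app> --replicas=1 -n <namespace>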

@martinfjohansen

@cwiggs, so what you did was to scale the OpenEBS deployment itself down to 0 and then up to 3?

@cwiggs

cwiggs commented Jan 9, 2025

@cwiggs, so what you did was to scale the OpenEBS deployment itself down to 0 and then up to 3?

No, just the deployment that is using OpenEBS for the PV and throwing this error.

@tiagolobocastro
Contributor

I've requested access to the google drive @cwiggs

@nneram

nneram commented Jan 10, 2025

Hello,

We also encountered this issue after a node reboot in our 3-node HA cluster. Specifically, the node that restarts experiences a MountVolume failure with the following error:

Name:             alertmanager-pgl-alertmanager-1
Namespace:        dome
Priority:         0
Service Account:  pgl-alertmanager
Node:             b02696-01srv/10.50.254.132
Start Time:       Thu, 09 Jan 2025 18:39:35 +0100
...
Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  3m6s (x512 over 17h)  kubelet  MountVolume.MountDevice failed for volume "pvc-3bfbec51-af9b-4ce2-90fa-06bec4b6075d" : rpc error: code = Internal desc = Failed to stage volume 3bfbec51-af9b-4ce2-90fa-06bec4b6075d: attach failed: IO error: Input/output error (os error 5), args: hostnqn=nqn.2019-05.io.openebs:node-name:b02696-01srv,hostid=33373150-3234-4e43-3630-333030463347,nqn=nqn.2019-05.io.openebs:3bfbec51-af9b-4ce2-90fa-06bec4b6075d,transport=tcp,traddr=10.50.254.132,trsvcid=8420,reconnect_delay=10,ctrl_loss_tmo=1980,nr_io_queues=2

The error logs from the csi-node show consistent failures:

csi-node   2025-01-10T09:49:28.468832Z ERROR csi_node::node: Failed to stage volume 3bfbec51-af9b-4ce2-90fa-06bec4b6075d: attach failed: IO error: Input/output error (os error 5), args: hostnqn=nqn.2019-05.io.openebs:node-name:b02696-01srv,hostid=33373150-3234-4e43-3630-333030463347,nqn=nqn.2019-05.io.openebs:3bfbec51-af9b-4ce2-90fa-06bec4b6075d,transport=tcp,traddr=10.50.254.132,trsvcid=8420,reconnect_delay=10,ctrl_loss_tmo=1980,nr_io_queues=2
csi-node     at control-plane/csi-driver/src/bin/node/node.rs:717
csi-node 
csi-node   2025-01-10T09:51:31.168755Z ERROR csi_node::node: Failed to stage volume 3bfbec51-af9b-4ce2-90fa-06bec4b6075d: attach failed: IO error: Input/output error (os error 5), args: hostnqn=nqn.2019-05.io.openebs:node-name:b02696-01srv,hostid=33373150-3234-4e43-3630-333030463347,nqn=nqn.2019-05.io.openebs:3bfbec51-af9b-4ce2-90fa-06bec4b6075d,transport=tcp,traddr=10.50.254.132,trsvcid=8420,reconnect_delay=10,ctrl_loss_tmo=1980,nr_io_queues=2
csi-node     at control-plane/csi-driver/src/bin/node/node.rs:717
...

This issue appears to affect StatefulSets, particularly alertmanager-pgl-alertmanager in our setup. We've found that restarting the pod resolves the error, but we're looking for a more permanent solution.
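
For anyone else hitting this: the "restart" here amounts to deleting the affected pod so the StatefulSet controller recreates it, e.g. (pod and namespace taken from the describe output above):

kubectl delete pod alertmanager-pgl-alertmanager-1 -n dome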

I'm willing to provide a support bundle too, but there's an issue with log collection because I have an OAuth2 proxy in front of Loki, and I haven't found a way to pass a token to kubectl mayastor dump system. If anyone knows how to handle this authentication issue it would be great! :) In the meantime, I'll try to work around this and send the bundle to you.

@tiagolobocastro
Contributor

Which version is this @nneram? We've fixed a few issues in 2.7.2 that could be related to this, though it would require scaling the application back down to 0 and then back up.
A support bundle would help.
Let's try without Loki; the plugin now collects the k8s logs as well, which might be enough.
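
For reference, the bundle is generated with the plugin command already mentioned earlier in the thread:

kubectl mayastor dump system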

@nneram

nneram commented Jan 10, 2025

I'm using Mayastor version 2.7.1 with the openebs chart version 4.1.1. My kubectl-mayastor plugin is at revision 399c96472dc3 (v2.7.2+0). I'll also try to email the bundle since I'm not comfortable sharing it here.

@tiagolobocastro
Contributor

Both are hitting the same/similar issue: the volume seems fine, but NodeStage is failing.
Please also send across the kernel logs to see if we can find some more info there.
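
Something like the following, run on the affected node, would do (exact tooling depends on the distro):

journalctl -k --since "1 hour ago" > kernel.log
# or, on nodes without journald:
dmesg -T > kernel.log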
