The nvme disk mount failure with azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0) #2777

Open
duanwei33 opened this issue Dec 26, 2024 · 12 comments

@duanwei33

What happened:
Azure disk mount fails on some instance types in OpenShift with the error: azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0). So far it has been seen on the following:
Standard_M16bds_v3
Standard_L8s_v4
Standard_M16bs_v3

$ oc describe pod mypod-test-1
  Warning  FailedMount             <invalid> (x3 over <invalid>)  kubelet                  MountVolume.MountDevice failed for volume "pvc-2bad271a-abce-4ab1-975b-385bd08e4261" : rpc error: code = Internal desc = failed to find disk on lun 0. azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0)

When checking the node, there seems to be no LUN info (not sure if this is the reason):

$ lsblk -o NAME,KNAME,MAJ:MIN,FSTYPE,SIZE,TYPE,MOUNTPOINT,HCTL
NAME        KNAME     MAJ:MIN FSTYPE   SIZE TYPE MOUNTPOINT HCTL
nvme0n1     nvme0n1   259:0            128G disk
|-nvme0n1p1 nvme0n1p1 259:1              1M part
|-nvme0n1p2 nvme0n1p2 259:2   vfat     127M part
|-nvme0n1p3 nvme0n1p3 259:3   ext4     384M part /boot
`-nvme0n1p4 nvme0n1p4 259:4   xfs    127.5G part /sysroot
nvme0n2     nvme0n2   259:5             10G disk                          <------ this is the attached volume
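
The empty HCTL column in the output above can be checked directly. The following is a minimal sketch, assuming lsblk from util-linux is available on the node; the function name disks_without_hctl is made up for illustration. It reads the output of lsblk -dn -o NAME,HCTL and prints every device that reports no SCSI host:channel:target:lun address, which is how the NVMe namespaces look above. Other device types without a SCSI address (loop devices, for example) would be listed as well.

```shell
# Hypothetical helper: print device names whose HCTL column is empty.
# Expects "NAME [HCTL]" lines on stdin, as produced by:
#   lsblk -dn -o NAME,HCTL
disks_without_hctl() {
  # A line with a single field has a name but no HCTL value.
  awk 'NF == 1 { print $1 }'
}

# Usage on a node:
#   lsblk -dn -o NAME,HCTL | disks_without_hctl
```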

What you expected to happen:
The disk should be mounted successfully

How to reproduce it:
Always

Anything else we need to know?:

Environment:

  • CSI Driver version: 1.31.1
  • Kubernetes version (use kubectl version): v1.31.3
  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS release 4.18
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx
Member

The node is missing the udev rules for NVMe controller disks. Could you run the following command and then try again?

kubectl apply -f https://raw.githubusercontent.com/andyzhangx/demo/refs/heads/master/aks/download-v6-disk-rules.yaml
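
To see whether the rules took effect on a node, one option is to look for the per-LUN symlink they are expected to create. This is a minimal sketch, not the driver's actual lookup: the default directory /dev/disk/azure/data/by-lun is an assumption about what the Azure disk udev rules produce, and the function name find_disk_by_lun_link is hypothetical. Pass a different directory as the second argument if the links live elsewhere on your image.

```shell
# Hypothetical helper: resolve the symlink for a LUN to its block device.
# $1 = LUN number
# $2 = directory holding the per-LUN symlinks
#      (default is an ASSUMPTION, not confirmed by this thread)
find_disk_by_lun_link() {
  lun="$1"
  dir="${2:-/dev/disk/azure/data/by-lun}"
  link="$dir/$lun"
  if [ -L "$link" ]; then
    # Print the device the symlink points to, e.g. /dev/nvme0n2.
    readlink -f "$link"
  else
    echo "no symlink for lun $lun under $dir" >&2
    return 1
  fi
}

# Usage on a node:
#   find_disk_by_lun_link 0
```

If the function fails for a LUN that has a disk attached, the udev rules are probably not installed or have not been triggered yet.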

@duanwei33
Author

duanwei33 commented Dec 26, 2024

Yes, it works after adding 80-azure-disk.rules!
Thank you so much!

@duanwei33 duanwei33 changed the title Disk mount failure in Standard_M16bs_v3 with azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0) The nvme disk mount failure with azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0) Dec 26, 2024
@duanwei33
Author

One more question: should the CSI driver cover this, or does it need to be handled on the node OS side?

@andyzhangx
Member

We have set up these new udev rules in the new AKS VHD image; it will roll out in the AKS RP 2025.2 release.

@andyzhangx
Member

Oh, you are using OpenShift, so I think you need to include the rules in your own node image. Here is an example PR: Azure/AgentBaker#5444

@duanwei33
Author

Yeah acked, thanks for the confirmation.
And Happy New Year!

@phil-fileread

Hi @andyzhangx, we are seeing this issue on Standard_d4ads_v6 VMs in AKS (East US). Do you have an idea when this will be addressed?

@andyzhangx
Member

andyzhangx commented Jan 1, 2025

@phil-fileread

The node is missing the udev rules for NVMe controller disks. Could you run the following command and then try again?

kubectl apply -f https://raw.githubusercontent.com/andyzhangx/demo/refs/heads/master/aks/download-v6-disk-rules.yaml

We have set up these new udev rules in the new AKS VHD image; it will roll out in the AKS RP 2025.2 release.

@phil-fileread

Thanks for the assistance @andyzhangx.

Apologies for my ignorance, but what is an Azure VHD image (I'm otherwise familiar with AKS and the Azure ecosystem), and where are the AKS RP releases documented and announced?

Thanks and happy new year.

@andyzhangx
Member

@phil-fileread AKS releases are announced here: https://github.com/Azure/AKS/releases. This issue will be fixed in the 2025.2 release; once the new release has rolled out to your region, you can upgrade the node image to pick up the fix.
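
For reference, a node image upgrade on a single node pool can be sketched with the Azure CLI as below. This is an illustrative wrapper, not something posted in the thread: the function name upgrade_node_image and the DRY_RUN switch are hypothetical, and the resource group, cluster, and pool names are placeholders. By default it only prints the az aks nodepool upgrade command so it can be reviewed before running.

```shell
# Hypothetical helper: upgrade only the node image of one AKS node pool.
# $1 = resource group, $2 = cluster name, $3 = node pool name
# DRY_RUN=1 (the default here) prints the command instead of running it.
upgrade_node_image() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "az aks nodepool upgrade --resource-group $1 --cluster-name $2 --name $3 --node-image-only"
  else
    az aks nodepool upgrade \
      --resource-group "$1" \
      --cluster-name "$2" \
      --name "$3" \
      --node-image-only
  fi
}

# Usage (placeholder names):
#   upgrade_node_image myResourceGroup myAKSCluster mynodepool
#   DRY_RUN=0 upgrade_node_image myResourceGroup myAKSCluster mynodepool
```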

@phil-fileread

@andyzhangx, how do the releases shown on https://github.com/Azure/AKS/releases map to the 2025.2 notation? Does this just mean the release that takes place in February 2025?

Thanks!

@andyzhangx
Member

The AKS VHD image fix will be in the AKS RP 202502 release, which will roll out next month.

3 participants