Problem with rotating HDD and media life left #150

Closed
ondrejsuk1 opened this issue Jan 16, 2025 · 7 comments
@ondrejsuk1

ondrejsuk1 commented Jan 16, 2025

Hi,

first of all, I'd like to thank you for your project. Nice work! :)

I opened this issue because, after we upgraded this check to version 1.9.0, we started seeing a problem on some servers.

The problem is that servers with physical HDDs report 0% media life left:

[WARNING]: Physical Drive Physical Disk 0:1:0 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:1 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:2 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:3 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:4 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:5 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:6 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: Physical Drive Physical Disk 0:1:7 (<REDACTED>, HDD, SAS, Media life left: 0%, Status: Enabled) 8001.56GiB status: WARNING
[WARNING]: One or more storage components report an issue|'media_life_left_drive_Physical_Disk_0:1:0'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:1'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:2'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:3'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:4'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:5'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:6'=0%;10;-1 'media_life_left_drive_Physical_Disk_0:1:7'=0%;10;-1 'media_life_left_drive_SSD_0'=89%;10;-1 'media_life_left_drive_SSD_1'=89%;10;-1
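
For readers unfamiliar with the part after the `|`: it is Nagios-style performance data, where each item has the form `'label'=value;warn;crit`. The following is a minimal sketch of how such a line could be split up to see which drives report which value and thresholds. The function name `parse_perfdata` and the regular expression are illustrative assumptions and are not part of check_redfish; the sample is shortened from the output above.

```python
import re

# One Nagios-style perfdata item: 'label'=value[%];warn;crit
# (simplified pattern, written for this illustration only)
PERF_ITEM = re.compile(
    r"'([^']+)'=(-?\d+(?:\.\d+)?)(%?);(-?\d+(?:\.\d+)?)?;(-?\d+(?:\.\d+)?)?"
)


def parse_perfdata(plugin_output):
    """Return {label: (value, warn, crit)} from the text after the '|'."""
    _, _, perf = plugin_output.partition("|")
    result = {}
    for label, value, _unit, warn, crit in PERF_ITEM.findall(perf):
        result[label] = (
            float(value),
            float(warn) if warn else None,
            float(crit) if crit else None,
        )
    return result


# Shortened sample taken from the plugin output above.
sample = (
    "[WARNING]: One or more storage components report an issue|"
    "'media_life_left_drive_Physical_Disk_0:1:0'=0%;10;-1 "
    "'media_life_left_drive_SSD_0'=89%;10;-1"
)

parsed = parse_perfdata(sample)
for label, (value, warn, crit) in parsed.items():
    print(label, value, warn, crit)
```

Read this way, the HDD entries carry a value of 0 with a warning threshold of 10, which is consistent with the WARNING status shown for each of them.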

I looked through the closed issues and found #143. It looks like the code doesn't take into account whether a disk is an SSD or an HDD, and Redfish returns 0% for HDDs (according to the returned data):

{
  "@odata.context": "/redfish/v1/$metadata#Drive.Drive",
  "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/NonRAID.Integrated.1-1/Drives/Disk.Bay.0:Enclosure.Internal.0-1:NonRAID.Integrated.1-1",
  ...
  "MediaType": "HDD",
  "PredictedMediaLifeLeftPercent": 0,
  "Protocol": "SAS",
  ...
}
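
To illustrate the point, here is a minimal sketch of a check that only treats `PredictedMediaLifeLeftPercent` as meaningful when the drive reports `MediaType` "SSD". The helper name `media_life_applies` is hypothetical and this is not the plugin's actual code; the HDD sample reuses the fields from the payload above, and the SSD sample is made up apart from the 89% value seen in the performance data.

```python
def media_life_applies(drive):
    """Return True only if the media life value should be evaluated.

    `drive` is a dict decoded from a Redfish Drive resource.
    Hypothetical helper for illustration, not check_redfish code.
    """
    media_type = drive.get("MediaType")
    life_left = drive.get("PredictedMediaLifeLeftPercent")
    return media_type == "SSD" and life_left is not None


# Fields copied from the HDD payload above (elided fields left out).
hdd = {"MediaType": "HDD", "PredictedMediaLifeLeftPercent": 0, "Protocol": "SAS"}

# Made-up SSD sample for comparison.
ssd = {"MediaType": "SSD", "PredictedMediaLifeLeftPercent": 89}

print(media_life_applies(hdd))
print(media_life_applies(ssd))
```

With a guard like this, the 0 reported for a rotating disk would simply be ignored instead of being compared against the thresholds.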

As a workaround I set both the critical and warning thresholds for the storage section to "-1", but now the whole section is no longer monitored.

My Python experience is not as good as yours, and I couldn't find my way around your code structure. Otherwise I would have created a pull request.

Thank you :)

@bb-Ricardo
Owner

Hi @ondrejsuk1,

thank you for your kind words. Makes me happy that this project helps you.

I had a look and added a change which should fix your issue. Can you please check out the next-release branch and check if it works now?

Thank you

@ondrejsuk1
Author

Hi,

thank you for the quick reply :) I tested the next-release branch and it still doesn't work, but the error message is different:

[CRITICAL]: Physical Drive Physical Disk 0:1:0 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:1 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:2 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:3 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:4 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:5 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:6 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: Physical Drive Physical Disk 0:1:7 (MG06SCA800EY, HDD, SAS, Status: Enabled) 8001.56GiB status: CRITICAL
[CRITICAL]: One or more storage components report an issue|'media_life_left_drive_SSD_0'=89%;10;5 'media_life_left_drive_SSD_1'=89%;10;5

So I went into storage.py and looked for the places where media_life_left_* is used, and found one condition where SSD is not mentioned. It is on line 752. When I change this line from:

if pd_inventory.predicted_media_life_left_percent is not None:

to:

if pd_inventory.type == "SSD" and pd_inventory.predicted_media_life_left_percent is not None:

it works without any errors :)

[OK]: All storage controllers (4), logical drives (9), physical drives (10) and enclosures (1) are in good condition.|'media_life_left_drive_SSD_0'=89%;10;5 'media_life_left_drive_SSD_1'=89%;10;5
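
The effect of that extra condition can be sketched in isolation. The function below is an illustration only, not the code in storage.py. The name `media_life_status` is hypothetical, the default thresholds 10 and 5 are taken from the performance data above, and two behaviours are assumptions: that a value at or below a threshold triggers the state, and that a negative threshold disables it.

```python
def media_life_status(media_type, life_left, warning=10, critical=5):
    """Evaluate media life left, but only for SSDs.

    Assumptions for this sketch: a value at or below a threshold
    triggers that state, and a negative threshold disables it.
    """
    # HDDs may report 0 here, which says nothing about wear.
    if media_type != "SSD" or life_left is None:
        return "OK"
    if critical >= 0 and life_left <= critical:
        return "CRITICAL"
    if warning >= 0 and life_left <= warning:
        return "WARNING"
    return "OK"


print(media_life_status("HDD", 0))   # HDD value is ignored
print(media_life_status("SSD", 89))  # healthy SSD
print(media_life_status("SSD", 8))   # below the warning threshold
```

Without the media type guard, the HDD case would fall through to the threshold comparison with a value of 0, which matches the CRITICAL results seen before the change.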

bb-Ricardo added a commit that referenced this issue Jan 17, 2025
@bb-Ricardo
Owner

Hi @ondrejsuk1,

sorry for the inconvenience. It was late 😅.

I had fixed it for HPE-specific drives but not for all the others. I just pushed another commit. Can you please check it out and see if this solves the issue now?

Thank you

@ondrejsuk1
Author

Hi,

nope:

Traceback (most recent call last):
  File "/home/sukondrej/Stažené/check_redfish-next-release/./check_redfish.py", line 172, in <module>
    if any(x in args.requested_query for x in ['storage', 'all']):  get_storage()
                                                                    ^^^^^^^^^^^^^
  File "/home/sukondrej/Stažené/check_redfish-next-release/cr_module/storage.py", line 117, in get_storage
    get_storage_generic(system)
  File "/home/sukondrej/Stažené/check_redfish-next-release/cr_module/storage.py", line 1208, in get_storage_generic
    get_drive(controller_drive.get("@odata.id"))
  File "/home/sukondrej/Stažené/check_redfish-next-release/cr_module/storage.py", line 752, in get_drive
    if drive_data.type == "SSD" and pd_inventory.predicted_media_life_left_percent is not None:
       ^^^^^^^^^^
NameError: name 'drive_data' is not defined. Did you mean: 'drive_oem_data'?

I think the variable holding the drive data is named pd_inventory, as defined at line 694. So maybe pd_inventory.type?

if pd_inventory.type == "SSD"

@bb-Ricardo
Owner

holy crap, this is embarrassing. See kids, this is what happens if you don't run your tests 🤦

Pushed yet another commit.

Really sorry for that.

@ondrejsuk1
Author

It's OK :D This just happens - 1/3 of my commits are fixing typos :)

Now it works fine, many thanks ;)

[OK]: All storage controllers (4), logical drives (9), physical drives (10) and enclosures (1) are in good condition.|'media_life_left_drive_SSD_0'=89%;10;5 'media_life_left_drive_SSD_1'=89%;10;5

We can close this issue now :)

@bb-Ricardo
Owner

Thank you for testing and for your patience.
