Failed hot spare drive is not detected / HPE ssacli #229

Open
crocodileneptune opened this issue Nov 10, 2024 · 0 comments

Hello Glen,

First of all, thanks so much for your work!

I noticed that your check_raid.pl plugin doesn't seem to trigger a warning or critical state when a spare drive has failed. In my case, the server used to run on two hard disks in a RAID 1 configuration, with a third hard disk configured as a hot spare. When I looked at the iLO logs last night, I saw that the hard disk in bay 1 had failed and the hot spare in bay 3 had been activated some time ago.

I would have expected check_raid.pl to raise some sort of alert whenever any hard disk fails, which is why I am filing this bug report. I don't mind the exact state (warning or critical), but a failed device needs to trigger an action; that is the reason I use the plugin. I have read CONTRIBUTING.md and hope that all relevant details are included here.

Output of check_raid -d:

# /usr/lib/nagios/plugins/check_raid.pl -d
check_raid 4.0.10
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport

DEBUG EXEC: /sbin/dmsetup status --noflush at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /proc/mdstat at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller all show status at /usr/lib/nagios/plugins/check_raid.pl line 503.
DEBUG EXEC: /sbin/ssacli controller slot=0 logicaldrive all show at /usr/lib/nagios/plugins/check_raid.pl line 503.
OK: ssacli:[Smart Array P440ar[OK]: Array A(OK)[LUN1:OK]]

Output of each command from check_raid -d:

/sbin/ssacli controller all show status

Smart Array P440ar in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK

/sbin/ssacli controller slot=0 logicaldrive all show

Smart Array P440ar in Slot 0 (Embedded)

   Array A

      logicaldrive 1 (558.88 GB, RAID 1, OK)

However, check_raid does not detect the failed hot spare drive, even though ssacli reports it:

/sbin/ssacli ctrl slot=0 pd all show status

   physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
   physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
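
To illustrate the kind of check that seems to be missing, here is a minimal Python sketch that parses the physical drive status lines above and reports any drive whose status is not OK. It is not the plugin's actual Perl code; the function names `parse_pd_status` and `failed_drives` are hypothetical, and the regular expression assumes the line format shown in this report (other controllers or ssacli versions may print it differently).

```python
import re

# Matches lines such as:
#   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
# Assumption: the format is exactly as in the output pasted in this report.
PD_LINE = re.compile(
    r"^\s*physicaldrive\s+(?P<id>\S+)\s+\((?P<details>.*)\):\s+(?P<status>.+?)\s*$"
)


def parse_pd_status(output):
    """Return one dict per physical drive found in the ssacli output."""
    drives = []
    for line in output.splitlines():
        m = PD_LINE.match(line)
        if not m:
            continue  # skip blank lines and anything that is not a drive line
        details = [part.strip() for part in m.group("details").split(",")]
        drives.append({
            "id": m.group("id"),
            "status": m.group("status"),
            "spare": "spare" in details,
        })
    return drives


def failed_drives(drives):
    """Return every drive whose status is anything other than OK."""
    return [d for d in drives if d["status"] != "OK"]


# Sample input copied from the output above.
SAMPLE = """
   physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 600 GB): OK
   physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 600 GB): OK
   physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 0 GB, spare): Failed
"""

if __name__ == "__main__":
    for drive in failed_drives(parse_pd_status(SAMPLE)):
        kind = "spare" if drive["spare"] else "data"
        print(f"{drive['id']} ({kind}): {drive['status']}")
```

With the sample above, only drive 1I:3:1 is reported. Whether a non-OK spare should map to warning or critical is left open here, as in the report itself.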

Additional environment details:

  • Debian 12 Bookworm
  • HPE DL360 Gen9 with a P440ar RAID controller + BBU

Thanks and best wishes!
