[SOLVED] Hardware RAID Notification/Status

bigun89

Jan 28, 2022
We are a mid-sized company that has been dipping its toes into Proxmox, and we are trying to vet any potential issues before we switch over in the future.

One of the things that has crossed my mind is that all of our hosts run hardware RAID (PERC controllers of some flavor depending on the model of the server).

I have the command-line tools smartctl and perccli64, which can report health if run manually, but I'm not sure how to get these alerts onto the Proxmox dashboard and have them notify us via e-mail of any issues (degraded array, etc.).

Suggestions?
 
With PVE you are limited to what the PVE dashboard already shows. I would set up a monitoring tool such as Zabbix if you need more than that.
 
Hi,

At least smartd sends mails about errors to the local root account, and Proxmox VE sets up forwarding of those mails to the address configured for the root@pam user in the PVE user management, so that part should already be covered. I can confirm that it works: I purposely left a faulty, ancient HDD in one of my test systems and get a mail about it every day like clockwork.
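One way to check whether the root forwarding described above actually reaches the root@pam address, independent of smartd, is to send a test mail to the local root account yourself. This is a minimal sketch that assumes a sendmail-compatible MTA (such as Postfix) at /usr/sbin/sendmail; the function names are hypothetical.

```python
import subprocess
from email.message import EmailMessage

SENDMAIL = "/usr/sbin/sendmail"  # assumption: sendmail-compatible MTA (e.g. Postfix)


def build_test_mail(recipient: str = "root") -> EmailMessage:
    """Build a plain-text test mail for the local root account."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Test: root mail forwarding"
    msg.set_content("If this arrives at the root@pam address, forwarding works.")
    return msg


def send(msg: EmailMessage) -> None:
    """Pipe the message to sendmail; -t takes the recipients from the To: header."""
    subprocess.run([SENDMAIL, "-t"], input=msg.as_bytes(), check=True)


if __name__ == "__main__":
    send(build_test_mail())
```

If this mail arrives but smartd alerts do not, the problem is on the smartd side rather than in the forwarding.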
 
Then there's a problem. I have an e-mail address configured for the root@pam user, and that address has not received any notification about this degraded array.

This server isn't in production, so a test is fine: I yanked one of the drives out of the RAID-1 OS array, and /var/log/syslog does show the kernel reporting a degraded array through the megaraid_sas driver:

Code:
Feb  9 09:13:36 proxmox1 kernel: [413153.798151] megaraid_sas 0000:03:00.0: scanning for scsi0...
Feb  9 09:13:36 proxmox1 kernel: [413153.798632] megaraid_sas 0000:03:00.0: 3053 (697731215s/0x0001/CRIT) - VD 00/0 is now DEGRADED

But no e-mail. I get notifications for finished backup jobs, but not this.
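Since the kernel does log the state change, a stopgap is to scan the log for megaraid_sas virtual-drive events. This is a minimal sketch that assumes the message format in the excerpt above (`VD <id> is now <STATE>`), that recovery is logged as `OPTIMAL`, and that the log lives at /var/log/syslog; other controller or firmware versions may word things differently. It keeps only the latest state per VD, so an array that was already rebuilt is not reported again.

```python
import re
import sys
from typing import Dict, Iterable, List, Tuple

# Matches lines like the excerpt above:
#   megaraid_sas 0000:03:00.0: 3053 (697731215s/0x0001/CRIT) - VD 00/0 is now DEGRADED
VD_EVENT = re.compile(
    r"megaraid_sas (?P<pci>\S+): \d+ \([^)]*/(?P<sev>[A-Z]+)\) - "
    r"VD (?P<vd>\S+) is now (?P<state>[A-Z][A-Z ]*[A-Z])"
)

Event = Tuple[str, str, str, str]  # (pci_address, severity, vd, state)


def find_vd_events(lines: Iterable[str]) -> List[Event]:
    """Extract every VD state-change event from kernel log lines."""
    events = []
    for line in lines:
        m = VD_EVENT.search(line)
        if m:
            events.append((m["pci"], m["sev"], m["vd"], m["state"]))
    return events


def latest_states(events: Iterable[Event]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Keep only the most recent (severity, state) per (controller, VD)."""
    latest = {}
    for pci, sev, vd, state in events:
        latest[(pci, vd)] = (sev, state)
    return latest


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"  # assumed log location
    with open(path, errors="replace") as fh:
        current = latest_states(find_vd_events(fh))
    bad = {k: v for k, v in current.items() if v[1] != "OPTIMAL"}
    for (pci, vd), (sev, state) in sorted(bad.items()):
        print(f"{pci}: VD {vd} is now {state} ({sev})")
    sys.exit(1 if bad else 0)
```

Run from cron, the non-zero exit status and printed lines can be turned into a mail; because it rescans the whole file each time, treat it as a stopgap rather than real monitoring.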
 
There are so many HW controllers out there that PVE doesn't support this; you need your own monitoring. SMART is for checking the state of a disk, but that is not the same as the state of a disk in an array.
Result: use your own monitoring.
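To illustrate the difference: smartctl can still query the individual physical disks behind a MegaRAID-based controller (many PERC models are) with `-d megaraid,N`, but a PASSED verdict there says nothing about whether the virtual drive is degraded. Minimal sketch; the device path `/dev/sda` and the disk IDs are assumptions, and the IDs are the controller's device IDs, not Linux disk numbers.

```python
import subprocess
from typing import List, Optional


def smartctl_cmd(device: str, disk_id: int) -> List[str]:
    """Build a smartctl health query for one physical disk behind a MegaRAID controller."""
    return ["smartctl", "-H", "-d", f"megaraid,{disk_id}", device]


def parse_health(output: str) -> Optional[bool]:
    """Return True/False for the disk's own SMART verdict, or None if none was found."""
    for line in output.splitlines():
        if "self-assessment test result:" in line:  # ATA/SATA wording
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        if line.startswith("SMART Health Status:"):  # SAS/SCSI wording
            return line.split(":", 1)[1].strip() == "OK"
    return None


if __name__ == "__main__":
    # Hypothetical device IDs; take the real ones from the controller's physical-drive list.
    for disk_id in (0, 1):
        result = subprocess.run(smartctl_cmd("/dev/sda", disk_id), capture_output=True, text=True)
        print(f"disk {disk_id}: healthy={parse_health(result.stdout)}")
```

In a RAID 1 with one disk pulled, the remaining disk can still report PASSED, which is exactly the gap between disk state and array state.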
 
Hmm, FWIW, I have the following config line in /etc/smartd.conf:

DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner

The -m root part is what causes the mails to root that then get forwarded. IIRC I did not edit that, but that box is a bit older, so I may just have forgotten that I did.
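If mails are not arriving, one quick check is whether the active smartd.conf directives carry a `-m` recipient at all. Rough sketch: it tokenizes each line with `shlex` and ignores backslash line continuations, which smartd.conf also allows; the helper name is hypothetical.

```python
import shlex
from typing import List


def mail_targets(directive: str) -> List[str]:
    """Return the addresses passed to -m in one smartd.conf directive (comma-separated)."""
    tokens = shlex.split(directive, comments=True)
    targets: List[str] = []
    for i, tok in enumerate(tokens[:-1]):
        if tok == "-m":
            targets.extend(tokens[i + 1].split(","))
    return targets


if __name__ == "__main__":
    with open("/etc/smartd.conf") as fh:
        for line in fh:
            if line.strip() and not line.lstrip().startswith("#"):
                print(mail_targets(line) or "NO -m RECIPIENT", "<-", line.strip())
```

Note that this only covers disk-level SMART problems; smartd will not report a degraded virtual drive.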
 
So, one of my co-workers came up with a brilliant solution - use iDRAC.

After setting up e-mail alerts and attaching it to the network, it should alert us to any and all hardware issues, not just hard disk issues.

This is one area where Proxmox is missing the mark. Either they need hardware-integrated solutions, or they should make it explicitly clear that an out-of-band management system (like Dell's iDRAC) should be used to track hardware issues.

I think using OOB management may be the low-cost, low-overhead solution here. Paying the extra cash for a system with extended iDRAC capabilities is a lot cheaper than paying for the ongoing licensing required by other hypervisor solutions out there.
 
Not sure how one can put the blame on Proxmox VE for this: we neither sell hardware, nor can PVE make assumptions about whether iDRAC or some other out-of-band tooling is available; that's in the hands of the actual admins.

IMO, avoiding HW RAID controllers altogether is even cheaper and avoids a lot of headaches in general anyway ;-)
Using ZFS/BTRFS for (mostly) local storage or Ceph for clustered/shared storage is much more flexible and powerful, has native Proxmox VE integration and does not rely on trusting some proprietary FW/HW black box.
 
Please don't misunderstand - I'm not placing "blame" per se (not intentionally anyway).

This environment I'm running is purely for testing at this point, to see if we can replace our current hypervisor with Proxmox. I think it could, but this is a pretty big missing feature.

However, I get it. Tons and tons of RAID controllers out there. All I'm saying is that a lot of shops run hardware RAID, and the lack of this feature may turn them off. That said, decent out-of-band management could easily counter the concern, and a mention of this somewhere may head those concerns off preemptively.

On the topic of software RAID, isn't the software overhead going to tax performance? Then again, that may be a convo for a different thread.
 
No problem, I may have sounded "offended" by mistake.

No, it's a good point actually. I'd think the historical background here is that, especially in the early days of Proxmox VE, most users flocked in from Debian or other Linux shops and already knew what to set up, and how, to fit their requirements. For users switching from more proprietary solutions this certainly isn't obvious, so expanding the docs in that regard would be a good thing to do.

FYI, there are some tools available through Debian that may help, like the perccli64 you found. PERC controllers are sometimes (often?) just LSI controllers, so the storcli (formerly megacli) tool could work too. Scripting them is always an option; while it needs a bit of hands-on work, it is the most flexible one.
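A minimal scripting sketch for perccli64/storcli: these tools can print JSON when the command ends in `J`. The exact JSON layout is an assumption here, so the parser simply searches the output for rows that have both a `DG/VD` and a `State` key and treats anything other than `Optl` (Optimal) as a problem. The controller path `/c0/vall` is hypothetical; adjust it for your system.

```python
import json
import subprocess
from typing import Any, Iterator, List, Tuple


def iter_vd_rows(node: Any) -> Iterator[dict]:
    """Walk parsed JSON and yield every dict that looks like a virtual-drive row."""
    if isinstance(node, dict):
        if "DG/VD" in node and "State" in node:
            yield node
        for value in node.values():
            yield from iter_vd_rows(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_vd_rows(item)


def non_optimal(report: Any) -> List[Tuple[str, str]]:
    """Return (DG/VD, State) for every virtual drive whose state is not 'Optl'."""
    return [(row["DG/VD"], row["State"]) for row in iter_vd_rows(report) if row["State"] != "Optl"]


def query(tool: str = "perccli64", path: str = "/c0/vall") -> Any:
    """Run '<tool> <path> show J' and parse its JSON output (path is hypothetical)."""
    out = subprocess.run([tool, path, "show", "J"], capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


if __name__ == "__main__":
    problems = non_optimal(query())
    for vd, state in problems:
        print(f"VD {vd}: {state}")
    raise SystemExit(1 if problems else 0)
```

Searching for the row shape instead of a fixed key path keeps the sketch tolerant of small layout differences between tool versions.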

FWIW, for Nagios-based monitoring solutions there's the monitoring-plugins-contrib package, which provides support for quite a few controllers. So, as another user already mentioned, setting up monitoring in general can also cover this centrally.
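If you go the Nagios route and end up writing your own check, the plugin contract is small: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Sketch below; the state-to-severity mapping uses storcli/perccli state abbreviations and is an assumption, not taken from monitoring-plugins-contrib.

```python
import sys
from typing import Dict, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes
LABEL = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
RANK = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}  # CRITICAL outranks UNKNOWN

# Assumed mapping of storcli/perccli VD state abbreviations to severities.
SEVERITY = {"Optl": OK, "Rec": WARNING, "Pdgd": WARNING, "Dgrd": CRITICAL, "OfLn": CRITICAL}


def evaluate(states: Dict[str, str]) -> Tuple[int, str]:
    """Turn {vd: state} into (exit_code, status_line); unknown states count as UNKNOWN."""
    if not states:
        return UNKNOWN, "RAID UNKNOWN - no virtual drives found"
    worst = max((SEVERITY.get(s, UNKNOWN) for s in states.values()), key=RANK.__getitem__)
    detail = ", ".join(f"VD {vd}: {s}" for vd, s in sorted(states.items()))
    return worst, f"RAID {LABEL[worst]} - {detail}"


if __name__ == "__main__":
    # Hypothetical input format: one "vd=state" argument per virtual drive, e.g. 0/0=Dgrd
    code, line = evaluate(dict(arg.split("=", 1) for arg in sys.argv[1:]))
    print(line)
    sys.exit(code)
```

Ranking CRITICAL above UNKNOWN means a known-degraded array is never masked by an unrecognized state on another VD.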
 
