Passed through RAID card to Windows VM - dozens of emails from host about SMART data not available?

gleep52

New Member
Nov 4, 2024
I have a RAID controller card that I've passed through to a Windows VM using hardware (PCI) passthrough. My Proxmox host keeps emailing me, once for each drive attached to that controller, that it cannot retrieve the drive's SMART info. How do I stop these emails without silencing other warnings that I actually want? I have a lot of other hard drives on the host and DO want SMART monitoring to continue for them, and I am already monitoring the RAID controller from inside the Windows VM.

I've added the line
/dev/bus/0 -d ignore

to my /etc/smartd.conf file, but I still get the email alerts twice a day for each drive.
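
One possible reason the entry has no effect, offered as a guess rather than a diagnosis: per the smartd.conf man page, `-d ignore` only applies to devices found by a DEVICESCAN line that comes after it, so the entry has to sit above DEVICESCAN. Because the email reports the drive as type `megaraid_disk_70`, the entry may also need a matching type, e.g. `/dev/bus/0 -d megaraid,70 -d ignore` (an assumption, with one entry per disk number). The sketch below uses a hypothetical helper, `ignore_precedes_devicescan`, to check the ordering in a config file.

```shell
# Hypothetical helper: succeeds (exit 0) only if CONF contains a
# "-d ignore" entry for DEVICE that appears before the first DEVICESCAN
# line (or anywhere, if the file has no DEVICESCAN line at all).
ignore_precedes_devicescan() {
    conf="$1"     # path to a smartd.conf-style file
    device="$2"   # device name as written in the file, e.g. /dev/bus/0
    awk -v dev="$device" '
        /^[ \t]*#/ { next }                          # skip comment lines
        $1 == dev && /-d[ \t]+ignore/ { found = 1 }  # ignore entry seen
        $1 == "DEVICESCAN" { exit !found }           # stop at first scan line
        END { exit !found }
    ' "$conf"
}
```

It could be run as `ignore_precedes_devicescan /etc/smartd.conf /dev/bus/0 && echo ok`. After editing the file, smartd has to be restarted (or sent a HUP signal) before the change takes effect.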

Email shows this:

Code:
This message was generated by the smartd daemon running on:

   host name:  <server>
   DNS domain: domain

The following warning/error was logged by the smartd daemon:

Device: /dev/bus/0 [megaraid_disk_70] [SAT], Read SMART Self-Test Log Failed

Device info:
WDC WD80EZZX-11CSGA0, S/N:ABCABCFF, WWN:digits, FW:83.H0A03, 8.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

Any other ideas what I can do or try to stop these emails from being sent, while still allowing other SMART info emails and normal system monitoring to continue?
 
I don't know exactly how, but it can probably be changed in the configuration of the Linux SMART daemon (smartd). You could also bind the RAID controller to vfio-pci early, which prevents the Proxmox/Linux kernel and smartd from discovering (and trying to monitor) the drives.

EDIT: Here is an example of early binding to vfio-pci in the Proxmox manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_host_configuration
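
As a rough sketch of what early binding involves, following the pattern in the linked manual section: a modprobe.d fragment tells vfio-pci which vendor:device IDs to claim, and a `softdep` line makes it load before the native driver. The helper name `vfio_early_bind_conf`, the ID `1234:5678`, and the driver name `megaraid_sas` are placeholders; the real values come from `lspci -nnk` on the host.

```shell
# Hypothetical helper: print a modprobe.d fragment that lets vfio-pci
# claim the given PCI IDs before the native driver gets a chance to.
vfio_early_bind_conf() {
    ids="$1"      # comma-separated vendor:device IDs, e.g. 1234:5678
    driver="$2"   # native driver that would otherwise claim the card
    printf 'options vfio-pci ids=%s\n' "$ids"
    printf 'softdep %s pre: vfio-pci\n' "$driver"
}

# Example (placeholder values, not run automatically):
#   vfio_early_bind_conf 1234:5678 megaraid_sas > /etc/modprobe.d/vfio.conf
#   update-initramfs -u -k all    # then reboot
```

Note that the `ids=` option matches every card with that vendor:device ID, so this only fits when no other device of the same model should stay on the host.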
 
So what's going on is that smartd starts and monitors ALL my drives, THEN the VM starts and my HBA is passed through. smartd can't tell the difference between this and the drives going bad, so it starts sending emails. I can fix it by either restarting the service or sending it a HUP signal. I think the simplest way to handle this is to just restart the service whenever I restart my server, since that's so infrequent. Maybe sometime I'll try to ignore the whole controller in smartd, but I'd need my TrueNAS instance down for that.
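
If doing that by hand gets tedious, one option (an assumption, not something tried in this thread) is a Proxmox hookscript, which is called with the VM ID and a phase name. The sketch below sends smartd a HUP once the VM has started, so it re-reads its config and re-scans for devices. The function name `smartd_rescan_on_phase` and the `DRY_RUN` switch are made up for illustration.

```shell
# Hypothetical hookscript body: Proxmox passes the VM ID as $1 and the
# phase (pre-start, post-start, pre-stop, post-stop) as $2.
smartd_rescan_on_phase() {
    vmid="$1"     # unused here; could be used to restrict to one VM
    phase="$2"
    case "$phase" in
        post-start)
            # SIGHUP makes smartd re-read its config and re-scan devices.
            # With DRY_RUN set, the command is only printed.
            ${DRY_RUN:+echo} pkill -HUP -x smartd
            ;;
    esac
}

# In a real hookscript file the last line would be:
#   smartd_rescan_on_phase "$@"
```

The script would be attached with something like `qm set <vmid> --hookscript local:snippets/<name>`, where the snippet name is up to you.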
 
I could, but sometimes it's useful to be able to access the HBA from the host when the VM is shut off. It's been useful for diagnosing things. It's still an option though, thank you!
 
You can always unbind vfio-pci and bind the actual driver manually (I recommend a small script, as it takes several steps), even when using early binding to vfio-pci. Or blacklist the driver if you have no other devices that use the same driver. You can then still load the driver manually (blacklisting only prevents the driver from loading automatically).
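
The several steps could look roughly like the sketch below, which assumes the kernel's sysfs `driver_override` mechanism. The helper name `rebind_pci`, the PCI address `0000:01:00.0`, and the driver name `megaraid_sas` are placeholders; `SYSFS_ROOT` is only there so the function can be tried against a fake directory tree first.

```shell
# Hypothetical helper: move one PCI device from its current driver
# (e.g. vfio-pci) to another driver by name.
rebind_pci() {
    dev="$1"                     # full PCI address, e.g. 0000:01:00.0
    driver="$2"                  # target driver, e.g. megaraid_sas
    root="${SYSFS_ROOT:-/sys}"   # overridable for dry runs
    devdir="$root/bus/pci/devices/$dev"

    [ -d "$devdir" ] || { echo "no such device: $dev" >&2; return 1; }

    # 1. Release the device from whatever driver currently holds it.
    if [ -e "$devdir/driver" ]; then
        echo "$dev" > "$devdir/driver/unbind" || return 1
    fi
    # 2. Tell the kernel which driver is allowed to claim the device.
    echo "$driver" > "$devdir/driver_override" || return 1
    # 3. Ask the kernel to probe the device again.
    echo "$dev" > "$root/bus/pci/drivers_probe"
}

# Example (as root, VM shut down, placeholder values):
#   modprobe megaraid_sas        # needed if the driver is blacklisted
#   rebind_pci 0000:01:00.0 megaraid_sas
# and to hand the card back before starting the VM:
#   rebind_pci 0000:01:00.0 vfio-pci
```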
 
Thanks, good info! I'll probably do that then.