I have "mixed results": some ZFS notifications work, others don't. Here's what I tested:
Preparation
- Using Proxmox VE 7.3-3
- Ran "apt install mailutils" (as per the above suggestion)
- Created a ZFS pool "local-zfs" with 3 disks using the PVE GUI
- Migrated a VM disk to the pool (just to have some data there)
- Tested the three scenarios below, all of which end in a degraded pool
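To confirm that each scenario really leaves the pool degraded, the pool state can be read from `zpool status`. This is a minimal sketch: `pool_state` is a hypothetical helper name, and it assumes the usual ` state: <STATE>` line in the status output.

```shell
#!/bin/sh
# pool_state (hypothetical helper): print the value of the first "state:"
# line found in `zpool status` output read from stdin, e.g. ONLINE or DEGRADED.
# Assumes the standard " state: <STATE>" layout of zpool status.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Usage on the test node:
#   zpool status local-zfs | pool_state
```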
Scenario 1 (working)
- Command "zpool offline -f local-zfs ata-QEMU_HARDDISK_QM00005"
--> Email with subject "ZFS device fault for pool 0xBDE81065A6D18BCB on pve-vm1"
- Command "zpool clear local-zfs"
--> Email with subject "ZFS resilver_finish event for local-zfs on pve-vm1"
Scenario 2 (not working)
- Command "zpool offline local-zfs ata-QEMU_HARDDISK_QM00005"
--> No email (even though pool shows as "degraded")
- Command "zpool online local-zfs ata-QEMU_HARDDISK_QM00005"
--> Email with subject "ZFS resilver_finish event for local-zfs on pve-vm1"
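Scenarios 1 and 2 differ only in the `-f` flag on `zpool offline`, so they are easy to re-run side by side. The sketch below is hypothetical: `POOL`, `DISK`, `DRY_RUN`, `run`, `scenario1` and `scenario2` are illustrative names, the defaults are taken from the test setup above, and by default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical reproduction helpers for scenarios 1 and 2.
# Defaults come from the test setup; override POOL/DISK for your system.
POOL=${POOL:-local-zfs}
DISK=${DISK:-ata-QEMU_HARDDISK_QM00005}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run: print the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

scenario1() {
    run zpool offline -f "$POOL" "$DISK"   # forced fault: email received in the test
    run zpool clear "$POOL"                # resilver_finish: email received
}

scenario2() {
    run zpool offline "$POOL" "$DISK"      # plain offline: no email in the test
    run zpool online "$POOL" "$DISK"       # resilver_finish: email received
}
```

After sourcing it (e.g. `. ./zfs-alert-repro.sh`, a hypothetical file name), `scenario2` prints the two commands; setting `DRY_RUN=0` executes them, which should only be done on a throwaway test pool like the one above.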
Scenario 3 (not working)
- Shut down PVE node
- Unplug one of 3 hard disks of the pool
- Start the PVE node and modify some data on the degraded pool (to force resilvering)
--> No email (even though pool shows as "degraded")
- Shut down PVE node
- Replug the unplugged hard disk
- Start the PVE node
- Command "zpool status" shows "scan: resilvered 464K in 00:00:00 with 0 errors on <timestamp of just now>"
--> No email (even though resilvering finished, just as in scenarios 1 and 2)
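Since scenario 3 produced no email, the only evidence of the resilver is the `scan:` line in `zpool status`. A sketch of a helper that extracts it; `last_resilver` is a hypothetical name, and the parsing assumes the ` scan: resilvered ...` wording shown above.

```shell
#!/bin/sh
# last_resilver (hypothetical helper): read `zpool status` output on stdin and
# print the text after "scan:" if the last scan was a resilver; print nothing
# otherwise (e.g. for a scrub).
last_resilver() {
    awk '$1 == "scan:" && $2 == "resilvered" {
        sub(/^[ \t]*scan:[ \t]*/, "")
        print
        exit
    }'
}

# Usage on the test node:
#   zpool status local-zfs | last_resilver
```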
Conclusions
- Device faults (simulated with "zpool offline -f") do trigger a mail alert --> Good!
- A pool being degraded does not, by itself, trigger an alert
- Missing member disks of a pool (e.g. after a reboot) do not trigger an alert
- A resilver that completes almost instantly right after a reboot does not trigger an alert
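Until the event-based alerts behave consistently, one possible workaround (an assumption, not something tested here) is to poll pool health from cron and send mail whenever `zpool status -x` reports anything other than `all pools are healthy`. `needs_alert` and `check_pools` are hypothetical names; mail is sent with the `mail` command from mailutils.

```shell
#!/bin/sh
# needs_alert (hypothetical): succeed (exit 0) when the `zpool status -x`
# text passed as $1 is anything other than the "all healthy" message.
needs_alert() {
    [ "$1" != "all pools are healthy" ]
}

# check_pools (hypothetical): run the health check and mail root if a pool
# needs attention. Requires zpool and mailutils' `mail`. Intended to be run
# from cron via a script that calls check_pools, e.g. in /etc/cron.d:
#   */15 * * * * root /usr/local/bin/zpool-health-check
check_pools() {
    status=$(zpool status -x 2>&1)
    if needs_alert "$status"; then
        printf '%s\n' "$status" | mail -s "ZFS pool problem on $(hostname)" root
    fi
}
```

Because the check looks at the current pool state rather than at events, it would also catch a disk that went missing across a reboot, as in scenario 3. It mails on every run while a pool stays unhealthy; rate-limiting is left out to keep the sketch short.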
Question: Any ideas on how to fix this inconsistent behaviour?