Seeing disk failures

Sep 13, 2022
Hi,

for a test I put a known-defective disk (a TOSHIBA MG04ACA400E) into a ZFS RAID-Z2 pool, ran a scrub, and waited for it to fail. Then I looked through the web GUI (v7.2-11).

In [Cluster log] at the bottom I see no error.
In Datacenter | Summary the status is green and everything looks fine.
In Datacenter | Storage no error is shown either.
In Datacenter | pve1 | Summary I also see no errors.
In Datacenter | pve1 | Disks my disk even shows "SMART: PASSED".
In Datacenter | pve1 | dpool the usage stats appear without any hint of a failure.
In Datacenter | pve1 | Disks | ZFS I see "Health: degraded".
Only when I double-click Datacenter | pve1 | Disks | ZFS | dpool do I see my disk as "FAULTED".
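For comparison, the state the GUI buries is visible in one command from the node's shell. As a sketch (the sample output below is abbreviated and hypothetical, not copied from my pool), failed devices can be picked out of `zpool status` output like this:

```shell
# Hypothetical, abbreviated `zpool status` output for a degraded RAID-Z2 pool
sample='  pool: dpool
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        dpool       DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      3    12     0'

# Print every device whose STATE column is not healthy.
# On a real node, replace the printf with:  zpool status dpool | awk ...
printf '%s\n' "$sample" \
  | awk '$1 != "state:" && $2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE)$/ {print $1, $2}'
# prints:
#   dpool DEGRADED
#   raidz2-0 DEGRADED
#   sdb FAULTED
```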

Is it expected behavior that such errors are so difficult to find? Do I have to click through every pool on every node to spot failed disks, or is there an easier way?
I think I should have received a mail; I do see an error in mail.log: my test system couldn't get the internet mail host to accept the message (usually I send mail via my local relay, which uses SASL auth, so maybe some anti-spam measure is the cause).

As suggested, I used no hardware RAID controller, so there is no beeping and no red LED flashing.

How should disk status be monitored correctly in practice? I think I will add something like running `zpool status` to the regular maintenance schedule, once a week or so. Is there a better approach?
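A minimal sketch of such a weekly check could look like this (assumptions: the standard `zpool status -x` behavior, which prints exactly "all pools are healthy" when nothing is wrong, plus a working local MTA for `mail`; the script path is made up):

```shell
#!/bin/sh
# /usr/local/sbin/zfs-weekly-check.sh (hypothetical path)

# `zpool status -x` prints exactly this line when every pool is fine.
HEALTHY="all pools are healthy"

pool_alert_needed() {
    # Return success (0) when the given status text indicates a problem.
    [ "$1" != "$HEALTHY" ]
}

check_pools() {
    status="$(zpool status -x 2>&1)"
    if pool_alert_needed "$status"; then
        # Mail the full status to root; relies on a working local MTA.
        printf '%s\n' "$status" | mail -s "ZFS problem on $(hostname)" root
    fi
}
```

The real script would end with a call to `check_pools` and could be run from cron, e.g. with a (made-up) entry like `0 6 * * 1 /usr/local/sbin/zfs-weekly-check.sh`.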

Any tips / best practices welcome!
 
