Seeing disk failures

Sep 13, 2022
Hi,

for a test I put a known-defective disk (a TOSHIBA MG04ACA400E) into a ZFS RAID-Z2 pool, ran a scrub, and waited for it to fail. Then I looked through the web GUI (v7.2-11).

In [Cluster log] at the bottom I see no error.
In Datacenter | Summary the status is green and everything looks fine.
In Datacenter | Storage no error is shown either.
In Datacenter | pve1 | Summary I also see no errors.
In Datacenter | pve1 | Disks my disk even shows "SMART: PASSED".
In Datacenter | pve1 | dpool the usage stats appear without any hint of a failure.
In Datacenter | pve1 | Disks | ZFS I see "Health: degraded".
Only when I double-click Datacenter | pve1 | Disks | ZFS | dpool do I see my disk as "FAULTED".
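For comparison, the state the GUI buries is visible in one command from the node's shell. As a sketch (the sample output below is abbreviated and hypothetical, not copied from my pool), failed devices can be picked out of `zpool status` output like this:

```shell
# Hypothetical, abbreviated `zpool status` output for a degraded RAID-Z2 pool
sample='  pool: dpool
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        dpool       DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      3    12     0'

# Print every device whose STATE column is not healthy.
# On a real node, replace the printf with:  zpool status dpool | awk ...
printf '%s\n' "$sample" \
  | awk '$1 != "state:" && $2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE)$/ {print $1, $2}'
# prints:
#   dpool DEGRADED
#   raidz2-0 DEGRADED
#   sdb FAULTED
```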

Is it expected behavior that such errors are so difficult to find? Do I have to click through every pool on every node to spot failed disks, or is there an easier way?
I think I should have received a mail; I do see an error in mail.log: my test system couldn't get the internet mail host to accept the message (usually I send mail via my local relay, which uses SASL auth, so maybe some anti-spam measure is the cause).

As suggested, I used no hardware RAID controller, so there is no beeping and no red LED flashing.

How should disk status be monitored correctly in practice? I think I will add something like running `zpool status` to the regular maintenance schedule, once a week or so. Is there a better approach?
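A minimal sketch of such a weekly check could look like this (assumptions: the standard `zpool status -x` behavior, which prints exactly "all pools are healthy" when nothing is wrong, plus a working local MTA for `mail`; the script path is made up):

```shell
#!/bin/sh
# /usr/local/sbin/zfs-weekly-check.sh (hypothetical path)

# `zpool status -x` prints exactly this line when every pool is fine.
HEALTHY="all pools are healthy"

pool_alert_needed() {
    # Return success (0) when the given status text indicates a problem.
    [ "$1" != "$HEALTHY" ]
}

check_pools() {
    status="$(zpool status -x 2>&1)"
    if pool_alert_needed "$status"; then
        # Mail the full status to root; relies on a working local MTA.
        printf '%s\n' "$status" | mail -s "ZFS problem on $(hostname)" root
    fi
}
```

The real script would end with a call to `check_pools` and could be run from cron, e.g. with a (made-up) entry like `0 6 * * 1 /usr/local/sbin/zfs-weekly-check.sh`.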

Any tips / best practices welcome!
 
