[SOLVED] ZFS pool Degraded

modem7 · Jul 11, 2023

Hey guys,

I've got a warning for a degraded ZFS array, and I'm just curious as to what's causing it.

Code:

zpool status -v proxmox
  pool: proxmox
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:08:30 with 0 errors on Tue Jul 11 11:26:25 2023
config:

        NAME                                  STATE     READ WRITE CKSUM
        proxmox                               DEGRADED     0     0     0
          mirror-0                            DEGRADED     0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx1  ONLINE       0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx2  DEGRADED    12     2    49  too many errors

errors: No known data errors
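For anyone who wants to track whether the counters come back after a `zpool clear`, here is a minimal sketch that filters `zpool status` output down to the devices with non-zero READ, WRITE or CKSUM counts. The function name `zpool_error_devices` is made up for illustration, and it assumes the counters print as plain integers (large counts may be abbreviated unless `zpool status -p` is used).

```shell
# Print device lines from `zpool status` output (read on stdin)
# whose READ, WRITE or CKSUM counter is non-zero.
# Assumption: the three counters are plain integers in columns 3-5.
zpool_error_devices() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && ($3 + $4 + $5) > 0 {
        printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
    }'
}

# Usage on a live system (pool name taken from the output above):
#   zpool status -p proxmox | zpool_error_devices
```

Run against the output above, this would list only the second SSD, since the pool and mirror lines both show zero counters.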

I've changed the data cables (not tried changing power cables), changed the ports to a different controller on the motherboard, and if I do a zpool clear the warning goes away for a couple of weeks until it comes back.

SMART test passes, and the SSDs are relatively new (yes, they're consumer, so I know I'll need to replace them sooner than if I used an enterprise SSD).

It's always the same drive that comes up as degraded. Is this a faulty drive? If so, is there a way I can prove it's faulty to the manufacturer?

ZFS unfortunately isn't my forte, so any help would be great. Cheers in advance!

Philipp Hufnagl · Jul 11, 2023

I think your disk is not doing too well. In your screenshots it reports "too many errors".

What does smartctl -a /dev/sdX say?
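If the full report is too long to read through, a rough sketch like the one below pulls out just the interface CRC counter. It assumes the drive reports it as attribute 199 (usually named UDMA_CRC_Error_Count) with the raw value in the tenth column of the `smartctl -A` table. `smart_crc_count` is a made-up helper name, and `/dev/sdX` is the same placeholder as above.

```shell
# Print the raw value of SMART attribute 199 from `smartctl -A` output
# (read on stdin). Assumption: the standard ATA attribute table, where
# column 1 is the attribute ID and column 10 is RAW_VALUE.
smart_crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Usage on a live system (/dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | smart_crc_count
```

Noting the value now and again a few days later shows whether the count is still climbing.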

modem7 · Jul 11, 2023

Philipp Hufnagl said:
I think your disk is not doing too well. In your screenshots it reports "too many errors".

What does smartctl -a /dev/sdX say?

I think you're right - I just noticed the CRC error count on the SMART report above.

Will get it replaced. Thanks for the 2nd pair of eyes!

Philipp Hufnagl · Jul 11, 2023

I hope you can get a refund on that SSD.

modem7 · Jul 11, 2023

Philipp Hufnagl said:
I hope you can get a refund on that SSD.

Aye, shouldn't be too much of a problem.

With the SMART result, that's proof enough for a replacement at least!

leesteken · Jul 11, 2023

modem7 said:
I think you're right - I just noticed the CRC error count on the SMART report above.

Will get it replaced. Thanks for the 2nd pair of eyes!

If you want the drive to do a long self-test, which might produce a concrete error from the drive itself, run smartctl -t long on that drive and wait until it has finished (the command returns immediately, so you'll have to check once in a while).
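A rough sketch of that workflow: start the test, then come back later and read the self-test log with `smartctl -l selftest`. `latest_selftest` is a made-up helper that assumes the usual ATA log layout, where the newest entry is numbered `# 1`, and `/dev/sdX` is a placeholder for the degraded drive.

```shell
# Print the newest entry from `smartctl -l selftest` output (read on stdin).
# Assumption: ATA log layout, where the most recent test is listed as "# 1".
latest_selftest() {
    awk '/^# *1 / { print; exit }'
}

# Usage on a live system (/dev/sdX is a placeholder):
#   smartctl -t long /dev/sdX                        # returns immediately
#   smartctl -l selftest /dev/sdX | latest_selftest  # check back later
```

The Status column of that entry is the part worth saving if the test reports a failure.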

modem7 · Jul 11, 2023

Woop, Amazon are replacing the drive. Should be here tomorrow, and I'll have to learn how to replace a ZFS drive in place.

Hopefully https://blog.dalydays.com/post/2021-10-13-how-to-hot-swap-zfs-disk-in-proxmox/ still applies

Philipp Hufnagl · Jul 11, 2023

modem7 said:
Woop, Amazon are replacing the drive. Should be here tomorrow, and I'll have to learn how to replace a ZFS drive in place.

Hopefully https://blog.dalydays.com/post/2021-10-13-how-to-hot-swap-zfs-disk-in-proxmox/ still applies

It's not hard. You can find the documentation on how to do it here
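As a rough sketch of what the swap usually looks like on a mirror that is not the boot pool: `zpool replace` takes the pool name, the old device and the new device. The new disk ID below is a hypothetical placeholder, and the made-up helper `print_replace_cmd` only prints the command so it can be checked before running it. If the pool is also the boot device, the Proxmox documentation describes extra partitioning and bootloader steps that this sketch leaves out.

```shell
# Print (do not run) the zpool replace command for review.
# Arguments: pool name, old device, new device.
print_replace_cmd() {
    printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"
}

# Pool and old disk names come from the status output earlier in the thread.
# NEW_DISK is a hypothetical placeholder for the replacement's
# /dev/disk/by-id name.
NEW_DISK="ata-CT1000MX500SSD1_NEWSERIAL"
print_replace_cmd proxmox ata-CT1000MX500SSD1_xxxxxxxxxxx2 "$NEW_DISK"

# After running the printed command, `zpool status proxmox` shows the
# resilver progress.
```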
