[SOLVED] ZFS pool Degraded

modem7

Nov 2, 2021
Hey guys,

I've got a warning about a degraded ZFS array, and I'm curious what's causing it.

[Three screenshots attached: 1689070945533.png, 1689070955151.png, 1689070963326.png]

Code:
zpool status -v proxmox
  pool: proxmox
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:08:30 with 0 errors on Tue Jul 11 11:26:25 2023
config:

        NAME                                  STATE     READ WRITE CKSUM
        proxmox                               DEGRADED     0     0     0
          mirror-0                            DEGRADED     0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx1  ONLINE       0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx2  DEGRADED    12     2    49  too many errors

errors: No known data errors
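Reading the counters by eye is easy on a two-disk mirror, but the same check can be scripted. The sketch below is a hypothetical helper (zpool_error_devices is not part of ZFS) that reads zpool status text and prints only the rows whose READ, WRITE or CKSUM counter is non-zero. It assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout shown in the output above.

```shell
# zpool_error_devices (hypothetical helper): read `zpool status` output on
# stdin and print each pool, vdev or device row whose READ, WRITE or CKSUM
# counter is non-zero, as "NAME STATE READ WRITE CKSUM".
zpool_error_devices() {
    awk '
        # The device table starts after the "NAME STATE READ WRITE CKSUM" header.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # A blank line or the "errors:" summary ends the table.
        in_table && (NF == 0 || $1 == "errors:") { in_table = 0; next }
        # Counters are compared as strings, so abbreviated values like "1.2K" also count.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $2, $3, $4, $5
        }
    '
}
```

Usage: zpool status -v proxmox | zpool_error_devices. On the output above it would print only the ata-CT1000MX500SSD1_xxxxxxxxxxx2 row (DEGRADED 12 2 49), since the pool and mirror-0 rows are all zeros.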

I've swapped the data cables (I haven't tried changing the power cables) and moved the drives to ports on a different controller on the motherboard. If I run zpool clear, the warning goes away for a couple of weeks and then comes back.

The SMART test passes, and the SSDs are relatively new (yes, they're consumer drives, so I know I'll need to replace them sooner than if I'd used enterprise SSDs).

It's always the same drive that comes up as degraded. Is it faulty? If so, is there a way I can prove that to the manufacturer?

ZFS unfortunately isn't my forte, so any help would be great. Cheers in advance!
 
I think your disk is not doing too well. In your screenshots it reports "too many errors".

What does smartctl -a /dev/sdX say?
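In the smartctl attribute table, attribute 199 (UDMA_CRC_Error_Count on most SATA drives) counts CRC errors on the link between the drive and the controller. It is a cumulative counter, so noting its value, then checking again after a cable or port swap, shows whether new errors are still occurring. The hypothetical helper below (smart_crc_errors, not part of smartmontools) is a minimal sketch that pulls the raw value out of smartctl -A or -a output, assuming the standard ATA attribute table layout where RAW_VALUE is the tenth column.

```shell
# smart_crc_errors (hypothetical helper): read `smartctl -A` (or `smartctl -a`)
# output on stdin and print the raw value of attribute 199. Prints nothing if
# the drive does not report attribute 199.
smart_crc_errors() {
    # Column 1 is the attribute ID; column 10 is RAW_VALUE in the ATA attribute table.
    awk '$1 == "199" { print $10 }'
}
```

Usage: smartctl -A /dev/sdX | smart_crc_errors. Record the number, then run the same command again after a few days of normal use; if it keeps climbing even after the cable and port changes, that points back at the drive end of the link.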
 
I think you're right; I just noticed the CRC error count on the SMART report above.

Will get it replaced. Thanks for the second pair of eyes!
 
If you want the drive to run a long self-test, which might give you a concrete error reported by the drive itself, run smartctl -t long on it and wait until it finishes. The command returns immediately, so you'll have to check back once in a while.
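The "check once in a while" step can be automated. This is a minimal sketch with two hypothetical helpers (selftest_in_progress and wait_for_selftest are not part of smartmontools). It assumes an ATA/SATA drive, where smartctl -c reports "Self-test routine in progress..." while a test is running; NVMe drives word this differently. Once the test is no longer running, it prints the self-test log with smartctl -l selftest.

```shell
# selftest_in_progress (hypothetical helper): read `smartctl -c` (or
# `smartctl -a`) output on stdin and succeed while an ATA drive reports
# that a self-test is still running.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# wait_for_selftest DEVICE [INTERVAL] (hypothetical helper): poll DEVICE until
# its self-test is no longer running, then print the drive's self-test log.
# Needs root and smartmontools installed.
wait_for_selftest() {
    local dev=$1
    local interval=${2:-300}   # seconds between checks; 300 is an arbitrary default
    while smartctl -c "$dev" | selftest_in_progress; do
        sleep "$interval"
    done
    smartctl -l selftest "$dev"
}
```

Usage: smartctl -t long /dev/sdX, then wait_for_selftest /dev/sdX. The "Extended self-test routine recommended polling time" line in smartctl -c output gives a rough idea of how long to expect; the log then lists each test's status and, for a failed test, the LBA of the first error.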
 