[SOLVED] ZFS pool Degraded

modem7

Hey guys,

I've got a warning for a degraded ZFS array, and I'm curious as to what's causing it.

1689070945533.png

1689070955151.png

1689070963326.png

Code:
zpool status -v proxmox
  pool: proxmox
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:08:30 with 0 errors on Tue Jul 11 11:26:25 2023
config:

        NAME                                  STATE     READ WRITE CKSUM
        proxmox                               DEGRADED     0     0     0
          mirror-0                            DEGRADED     0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx1  ONLINE       0     0     0
            ata-CT1000MX500SSD1_xxxxxxxxxxx2  DEGRADED    12     2    49  too many errors

errors: No known data errors

I've swapped the data cables (I haven't tried the power cables) and moved the drives to ports on a different controller on the motherboard. If I run zpool clear, the warning goes away for a couple of weeks and then comes back.
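To see which device keeps collecting errors between clears, the counters in the status output can be pulled out with a small filter. This is only a sketch: the function name zpool_error_rows is made up for illustration, and it assumes the column layout shown in the output above (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# Hypothetical helper: print every row of the config table whose
# READ, WRITE or CKSUM counter is non-zero.
# It reads `zpool status` text on stdin, so no pool is needed to try it.
zpool_error_rows() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line marks the end of the table.
        $1 == "errors:"               { in_config = 0 }
        # Counters are compared as strings so values like "1.2K" also match.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Typical use (pool name taken from the output above):
#   zpool status proxmox | zpool_error_rows
```

Running this from time to time and keeping the output with a date gives a simple record of how fast the counters grow on the one drive.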

SMART test passes, and the SSDs are relatively new (yes, they're consumer, so I know I'll need to replace them sooner than if I used an enterprise SSD).

It's always the same drive that shows up as degraded. Is it a faulty drive? If so, is there a way I can prove that to the manufacturer?

ZFS unfortunately isn't my forte, so any help would be great. Cheers in advance!
 
I think your disk is not doing too well. Your screenshots show it reporting "too many errors".

What does smartctl -a /dev/sdX say?
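If the full report is too long to scan, the CRC counter can be picked out on its own. A rough sketch, assuming the drive reports it as attribute ID 199 in the usual ten-column attribute table (the attribute name and ID vary by vendor, so check the full report first); smart_crc_count is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of SMART attribute 199
# (commonly the UDMA CRC error count; this ID is an assumption).
# Reads `smartctl -A` text on stdin and exits non-zero if the
# attribute is not present.
smart_crc_count() {
    awk '
        # Column 1 is the attribute ID, column 10 the raw value.
        $1 == 199 { print $10; found = 1 }
        END       { exit !found }
    '
}

# Typical use (/dev/sdX is a placeholder for the affected drive):
#   smartctl -A /dev/sdX | smart_crc_count
```

Noting the value before and after a zpool clear shows whether the count is still rising.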
 
I think you're right - I just noticed the CRC error count on the SMART report above.

Will get it replaced. Thanks for the second pair of eyes!
 
If you want the drive to run a long self-test, which might produce a concrete error from the drive itself, run smartctl -t long on that drive and wait until it has finished. The command returns immediately, so you'll have to check back once in a while.
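The checking can be scripted. A minimal sketch, assuming an ATA drive whose smartctl -c output contains the text "Self-test routine in progress" while a test is running; selftest_running is a made-up helper name and /dev/sdX is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: succeed while a self-test is still running.
# Reads `smartctl -c` text on stdin and looks for the in-progress
# status message (the exact wording is an assumption about ATA output).
selftest_running() {
    grep -q 'Self-test routine in progress'
}

# Typical use:
#   smartctl -t long /dev/sdX                 # start the long self-test
#   while smartctl -c /dev/sdX | selftest_running; do
#       sleep 600                             # check every 10 minutes
#   done
#   smartctl -l selftest /dev/sdX             # show the self-test log
```

The self-test log printed at the end is the part worth saving if the drive has to go back to the manufacturer.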
 