ZFS raidz2 "insufficient replicas"

alpha754293

On my Proxmox 7.4-3 system, I have an 8-wide raidz2 ZFS pool consisting of eight 6 TB HGST HDDs.
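
For reference, the pool is a single 8-disk raidz2 vdev; it would have been created with something along these lines (placeholder device IDs shown here, not my actual creation command):
Code:
# zpool create -o ashift=12 export_pve raidz2 \
    ata-HGST_disk1 ata-HGST_disk2 ata-HGST_disk3 ata-HGST_disk4 \
    ata-HGST_disk5 ata-HGST_disk6 ata-HGST_disk7 ata-HGST_disk8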

One of the drives was reporting state "FAULTED" due to too many errors, and according to the drive activity lights, it wasn't being used by ZFS anymore.
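
(For reference, this is the kind of check that confirms which physical disk is the faulted one; these are just the standard zpool/smartctl commands, and the device ID is the same one that shows up in the replace command below.)
Code:
# zpool status -x                                                   # only shows pools that have a problem
# smartctl -a /dev/disk/by-id/ata-HGST_HDN726060ALE614_K8GZ103D     # SMART health of the faulted disk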

I replaced the drive with a cold spare that I had and issued this command:

Code:
# zpool replace export_pve ata-HGST_HDN726060ALE614_K8GZ103D ata-HGST_HUS726060ALE610_NCHANBUS -f

It accepted the command and started resilvering onto the replacement disk.
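
(While it resilvers, progress can be watched with the usual commands; nothing special here:)
Code:
# watch -n 30 zpool status export_pve    # re-run status every 30 seconds
# zpool iostat -v export_pve 5           # per-vdev I/O stats, sampled every 5 seconds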

After a while though, this is what it shows:
Code:
# zpool status export_pve
  pool: export_pve
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu May 25 03:13:56 2023
        315M scanned at 9.55M/s, 111M issued at 3.36M/s, 13.7T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                     STATE     READ WRITE CKSUM
        export_pve                               DEGRADED     0     0     0
          raidz2-0                               DEGRADED     0     0     0
            ata-HGST_HUS726060ALE610_NCHAJ3US    ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_K8GYJNDD    ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_K1KK32XD    ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_K8GYTBEN    ONLINE       0     0     0
            replacing-4                          UNAVAIL      0   202     0  insufficient replicas
              ata-HGST_HDN726060ALE614_K8GZ103D  REMOVED      0     0     0
              ata-HGST_HUS726060ALE610_NCHANBUS  REMOVED      0     0     0
            ata-HGST_HDN726060ALE614_K1HZ8Y9D    ONLINE       0     0     0
            ata-HGST_HUS726060ALE610_NCHALTBS    ONLINE       0     0     0
            ata-HGST_HDN726060ALE610_NCGU5B6S    ONLINE       0     0     0

For a raidz2 array, which is supposed to be two-drive fault tolerant, why would I get an "insufficient replicas" status when I am only trying to replace a single failed disk?
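
Since both members of replacing-4 show up as REMOVED, is the right next step to check (without rebooting) whether the OS can even still see the new drive? Something like:
Code:
# ls -l /dev/disk/by-id/ | grep -i NCHANBUS    # is the replacement disk's by-id link still there?
# lsblk -o NAME,SERIAL,SIZE,STATE              # does the kernel still list the drive at all?
# dmesg | tail -n 50                           # any recent SATA/link reset errors?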

I tried googling this issue, but most of the results involved ZFS mirrors and/or raidz1 pools; there weren't many where people were using raidz2.

In some of the results, people had rebooted or powered down their system, swapped the drives, and then powered back up, only to have the ZFS import fail during boot. I am deliberately trying not to reboot or power down my system, just in case that triggers the same kind of failure.

I get the impression that either I did something wrong or the system is behaving differently from what I would have expected.

Any insights into this error would be greatly appreciated.

Thank you.