zpool mirror faulted

Romainp

Hello!
I would like your input on an issue I'm having. I have a PBS 3.1 setup with these disks:
Code:
NAME         FSTYPE      FSVER    LABEL  UUID                                   FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1
├─sda2       vfat        FAT32           399C-565D                               510.7M     0% /boot/efi
└─sda3       LVM2_member LVM2 001        kG1e8u-qINS-TtPR-Mqvu-bVlj-II68-VBEEYF
  ├─pbs-swap swap        1               0d6b2516-fc82-4a58-8c27-d62a039f88c4                  [SWAP]
  └─pbs-root ext4        1.0             c79ccb95-242e-4bfd-a38f-3b4332001c40    376.4G     4% /
sdb
├─sdb1       zfs_member  5000     pool01 9173327591722584431
└─sdb9
sdc
├─sdc1       zfs_member  5000     pool01 9173327591722584431
└─sdc9
sdd
├─sdd1       zfs_member  5000     pool01 9173327591722584431
└─sdd9
sde
├─sde1       zfs_member  5000     pool01 9173327591722584431
└─sde9
sdf
├─sdf1       zfs_member  5000     pool01 9173327591722584431
└─sdf9
sdg
├─sdg1       zfs_member  5000     pool01 9173327591722584431
└─sdg9
sdh
├─sdh1       zfs_member  5000     pool01 9173327591722584431
└─sdh9
sdi
├─sdi1       zfs_member  5000     pool01 9173327591722584431
└─sdi9
sdj
├─sdj1       zfs_member  5000     pool01 9173327591722584431
└─sdj9
sdk
├─sdk1       zfs_member  5000     pool01 9173327591722584431
└─sdk9
sdl
├─sdl1       zfs_member  5000     pool01 9173327591722584431
└─sdl9
sdm
├─sdm1       zfs_member  5000     pool01 9173327591722584431
└─sdm9
sdn
├─sdn1       zfs_member  5000     pool01 9173327591722584431
└─sdn9
sdo
├─sdo1       zfs_member  5000     pool01 9173327591722584431
└─sdo9
sdp
├─sdp1       zfs_member  5000     pool01 9173327591722584431
└─sdp9
sdq
├─sdq1       zfs_member  5000     pool01 9173327591722584431
└─sdq9
sdr
├─sdr1       zfs_member  5000     pool01 9173327591722584431
└─sdr9
sds
├─sds1       zfs_member  5000     pool01 9173327591722584431
└─sds9
sdt
├─sdt1       zfs_member  5000     pool01 9173327591722584431
└─sdt9
sdu
├─sdu1       zfs_member  5000     pool01 9173327591722584431
└─sdu9
sdv
├─sdv1       zfs_member  5000     pool01 9173327591722584431
└─sdv9
sdw
├─sdw1       zfs_member  5000     pool01 9173327591722584431
└─sdw9
sdx
├─sdx1       zfs_member  5000     pool01 9173327591722584431
└─sdx9
sdy
├─sdy1       zfs_member  5000     pool01 9173327591722584431
└─sdy9

All the disks are reported as OK:
Code:
root@srv4:~# ssacli ctrl slot=3 pd all show


Smart Array P440 in Slot 3 (HBA Mode)


   HBA Drives


      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:13 (port 1I:box 1:bay 13, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:14 (port 1I:box 1:bay 14, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:15 (port 1I:box 1:bay 15, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:16 (port 1I:box 1:bay 16, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:17 (port 1I:box 1:bay 17, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:18 (port 1I:box 1:bay 18, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:19 (port 1I:box 1:bay 19, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:20 (port 1I:box 1:bay 20, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:21 (port 1I:box 1:bay 21, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:22 (port 1I:box 1:bay 22, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:23 (port 1I:box 1:bay 23, SAS HDD, 1.8 TB, OK)
      physicaldrive 1I:1:24 (port 1I:box 1:bay 24, SAS HDD, 1.8 TB, OK)

After a reboot I got some mirror issues (no big deal, I have a synced PBS holding those backups).


Code:
root@srv4:~# zpool status -x
  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 672M in 00:04:14 with 107062 errors on Wed Jan 31 18:02:53 2024
config:


        NAME                      STATE     READ WRITE CKSUM
        pool01                    DEGRADED     0     0     0
          mirror-0                ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
          mirror-1                ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sde                   ONLINE       0     0     0
          mirror-2                ONLINE       0     0     0
            sdf                   ONLINE       0     0     0
            sdg                   ONLINE       0     0     0
          mirror-3                ONLINE       0     0     0
            sdh                   ONLINE       0     0     0
            sdi                   ONLINE       0     0     0
          mirror-4                DEGRADED  105K     0     0
            sdj                   DEGRADED     0     0  209K  too many errors
            8662495910803711342   FAULTED      0     0     0  was /dev/sdj1
          mirror-5                DEGRADED     0     0     0
            15846917852308872208  FAULTED      0     0     0  was /dev/sdk1
            sdm                   ONLINE       0     0     0
          mirror-6                ONLINE       0     0     0
            sdn                   ONLINE       0     0     0
            sdo                   ONLINE       0     0     0
          mirror-7                ONLINE       0     0     0
            sdp                   ONLINE       0     0     0
            sdq                   ONLINE       0     0     0
          mirror-8                ONLINE       0     0     0
            sdr                   ONLINE       0     0     0
            sds                   ONLINE       0     0     0
          mirror-9                ONLINE       0     0     0
            sdt                   ONLINE       0     0     0
            sdu                   ONLINE       0     0     0
          mirror-10               ONLINE       0     0     0
            sdv                   ONLINE       0     0     0
            sdw                   ONLINE       0     0     0
          mirror-11               ONLINE       0     0     0
            sdx                   ONLINE       0     0     0
            sdy                   ONLINE       0     0     0

Problem is that I am not sure what to do at this point:
* It seems the reboot "renamed" some disks, so the reboot may have changed the device names (only a guess).
* I am not sure which disk was renamed/lost in the config, e.g. how do I identify the disk behind "was /dev/sdj1"?
* What would be the steps to fix this, if possible?
* Since this sdX naming is not really reliable, some posts on the internet mention using by-id naming instead, but how can I do that?


Any help will be more than welcome :)
 
* It seems the reboot "renamed" some disks, so the reboot may have changed the device names (only a guess).
ZFS doesn't care about that. It identifies the disks by the metadata stored on them, so changed device names shouldn't be a problem.
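If you still want to get rid of the unstable sdX names, here is a minimal sketch for switching the pool to /dev/disk/by-id paths (assuming the datastore can be taken offline briefly; the zdb loop at the end is only an illustration for finding which physical disk sits behind a vdev guid):
Code:
# stop services using the datastore, then export and re-import by id:
zpool export pool01
zpool import -d /dev/disk/by-id pool01
zpool status pool01
# to locate the physical disk behind a vdev guid (e.g. 8662495910803711342),
# the on-disk labels can be checked partition by partition:
for d in /dev/sd[a-z]1; do echo "== $d"; zdb -l "$d" | grep guid; done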

What would be the steps to fix this, if possible?
Your striped mirror has two mirrors with a missing/failed disk, and in one of those mirrors the remaining disk has around 210,000 corrupted records. Once a mirror is running on only a single disk, any error on it is unrecoverable. So your best option is to follow what ZFS is telling you:
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
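Before rebuilding, it may be worth checking how widespread the damage is; a rough sketch (the destroy/recreate part is deliberately left as comments, since it is destructive):
Code:
# list the individual files that are permanently corrupted:
zpool status -v pool01
# if the whole pool is to be rebuilt instead:
#   zpool destroy pool01
#   re-create the pool, add the PBS datastore again, and pull the
#   backups back with a sync job from the other PBS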
 
Thanks for the reply :)
OK, so I will rebuild the pool then. Thanks for the advice!
 
By the way, it's also possible to stripe 3-disk mirrors for additional reliability, so ZFS could repair corrupted records even if one disk of a mirror failed completely. But you will of course lose some capacity.
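For example, with placeholder by-id names, a pool of striped 3-way mirrors could be created like this; with the 24 disks here that would give 8 vdevs instead of 12, so roughly a third less usable capacity:
Code:
# placeholder device ids -- replace with the real /dev/disk/by-id entries:
zpool create pool01 \
  mirror /dev/disk/by-id/scsi-DISK01 /dev/disk/by-id/scsi-DISK02 /dev/disk/by-id/scsi-DISK03 \
  mirror /dev/disk/by-id/scsi-DISK04 /dev/disk/by-id/scsi-DISK05 /dev/disk/by-id/scsi-DISK06
# ...and so on for the remaining disks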
 
