ZFS RAIDZ1 fails with 1 drive lost.

belrpr

Member
Aug 9, 2017
Hi,

My ZFS pool is unavailable.
It was a 10-drive RAIDZ1.

This happened when a disk failed.
Trying to import it gives the following error:
Code:
zpool import
pool: RAIDZ
id: 18225548103796699372
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:

RAIDZ UNAVAIL missing device
ata-ST2000DM006-2DM164_W4Z4KQ5D ONLINE
ata-ST2000DM006-2DM164_W4Z4JN3W ONLINE
ata-ST2000DM006-2DM164_W4Z4KFVP ONLINE
ata-ST2000DM006-2DM164_W4Z4JS6E ONLINE
ata-ST2000DM006-2DM164_W4Z4KQ7P ONLINE
ata-ST2000DM006-2DM164_W4Z4KCYZ ONLINE
ata-ST2000DM006-2DM164_W4Z4KQZS ONLINE
ata-ST2000DM006-2DM164_W4Z4KQ6Y ONLINE
ata-ST2000DM006-2DM164_W4Z4KQ38 ONLINE

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.
Does anyone know how I could fix this, so I can get the pool back online and replace the defective drive?
 
Have you tried using -F -n to check whether the pool can be made importable again? Did you have a separate log device? In that case, try using -m.
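For example, something like this (pool name and device directory taken from your output; -n makes it a dry run):
Code:
# dry run: check whether a rewind import (-F) could recover the pool,
# without actually changing anything on disk
zpool import -d /dev/disk/by-id/ -F -n RAIDZ

# only if you had a separate log device that is now missing:
# allow the import to proceed without it
zpool import -d /dev/disk/by-id/ -m RAIDZ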
 
Code:
zpool import -d /dev/disk/by-id/ -f RAIDZ -F -n
results in no error, but the pool is still not available (zpool status shows nothing).
 
If the output of zpool import -F -n is OK, you should try the import without -n.
With -n, zpool only checks whether a damaged pool can be imported, but makes no changes to actually import it.
 
Then I got:
root@FS-PRX-01-17:~# zpool import -d /dev/disk/by-id/ -f RAIDZ -F
cannot import 'RAIDZ': one or more devices is currently unavailable
 
Do you still have the failed disk with you?

I just tested a raidz built from files (setup sketched below the output); importing the pool with one file missing, I get:

Code:
  pool: test
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    test                      DEGRADED     0     0     0
      raidz1-0                DEGRADED     0     0     0
        10232428910579234279  UNAVAIL      0     0     0  was /mnt/a
        /mnt/b                ONLINE       0     0     0
        /mnt/c                ONLINE       0     0     0
        /mnt/d                ONLINE       0     0     0
        /mnt/f                ONLINE       0     0     0
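
For reference, a file-backed test pool like that can be set up along these lines (sizes and paths here are only illustrative):
Code:
# create five sparse backing files (each vdev must be at least 64 MB)
truncate -s 256M /mnt/a /mnt/b /mnt/c /mnt/d /mnt/f

# build a raidz1 pool on top of them
zpool create test raidz /mnt/a /mnt/b /mnt/c /mnt/d /mnt/f

# simulate a dead disk: export, delete one backing file, re-import
zpool export test
rm /mnt/a
zpool import -d /mnt test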

Your pool state is UNAVAIL, though.

If you can, plug the missing disk back in.
 
I still have the disk, but it doesn't start anymore. I can hear it begin to spin, then two loud clicks of the arm hitting the platters, and the drive spins down.
 
This is the result.
Code:
root@FS-PRX-01-17:~# zdb -e -p /dev/disk/by-id/ RAIDZ

Configuration for import:
        vdev_children: 10
        version: 5000
        pool_guid: 18225548103796699372
        name: 'RAIDZ'
        state: 0
        hostid: 2831157250
        hostname: 'FS-PRX-01-17'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 18225548103796699372
            children[0]:
                type: 'disk'
                id: 0
                guid: 1309946069150283784
                whole_disk: 1
                metaslab_array: 45
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 966
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KQ5D-part1'
            children[1]:
                type: 'disk'
                id: 1
                guid: 6840672152481829192
                whole_disk: 1
                metaslab_array: 43
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 965
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4JN3W-part1'
            children[2]:
                type: 'missing'
                id: 2
                guid: 0
            children[3]:
                type: 'disk'
                id: 3
                guid: 553431340593160644
                whole_disk: 1
                metaslab_array: 41
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 968
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KFVP-part1'
            children[4]:
                type: 'disk'
                id: 4
                guid: 16629211682757826694
                whole_disk: 1
                metaslab_array: 40
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 963
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4JS6E-part1'
            children[5]:
                type: 'disk'
                id: 5
                guid: 11236831328210821342
                whole_disk: 1
                metaslab_array: 39
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 959
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KQ7P-part1'
            children[6]:
                type: 'disk'
                id: 6
                guid: 15613124320463655961
                whole_disk: 1
                metaslab_array: 38
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 962
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KCYZ-part1'
            children[7]:
                type: 'disk'
                id: 7
                guid: 7066021265764877190
                whole_disk: 1
                metaslab_array: 37
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 961
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KQZS-part1'
            children[8]:
                type: 'disk'
                id: 8
                guid: 17507335049202070049
                whole_disk: 1
                metaslab_array: 36
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 960
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KQ6Y-part1'
            children[9]:
                type: 'disk'
                id: 9
                guid: 7405920600437465982
                whole_disk: 1
                metaslab_array: 34
                metaslab_shift: 34
                ashift: 12
                asize: 2000384688128
                is_log: 0
                DTL: 967
                create_txg: 4
                path: '/dev/disk/by-id/ata-ST2000DM006-2DM164_W4Z4KQ38-part1'
zdb: can't open 'RAIDZ': No such device or address
 
Yeah, I know, but this made me doubt ZFS.
I can't understand why it would do this.

You don't have (/didn't have) a raidz1 pool, but a striped one (aka raid0), which means a single lost disk makes the whole pool unavailable.
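You can see it in your zdb output: vdev_children: 10, with ten top-level children of type 'disk' (one of them now 'missing'). That layout is what you get when the raidz keyword is left out at creation time. Roughly (device names shortened here for illustration):
Code:
# a plain stripe - each disk becomes its own top-level vdev, no parity:
zpool create RAIDZ disk1 disk2 [...] disk10

# what was probably intended - one raidz1 vdev holding all ten disks:
zpool create RAIDZ raidz disk1 disk2 [...] disk10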

A (simple, non-striped) raidz would have one top-level child of type raidz:
Code:
$ zdb -e testpool
Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 6487781493759958787
        name: 'testpool'
        state: 1
        hostid: 2831157250
        hostname: 'host'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6487781493759958787
            children[0]:
                type: 'raidz'
                id: 0
                guid: 9745755279438317906
                nparity: 1
                metaslab_array: 256
                metaslab_shift: 28
                ashift: 9
                asize: 42930798592
                is_log: 0
                create_txg: 4

which in turn has the actual disks as its children.
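
You can also see the difference in zpool status on an importable pool: with a real raidz the disks are grouped under a raidz1-0 entry (as in the test output earlier in this thread), while a striped pool lists every disk directly under the pool name. Worth checking right after creating a pool:
Code:
# disks should appear indented under raidz1-0 (or raidz2-0 / raidz3-0),
# not directly under the pool name
zpool status testpool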