Fresh 8.0.3 install renames disks when ZFS is used and crashes on reboot - weird


Nov 22, 2021

I am currently gaining a little experience with PVE and ZFS.

So I hooked up an old server with NetApp hardware and set everything up according to the ZFS recommendations (HBA, etc.).

02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

When I start the system without ZFS, the disks appear as follows (i.e. the disks are recognized in the disk shelf):

# lsblk -d -o VENDOR,MODEL,SERIAL,WWN,HCTL,SIZE,PHY-SEC,LOG-SEC,NAME | sed -e "`ls -1d /sys/class/enclosure/*/*/device/block/*|sed "s+.*enclosure/\(.*\)/device/block/\(.*\)+s-\2\\$-\2 \1-+"`" | grep -i netapp
NETAPP   X477_SMEGX04TA07      S1Z2ZM13                         0x5000c50099acb2b3                 1:0:0:0      3.6T     512     512 sdc 1:0:24:0/0
NETAPP   X477_SMEGX04TA07      S1Z2ZLQ0                         0x5000c50099acd897                 1:0:1:0      3.6T     512     512 sdd 1:0:24:0/1
NETAPP   X477_SMEGX04TA07      S1Z2ZFGP                         0x5000c50099b03c9f                 1:0:2:0      3.6T     512     512 sde 1:0:24:0/2
NETAPP   X477_SMEGX04TA07      S1Z2ZGJ4                         0x5000c50099afd167                 1:0:3:0      3.6T     512     512 sdf 1:0:24:0/3
NETAPP   X477_SMEGX04TA07      S1Z2ZFGY                         0x5000c50099b06867                 1:0:4:0      3.6T     512     512 sdg 1:0:24:0/4
NETAPP   X477_SMEGX04TA07      S1Z2ZLTS                         0x5000c50099acc6db                 1:0:5:0      3.6T     512     512 sdh 1:0:24:0/5
NETAPP   X477_SMEGX04TA07      S1Z2ZFML                         0x5000c50099b03067                 1:0:6:0      3.6T     512     512 sdi 1:0:24:0/6
NETAPP   X477_SMEGX04TA07      S1Z2ZLYY                         0x5000c50099acb9fb                 1:0:7:0      3.6T     512     512 sdj 1:0:24:0/7
NETAPP   X477_SMEGX04TA07      S1Z2ZNHN                         0x5000c50099ac1a63                 1:0:8:0      3.6T     512     512 sdk 1:0:24:0/8
NETAPP   X477_SMEGX04TA07      S1Z2ZFME                         0x5000c50099b030cb                 1:0:9:0      3.6T     512     512 sdl 1:0:24:0/9
NETAPP   X477_SMEGX04TA07      S1Z2ZF7V                         0x5000c50099b05013                 1:0:10:0     3.6T     512     512 sdm 1:0:24:0/10
NETAPP   X477_SMEGX04TA07      S1Z2ZLG2                         0x5000c50099acf193                 1:0:11:0     3.6T     512     512 sdn 1:0:24:0/11
NETAPP   X477_SMEGX04TA07      S1Z2YGVY                         0x5000c50099ad1ed7                 1:0:12:0     3.6T     512     512 sdo 1:0:24:0/12
NETAPP   X477_SMEGX04TA07      S1Z2ZFRJ                         0x5000c50099b0060b                 1:0:13:0     3.6T     512     512 sdp 1:0:24:0/13
NETAPP   X477_SMEGX04TA07      S1Z2ZM0Z                         0x5000c50099acb317                 1:0:14:0     3.6T     512     512 sdq 1:0:24:0/14
NETAPP   X477_SMEGX04TA07      S1Z2ZFDH                         0x5000c50099b03fe3                 1:0:15:0     3.6T     512     512 sdr 1:0:24:0/15
NETAPP   X477_SMEGX04TA07      S1Z2YHCZ                         0x5000c50099acf2bf                 1:0:16:0     3.6T     512     512 sds 1:0:24:0/16
NETAPP   X477_SMEGX04TA07      S1Z2YH4S                         0x5000c50099ad183f                 1:0:17:0     3.6T     512     512 sdt 1:0:24:0/17
NETAPP   X477_SMEGX04TA07      S1Z2ZF6B                         0x5000c50099b054ef                 1:0:18:0     3.6T     512     512 sdu 1:0:24:0/18
NETAPP   X477_SMEGX04TA07      S1Z2ZH23                         0x5000c50099afa22b                 1:0:19:0     3.6T     512     512 sdv 1:0:24:0/19
NETAPP   X477_SMEGX04TA07      S1Z2YFVA                         0x5000c50099ae007b                 1:0:20:0     3.6T     512     512 sdw 1:0:24:0/20
NETAPP   X477_SMEGX04TA07      S1Z2YGVX                         0x5000c50099ad1ea7                 1:0:21:0     3.6T     512     512 sdx 1:0:24:0/21
NETAPP   X477_SMEGX04TA07      S1Z2YGZ5                         0x5000c50099ad15cb                 1:0:22:0     3.6T     512     512 sdy 1:0:24:0/22
NETAPP   X477_SMEGX04TA07      S1Z2YHB5                         0x5000c50099acf82f                 1:0:23:0     3.6T     512     512 sdz 1:0:24:0/23
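For readability, here is a rough sketch of what the inner `ls | sed` part of the one-liner above does: it turns each sysfs enclosure path into a "device enclosure/slot" pair, which is what ends up in the last column. The function names `slot_from_path` and `list_slots` are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a sysfs enclosure path such as
#   /sys/class/enclosure/1:0:24:0/0/device/block/sdc
# into "<kernel name> <enclosure>/<slot>", e.g. "sdc 1:0:24:0/0".
slot_from_path() {
    path=$1
    dev=${path##*/}               # last path component, e.g. sdc
    rest=${path#*/enclosure/}     # e.g. 1:0:24:0/0/device/block/sdc
    printf '%s %s\n' "$dev" "${rest%/device/block/*}"
}

# Hypothetical helper: list every disk currently bound to an enclosure slot.
list_slots() {
    for p in /sys/class/enclosure/*/*/device/block/*; do
        [ -e "$p" ] || continue   # glob did not match: no disk bound to a slot
        slot_from_path "$p"
    done
}

list_slots
```

If no disk is bound to an enclosure slot, `list_slots` simply prints nothing instead of failing with an `ls` error.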

If I then set up a ZFS pool, it looks fine BEFORE the reboot:
# zpool status
  pool: NetApp_Shelf_01
 state: ONLINE

        NAME                        STATE     READ WRITE CKSUM
        NetApp_Shelf_01             ONLINE       0     0     0
          draid3:4d:24c:1s-0        ONLINE       0     0     0
            scsi-35000c50099acb2b3  ONLINE       0     0     0
            scsi-35000c50099acd897  ONLINE       0     0     0
            scsi-35000c50099b03c9f  ONLINE       0     0     0
            scsi-35000c50099afd167  ONLINE       0     0     0
            scsi-35000c50099b06867  ONLINE       0     0     0
            scsi-35000c50099acc6db  ONLINE       0     0     0
            scsi-35000c50099b03067  ONLINE       0     0     0
            scsi-35000c50099acb9fb  ONLINE       0     0     0
            scsi-35000c50099ac1a63  ONLINE       0     0     0
            scsi-35000c50099b030cb  ONLINE       0     0     0
            scsi-35000c50099b05013  ONLINE       0     0     0
            scsi-35000c50099acf193  ONLINE       0     0     0
            scsi-35000c50099ad1ed7  ONLINE       0     0     0
            scsi-35000c50099b0060b  ONLINE       0     0     0
            scsi-35000c50099acb317  ONLINE       0     0     0
            scsi-35000c50099b03fe3  ONLINE       0     0     0
            scsi-35000c50099acf2bf  ONLINE       0     0     0
            scsi-35000c50099ad183f  ONLINE       0     0     0
            scsi-35000c50099b054ef  ONLINE       0     0     0
            scsi-35000c50099afa22b  ONLINE       0     0     0
            scsi-35000c50099ae007b  ONLINE       0     0     0
            scsi-35000c50099ad1ea7  ONLINE       0     0     0
            scsi-35000c50099ad15cb  ONLINE       0     0     0
            scsi-35000c50099acf82f  ONLINE       0     0     0
          draid3-0-0                AVAIL

errors: No known data errors
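The pool members are the by-id names of the same disks: in my output, each `scsi-3...` name is the WWN from the `lsblk` listing without the `0x` prefix. A small sketch of that mapping, assuming this naming convention holds for all disks (the helper names `wwn_to_by_id` and `resolve_by_id` are made up here):

```shell
#!/bin/sh
# Hypothetical helper: map an lsblk WWN such as 0x5000c50099acb2b3 to the
# by-id name seen in zpool status (scsi-35000c50099acb2b3).
# Assumption: the "scsi-3" + WWN pattern observed above applies to every disk.
wwn_to_by_id() {
    printf 'scsi-3%s\n' "${1#0x}"   # strip the leading 0x, prepend scsi-3
}

# Hypothetical helper: show which kernel device the persistent name
# currently points to (e.g. /dev/sdc).
resolve_by_id() {
    readlink -f "/dev/disk/by-id/$(wwn_to_by_id "$1")"
}
```

Running `resolve_by_id 0x5000c50099acb2b3` before and after a reboot would show whether the same persistent name moved to a different `sdX` device.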

After the reboot, strangely, it looks like this:
# lsblk -d -o VENDOR,MODEL,SERIAL,WWN,HCTL,SIZE,PHY-SEC,LOG-SEC,NAME | sed -e "`ls -1d /sys/class/enclosure/*/*/device/block/*|sed "s+.*enclosure/\(.*\)/device/block/\(.*\)+s-\2\\$-\2 \1-+"`" | grep -i netapp
ls: cannot access '/sys/class/enclosure/*/*/device/block/*': No such file or directory
NETAPP   X477_SMEGX04TA07      S1Z2ZM13                         0x5000c50099acb2b3                 1:0:26:0     3.6T     512     512 sdaa
NETAPP   X477_SMEGX04TA07      S1Z2ZLQ0                         0x5000c50099acd897                 1:0:27:0     3.6T     512     512 sdab
NETAPP   X477_SMEGX04TA07      S1Z2ZFGP                         0x5000c50099b03c9f                 1:0:28:0     3.6T     512     512 sdac
NETAPP   X477_SMEGX04TA07      S1Z2ZGJ4                         0x5000c50099afd167                 1:0:29:0     3.6T     512     512 sdad
NETAPP   X477_SMEGX04TA07      S1Z2ZFGY                         0x5000c50099b06867                 1:0:30:0     3.6T     512     512 sdae
NETAPP   X477_SMEGX04TA07      S1Z2ZLTS                         0x5000c50099acc6db                 1:0:31:0     3.6T     512     512 sdaf
NETAPP   X477_SMEGX04TA07      S1Z2ZFML                         0x5000c50099b03067                 1:0:32:0     3.6T     512     512 sdag
NETAPP   X477_SMEGX04TA07      S1Z2ZLYY                         0x5000c50099acb9fb                 1:0:33:0     3.6T     512     512 sdah
NETAPP   X477_SMEGX04TA07      S1Z2ZNHN                         0x5000c50099ac1a63                 1:0:34:0     3.6T     512     512 sdai
NETAPP   X477_SMEGX04TA07      S1Z2ZFME                         0x5000c50099b030cb                 1:0:35:0     3.6T     512     512 sdaj
NETAPP   X477_SMEGX04TA07      S1Z2ZF7V                         0x5000c50099b05013                 1:0:36:0     3.6T     512     512 sdak
NETAPP   X477_SMEGX04TA07      S1Z2ZLG2                         0x5000c50099acf193                 1:0:37:0     3.6T     512     512 sdal
NETAPP   X477_SMEGX04TA07      S1Z2YGVY                         0x5000c50099ad1ed7                 1:0:38:0     3.6T     512     512 sdam
NETAPP   X477_SMEGX04TA07      S1Z2ZFRJ                         0x5000c50099b0060b                 1:0:39:0     3.6T     512     512 sdan
NETAPP   X477_SMEGX04TA07      S1Z2ZM0Z                         0x5000c50099acb317                 1:0:40:0     3.6T     512     512 sdao
NETAPP   X477_SMEGX04TA07      S1Z2ZFDH                         0x5000c50099b03fe3                 1:0:41:0     3.6T     512     512 sdap
NETAPP   X477_SMEGX04TA07      S1Z2YHCZ                         0x5000c50099acf2bf                 1:0:42:0     3.6T     512     512 sdaq
NETAPP   X477_SMEGX04TA07      S1Z2YH4S                         0x5000c50099ad183f                 1:0:43:0     3.6T     512     512 sdar
NETAPP   X477_SMEGX04TA07      S1Z2ZF6B                         0x5000c50099b054ef                 1:0:44:0     3.6T     512     512 sdas
NETAPP   X477_SMEGX04TA07      S1Z2ZH23                         0x5000c50099afa22b                 1:0:45:0     3.6T     512     512 sdat
NETAPP   X477_SMEGX04TA07      S1Z2YFVA                         0x5000c50099ae007b                 1:0:46:0     3.6T     512     512 sdau
NETAPP   X477_SMEGX04TA07      S1Z2YGVX                         0x5000c50099ad1ea7                 1:0:47:0     3.6T     512     512 sdav
NETAPP   X477_SMEGX04TA07      S1Z2YGZ5                         0x5000c50099ad15cb                 1:0:48:0     3.6T     512     512 sdaw
NETAPP   X477_SMEGX04TA07      S1Z2YHB5                         0x5000c50099acf82f                 1:0:49:0     3.6T     512     512 sdax

In the early boot stage the disks are recognized as usual, e.g. as sdc, sdd. Then ZFS starts, and at that point strange things happen: the drives are remapped "without enclosure" and get new device names. Maybe this reconnect is what makes ZFS fail:

dmesg with ZFS and renaming devices:

# zpool status -v
  pool: NetApp_Shelf_01
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.

        NAME                        STATE     READ WRITE CKSUM
        NetApp_Shelf_01             ONLINE       0     0     0
          raidz3-0                  ONLINE       0    24     0
            scsi-35000c50099acb2b3  ONLINE       3     6     0
            scsi-35000c50099acd897  ONLINE       3     6     0
            scsi-35000c50099b03c9f  ONLINE       3     6     0
            scsi-35000c50099afd167  ONLINE       3     6     0
            scsi-35000c50099b06867  ONLINE       3     4     0
            scsi-35000c50099acc6db  ONLINE       3     4     0
            scsi-35000c50099b03067  ONLINE       3     4     0
            scsi-35000c50099acb9fb  ONLINE       3     4     0
            scsi-35000c50099ac1a63  ONLINE       3     2     0
            scsi-35000c50099b030cb  ONLINE       3     2     0
            scsi-35000c50099b05013  ONLINE       3     2     0
            scsi-35000c50099acf193  ONLINE       3     2     0
            scsi-35000c50099ad1ed7  ONLINE       3     4     0
            scsi-35000c50099b0060b  ONLINE       3     4     0
            scsi-35000c50099acb317  ONLINE       3     4     0
            scsi-35000c50099b03fe3  ONLINE       3     4     0
            scsi-35000c50099acf2bf  ONLINE       3     6     0
            scsi-35000c50099ad183f  ONLINE       3     6     0
            scsi-35000c50099b054ef  ONLINE       3     6     0
            scsi-35000c50099afa22b  ONLINE       3     6     0
            scsi-35000c50099ae007b  ONLINE       3     2     0
            scsi-35000c50099ad1ea7  ONLINE       3     2     0
            scsi-35000c50099ad15cb  ONLINE       3     2     0
            scsi-35000c50099acf82f  ONLINE       3     2     0

errors: List of errors unavailable: pool I/O is currently suspended

When I destroy the ZFS pool again, everything appears "normal" again.

dmesg without zfs, all normal:

Any ideas?

Thanks for your help!
