Hi,
I am currently gaining a little experience with PVE and ZFS.
So I hooked up an old server with NetApp hardware and set everything up according to the ZFS recommendations (HBA, etc.).
02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
When I start the system without ZFS, the disks appear as follows (i.e. the disks are recognized in the disk shelf):
Code:
# lsblk -d -o VENDOR,MODEL,SERIAL,WWN,HCTL,SIZE,PHY-SEC,LOG-SEC,NAME | sed -e "`ls -1d /sys/class/enclosure/*/*/device/block/*|sed "s+.*enclosure/\(.*\)/device/block/\(.*\)+s-\2\\$-\2 \1-+"`" | grep -i netapp
NETAPP X477_SMEGX04TA07 S1Z2ZM13 0x5000c50099acb2b3 1:0:0:0 3.6T 512 512 sdc 1:0:24:0/0
NETAPP X477_SMEGX04TA07 S1Z2ZLQ0 0x5000c50099acd897 1:0:1:0 3.6T 512 512 sdd 1:0:24:0/1
NETAPP X477_SMEGX04TA07 S1Z2ZFGP 0x5000c50099b03c9f 1:0:2:0 3.6T 512 512 sde 1:0:24:0/2
NETAPP X477_SMEGX04TA07 S1Z2ZGJ4 0x5000c50099afd167 1:0:3:0 3.6T 512 512 sdf 1:0:24:0/3
NETAPP X477_SMEGX04TA07 S1Z2ZFGY 0x5000c50099b06867 1:0:4:0 3.6T 512 512 sdg 1:0:24:0/4
NETAPP X477_SMEGX04TA07 S1Z2ZLTS 0x5000c50099acc6db 1:0:5:0 3.6T 512 512 sdh 1:0:24:0/5
NETAPP X477_SMEGX04TA07 S1Z2ZFML 0x5000c50099b03067 1:0:6:0 3.6T 512 512 sdi 1:0:24:0/6
NETAPP X477_SMEGX04TA07 S1Z2ZLYY 0x5000c50099acb9fb 1:0:7:0 3.6T 512 512 sdj 1:0:24:0/7
NETAPP X477_SMEGX04TA07 S1Z2ZNHN 0x5000c50099ac1a63 1:0:8:0 3.6T 512 512 sdk 1:0:24:0/8
NETAPP X477_SMEGX04TA07 S1Z2ZFME 0x5000c50099b030cb 1:0:9:0 3.6T 512 512 sdl 1:0:24:0/9
NETAPP X477_SMEGX04TA07 S1Z2ZF7V 0x5000c50099b05013 1:0:10:0 3.6T 512 512 sdm 1:0:24:0/10
NETAPP X477_SMEGX04TA07 S1Z2ZLG2 0x5000c50099acf193 1:0:11:0 3.6T 512 512 sdn 1:0:24:0/11
NETAPP X477_SMEGX04TA07 S1Z2YGVY 0x5000c50099ad1ed7 1:0:12:0 3.6T 512 512 sdo 1:0:24:0/12
NETAPP X477_SMEGX04TA07 S1Z2ZFRJ 0x5000c50099b0060b 1:0:13:0 3.6T 512 512 sdp 1:0:24:0/13
NETAPP X477_SMEGX04TA07 S1Z2ZM0Z 0x5000c50099acb317 1:0:14:0 3.6T 512 512 sdq 1:0:24:0/14
NETAPP X477_SMEGX04TA07 S1Z2ZFDH 0x5000c50099b03fe3 1:0:15:0 3.6T 512 512 sdr 1:0:24:0/15
NETAPP X477_SMEGX04TA07 S1Z2YHCZ 0x5000c50099acf2bf 1:0:16:0 3.6T 512 512 sds 1:0:24:0/16
NETAPP X477_SMEGX04TA07 S1Z2YH4S 0x5000c50099ad183f 1:0:17:0 3.6T 512 512 sdt 1:0:24:0/17
NETAPP X477_SMEGX04TA07 S1Z2ZF6B 0x5000c50099b054ef 1:0:18:0 3.6T 512 512 sdu 1:0:24:0/18
NETAPP X477_SMEGX04TA07 S1Z2ZH23 0x5000c50099afa22b 1:0:19:0 3.6T 512 512 sdv 1:0:24:0/19
NETAPP X477_SMEGX04TA07 S1Z2YFVA 0x5000c50099ae007b 1:0:20:0 3.6T 512 512 sdw 1:0:24:0/20
NETAPP X477_SMEGX04TA07 S1Z2YGVX 0x5000c50099ad1ea7 1:0:21:0 3.6T 512 512 sdx 1:0:24:0/21
NETAPP X477_SMEGX04TA07 S1Z2YGZ5 0x5000c50099ad15cb 1:0:22:0 3.6T 512 512 sdy 1:0:24:0/22
NETAPP X477_SMEGX04TA07 S1Z2YHB5 0x5000c50099acf82f 1:0:23:0 3.6T 512 512 sdz 1:0:24:0/23
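The one-liner above is hard to read. Here is a minimal sketch of the same device-to-slot mapping as a plain shell function. It assumes the sysfs layout `/sys/class/enclosure/<enclosure>/<slot>/device/block/<name>` that the `ls` in the command relies on. The function name `list_enclosure_slots` and its optional root-directory argument are hypothetical, added only for illustration:

```shell
# Hypothetical helper: print "<block device> <enclosure>/<slot>" for every
# disk that sysfs associates with an enclosure slot.
# $1 (optional): enclosure class directory, defaults to /sys/class/enclosure.
list_enclosure_slots() {
    root="${1:-/sys/class/enclosure}"
    for blk in "$root"/*/*/device/block/*; do
        # If the glob matched nothing it stays literal; skip it quietly.
        [ -e "$blk" ] || continue
        slot_dir="${blk%/device/block/*}"   # .../<enclosure>/<slot>
        encl_dir="${slot_dir%/*}"           # .../<enclosure>
        printf '%s %s/%s\n' "${blk##*/}" "${encl_dir##*/}" "${slot_dir##*/}"
    done
}
```

Called without arguments it reads `/sys/class/enclosure`. If no enclosure is registered (as after the reboot), it prints nothing instead of failing.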
If I then set up a ZFS pool, it looks fine BEFORE the reboot:
Code:
# zpool status
pool: NetApp_Shelf_01
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
NetApp_Shelf_01 ONLINE 0 0 0
draid3:4d:24c:1s-0 ONLINE 0 0 0
scsi-35000c50099acb2b3 ONLINE 0 0 0
scsi-35000c50099acd897 ONLINE 0 0 0
scsi-35000c50099b03c9f ONLINE 0 0 0
scsi-35000c50099afd167 ONLINE 0 0 0
scsi-35000c50099b06867 ONLINE 0 0 0
scsi-35000c50099acc6db ONLINE 0 0 0
scsi-35000c50099b03067 ONLINE 0 0 0
scsi-35000c50099acb9fb ONLINE 0 0 0
scsi-35000c50099ac1a63 ONLINE 0 0 0
scsi-35000c50099b030cb ONLINE 0 0 0
scsi-35000c50099b05013 ONLINE 0 0 0
scsi-35000c50099acf193 ONLINE 0 0 0
scsi-35000c50099ad1ed7 ONLINE 0 0 0
scsi-35000c50099b0060b ONLINE 0 0 0
scsi-35000c50099acb317 ONLINE 0 0 0
scsi-35000c50099b03fe3 ONLINE 0 0 0
scsi-35000c50099acf2bf ONLINE 0 0 0
scsi-35000c50099ad183f ONLINE 0 0 0
scsi-35000c50099b054ef ONLINE 0 0 0
scsi-35000c50099afa22b ONLINE 0 0 0
scsi-35000c50099ae007b ONLINE 0 0 0
scsi-35000c50099ad1ea7 ONLINE 0 0 0
scsi-35000c50099ad15cb ONLINE 0 0 0
scsi-35000c50099acf82f ONLINE 0 0 0
spares
draid3-0-0 AVAIL
errors: No known data errors
After the reboot, strangely, it looks like this:
Code:
# lsblk -d -o VENDOR,MODEL,SERIAL,WWN,HCTL,SIZE,PHY-SEC,LOG-SEC,NAME | sed -e "`ls -1d /sys/class/enclosure/*/*/device/block/*|sed "s+.*enclosure/\(.*\)/device/block/\(.*\)+s-\2\\$-\2 \1-+"`" | grep -i netapp
ls: cannot access '/sys/class/enclosure/*/*/device/block/*': No such file or directory
NETAPP X477_SMEGX04TA07 S1Z2ZM13 0x5000c50099acb2b3 1:0:26:0 3.6T 512 512 sdaa
NETAPP X477_SMEGX04TA07 S1Z2ZLQ0 0x5000c50099acd897 1:0:27:0 3.6T 512 512 sdab
NETAPP X477_SMEGX04TA07 S1Z2ZFGP 0x5000c50099b03c9f 1:0:28:0 3.6T 512 512 sdac
NETAPP X477_SMEGX04TA07 S1Z2ZGJ4 0x5000c50099afd167 1:0:29:0 3.6T 512 512 sdad
NETAPP X477_SMEGX04TA07 S1Z2ZFGY 0x5000c50099b06867 1:0:30:0 3.6T 512 512 sdae
NETAPP X477_SMEGX04TA07 S1Z2ZLTS 0x5000c50099acc6db 1:0:31:0 3.6T 512 512 sdaf
NETAPP X477_SMEGX04TA07 S1Z2ZFML 0x5000c50099b03067 1:0:32:0 3.6T 512 512 sdag
NETAPP X477_SMEGX04TA07 S1Z2ZLYY 0x5000c50099acb9fb 1:0:33:0 3.6T 512 512 sdah
NETAPP X477_SMEGX04TA07 S1Z2ZNHN 0x5000c50099ac1a63 1:0:34:0 3.6T 512 512 sdai
NETAPP X477_SMEGX04TA07 S1Z2ZFME 0x5000c50099b030cb 1:0:35:0 3.6T 512 512 sdaj
NETAPP X477_SMEGX04TA07 S1Z2ZF7V 0x5000c50099b05013 1:0:36:0 3.6T 512 512 sdak
NETAPP X477_SMEGX04TA07 S1Z2ZLG2 0x5000c50099acf193 1:0:37:0 3.6T 512 512 sdal
NETAPP X477_SMEGX04TA07 S1Z2YGVY 0x5000c50099ad1ed7 1:0:38:0 3.6T 512 512 sdam
NETAPP X477_SMEGX04TA07 S1Z2ZFRJ 0x5000c50099b0060b 1:0:39:0 3.6T 512 512 sdan
NETAPP X477_SMEGX04TA07 S1Z2ZM0Z 0x5000c50099acb317 1:0:40:0 3.6T 512 512 sdao
NETAPP X477_SMEGX04TA07 S1Z2ZFDH 0x5000c50099b03fe3 1:0:41:0 3.6T 512 512 sdap
NETAPP X477_SMEGX04TA07 S1Z2YHCZ 0x5000c50099acf2bf 1:0:42:0 3.6T 512 512 sdaq
NETAPP X477_SMEGX04TA07 S1Z2YH4S 0x5000c50099ad183f 1:0:43:0 3.6T 512 512 sdar
NETAPP X477_SMEGX04TA07 S1Z2ZF6B 0x5000c50099b054ef 1:0:44:0 3.6T 512 512 sdas
NETAPP X477_SMEGX04TA07 S1Z2ZH23 0x5000c50099afa22b 1:0:45:0 3.6T 512 512 sdat
NETAPP X477_SMEGX04TA07 S1Z2YFVA 0x5000c50099ae007b 1:0:46:0 3.6T 512 512 sdau
NETAPP X477_SMEGX04TA07 S1Z2YGVX 0x5000c50099ad1ea7 1:0:47:0 3.6T 512 512 sdav
NETAPP X477_SMEGX04TA07 S1Z2YGZ5 0x5000c50099ad15cb 1:0:48:0 3.6T 512 512 sdaw
NETAPP X477_SMEGX04TA07 S1Z2YHB5 0x5000c50099acf82f 1:0:49:0 3.6T 512 512 sdax
In the early boot stage the disks are recognized as usual, e.g. as sdc, sdd. Then ZFS starts, and at that point strange things happen: the drives are remapped "without enclosure" and get new device names. Maybe this reconnect makes ZFS fail:
dmesg with ZFS and the devices being renamed:
https://pastebin.com/erzdhrVE
Code:
# zpool status -v
pool: NetApp_Shelf_01
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:
NAME STATE READ WRITE CKSUM
NetApp_Shelf_01 ONLINE 0 0 0
raidz3-0 ONLINE 0 24 0
scsi-35000c50099acb2b3 ONLINE 3 6 0
scsi-35000c50099acd897 ONLINE 3 6 0
scsi-35000c50099b03c9f ONLINE 3 6 0
scsi-35000c50099afd167 ONLINE 3 6 0
scsi-35000c50099b06867 ONLINE 3 4 0
scsi-35000c50099acc6db ONLINE 3 4 0
scsi-35000c50099b03067 ONLINE 3 4 0
scsi-35000c50099acb9fb ONLINE 3 4 0
scsi-35000c50099ac1a63 ONLINE 3 2 0
scsi-35000c50099b030cb ONLINE 3 2 0
scsi-35000c50099b05013 ONLINE 3 2 0
scsi-35000c50099acf193 ONLINE 3 2 0
scsi-35000c50099ad1ed7 ONLINE 3 4 0
scsi-35000c50099b0060b ONLINE 3 4 0
scsi-35000c50099acb317 ONLINE 3 4 0
scsi-35000c50099b03fe3 ONLINE 3 4 0
scsi-35000c50099acf2bf ONLINE 3 6 0
scsi-35000c50099ad183f ONLINE 3 6 0
scsi-35000c50099b054ef ONLINE 3 6 0
scsi-35000c50099afa22b ONLINE 3 6 0
scsi-35000c50099ae007b ONLINE 3 2 0
scsi-35000c50099ad1ea7 ONLINE 3 2 0
scsi-35000c50099ad15cb ONLINE 3 2 0
scsi-35000c50099acf82f ONLINE 3 2 0
errors: List of errors unavailable: pool I/O is currently suspended
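In the listings above, each pool member name looks like the disk's WWN with the `0x` prefix replaced by `scsi-3` (for example `0x5000c50099acb2b3` and `scsi-35000c50099acb2b3`). Assuming that pattern holds and that the links live in `/dev/disk/by-id`, a small sketch like the following could show which kernel device each pool member resolves to after the rename. The function names `wwn_to_byid` and `resolve_byid` are hypothetical:

```shell
# Hypothetical helper: turn a WWN as printed by lsblk (0x5000c5...) into the
# by-id name seen in the zpool output (scsi-35000c5...). This mapping is an
# assumption read off the listings, not a general rule.
wwn_to_byid() {
    printf 'scsi-3%s\n' "${1#0x}"
}

# Hypothetical helper: print the device a by-id link currently points to.
# $1: WWN, $2 (optional): link directory, defaults to /dev/disk/by-id.
resolve_byid() {
    link="${2:-/dev/disk/by-id}/$(wwn_to_byid "$1")"
    if [ -e "$link" ]; then
        readlink -f "$link"
    else
        echo "missing: $link" >&2
        return 1
    fi
}
```

For example, `resolve_byid 0x5000c50099acb2b3` would be expected to print the current kernel device for the first disk, or report the link as missing if it was not recreated after the reconnect.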
When I remove the ZFS pool again, everything appears "normal" again.
dmesg without ZFS, everything normal:
https://pastebin.com/4B4cngQU
Any ideas?
Thanks for your help!
ramon