Hi all,
What happened?
A PVE server commissioned in 2018 stopped responding after almost 2 years of trouble-free continuous operation. Neither the PVE GUI nor any other service was reachable, although the host still answered ping.
Logging in on the physical console still worked, but the subsequent shutdown got stuck while stopping the VMs…

Only a reset remained as an option.
The subsequent boot hangs while the root system is importing rpool.

Shortly after that a lot of kernel messages appear every two minutes:

After an initial hardware check, a hardware problem can be ruled out.
rpool is on a mirror of 2x SSD Samsung 850 Pro 1TB, another pool on a raidz2 with 4x 4TB HDD, no ZIL, no L2ARC.
CPU: AMD FX-8350
RAM: 32GB DDR3 ECC
Proxmox, probably PVE 5.2
Kernel: 4.15.18-21-pve
The SSDs are okay according to smartctl.
The big problem: rpool can no longer be imported in any way that I know of, even though a zpool import shows an intact pool:
Code:
# zpool import
   pool: rpool
     id: 9373167444002024865
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:

        rpool         ONLINE
          mirror-0    ONLINE
            sda2      ONLINE
            sdc2      ONLINE
A zpool import -f rpool does not end with a normal error message, but instead triggers a kernel panic:
Code:
May 11 13:12:38 pve-resc kernel: [ 5314.207156] PANIC: blkptr at 000000004ab2be1f has invalid TYPE 140
May 11 13:12:38 pve-resc kernel: [ 5314.207161] Showing stack for process 24099
May 11 13:12:38 pve-resc kernel: [ 5314.207168] CPU: 2 PID: 24099 Comm: txg_sync Tainted: P O 5.0.0-32-generic #34~18.04.2-Ubuntu
May 11 13:12:38 pve-resc kernel: [ 5314.207169] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
May 11 13:12:38 pve-resc kernel: [ 5314.207171] Call Trace:
May 11 13:12:38 pve-resc kernel: [ 5314.207180] dump_stack+0x63/0x85
May 11 13:12:38 pve-resc kernel: [ 5314.207194] spl_dumpstack+0x42/0x50 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.207202] vcmn_err+0xc3/0x100 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.207208] ? _cond_resched+0x19/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207212] ? __kmalloc+0x62/0x210
May 11 13:12:38 pve-resc kernel: [ 5314.207215] ? sg_kmalloc+0x19/0x30
May 11 13:12:38 pve-resc kernel: [ 5314.207217] ? sg_init_table+0x15/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207219] ? __sg_alloc_table+0x9b/0x160
May 11 13:12:38 pve-resc kernel: [ 5314.207220] ? sg_zero_buffer+0xc0/0xc0
May 11 13:12:38 pve-resc kernel: [ 5314.207307] zfs_panic_recover+0x69/0x90 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207346] ? abd_alloc+0x2cd/0x480 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207386] ? arc_read+0xa60/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207440] zfs_blkptr_verify+0xfc/0x3a0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207442] ? _cond_resched+0x19/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207497] zio_read+0x34/0xa0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207537] ? arc_read+0xa60/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207577] arc_read+0x5ff/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207617] ? arc_buf_destroy+0x140/0x140 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207668] dsl_scan_prefetch.isra.8+0xb7/0xd0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207718] dsl_scan_visitbp+0x3c6/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207769] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207820] dsl_scan_visitbp+0x7c5/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207871] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207922] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207973] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208024] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208075] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208127] dsl_scan_visitbp+0x97b/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208178] dsl_scan_visitds+0x10c/0x4e0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208231] dsl_scan_sync+0x2ef/0xb90 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208288] ? zio_destroy+0xbc/0xc0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208342] ? zio_wait+0x147/0x1b0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208397] spa_sync+0x49e/0xd30 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208455] txg_sync_thread+0x2cd/0x4a0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208458] ? __switch_to_asm+0x35/0x70
May 11 13:12:38 pve-resc kernel: [ 5314.208514] ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208521] thread_generic_wrapper+0x74/0x90 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.208525] kthread+0x121/0x140
May 11 13:12:38 pve-resc kernel: [ 5314.208532] ? __thread_exit+0x20/0x20 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.208534] ? kthread_park+0xb0/0xb0
May 11 13:12:38 pve-resc kernel: [ 5314.208537] ret_from_fork+0x22/0x40
The behavior is the same with:
zpool import -f -F -m -N -o cachefile=none rpool
This was tested on different hardware and under different kernel versions, most recently under the current PVE-6.2-1.
The following message is conspicuous:
PANIC: blkptr at 000000004ab2be1f has invalid TYPE 140
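For context, the frames from the zfs module in the trace above suggest that the panic is raised in the txg_sync thread while it walks block pointers for a scan (dsl_scan_sync down to zfs_blkptr_verify). A minimal sketch for pulling that chain out of a saved log, assuming the journal format shown above; the function name zfs_frames and the file name panic.log are hypothetical:

```shell
#!/bin/sh
# zfs_frames: read a kernel log on stdin and print the call chain inside the
# zfs module, innermost frame first.
# Frames marked "?" are unreliable stack leftovers and are skipped;
# consecutive repeats (the recursive dsl_scan_visitbp calls) are collapsed.
zfs_frames() {
    grep -F '[zfs]' \
        | grep -v '\] *? ' \
        | sed -n 's/.*\] *\([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]* \[zfs\].*/\1/p' \
        | uniq
}

# Hypothetical usage, with the trace saved to a file:
#   zfs_frames < panic.log
```

Run against the trace above, this should reduce the output to the short chain from zfs_panic_recover up to txg_sync_thread, which is easier to compare with other reports than the full dump.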
I have found absolutely no clues on the net in connection with ZFS.
Have we come across a new bug in the ZFS universe here?
Does anyone have an idea how to get the server up and running again?
