PANIC: * invalid TYPE 140 (My confidence in ZFS on Linux for business use is strongly strained)

May 13, 2020
Hi all,

what happened?

A PVE server commissioned in 2018 stopped responding after almost two years of trouble-free continuous operation. Neither the PVE GUI nor any other service was reachable; only ping still worked.

Logging in at the physical console still worked, but the subsequent shutdown got stuck while stopping the VMs…

2020-05-10_0916_600px.jpg

A hard reset was the only remaining option.

The subsequent boot hangs while the root system is importing rpool.

2020-05-11_1203_600px.jpg

Shortly after that a lot of kernel messages appear every two minutes:

2020-05-11_1204_600px.jpg

After an initial hardware check, a hardware problem can be ruled out.

rpool is on a mirror of 2x SSD Samsung 850 Pro 1TB, another pool on a raidz2 with 4x 4TB HDD, no ZIL, no L2ARC.

CPU: AMD FX-8350
RAM: 32GB DDR3 ECC
Proxmox, probably PVE 5.2
Kernel: 4.15.18-21-pve

The SSDs are okay according to smartctl.

The big problem: rpool can no longer be imported in any way that I know of, even though a zpool import shows an intact pool:

Code:
# zpool import
   pool: rpool
     id: 9373167444002024865
  state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
    the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
config:

    rpool       ONLINE
      mirror-0  ONLINE
        sda2    ONLINE
        sdc2    ONLINE


A zpool import -f rpool does not end with a normal error message, but instead triggers a kernel panic:

May 11 13:12:38 pve-resc kernel: [ 5314.207156] PANIC: blkptr at 000000004ab2be1f has invalid TYPE 140
May 11 13:12:38 pve-resc kernel: [ 5314.207161] Showing stack for process 24099
May 11 13:12:38 pve-resc kernel: [ 5314.207168] CPU: 2 PID: 24099 Comm: txg_sync Tainted: P O 5.0.0-32-generic #34~18.04.2-Ubuntu
May 11 13:12:38 pve-resc kernel: [ 5314.207169] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
May 11 13:12:38 pve-resc kernel: [ 5314.207171] Call Trace:
May 11 13:12:38 pve-resc kernel: [ 5314.207180] dump_stack+0x63/0x85
May 11 13:12:38 pve-resc kernel: [ 5314.207194] spl_dumpstack+0x42/0x50 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.207202] vcmn_err+0xc3/0x100 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.207208] ? _cond_resched+0x19/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207212] ? __kmalloc+0x62/0x210
May 11 13:12:38 pve-resc kernel: [ 5314.207215] ? sg_kmalloc+0x19/0x30
May 11 13:12:38 pve-resc kernel: [ 5314.207217] ? sg_init_table+0x15/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207219] ? __sg_alloc_table+0x9b/0x160
May 11 13:12:38 pve-resc kernel: [ 5314.207220] ? sg_zero_buffer+0xc0/0xc0
May 11 13:12:38 pve-resc kernel: [ 5314.207307] zfs_panic_recover+0x69/0x90 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207346] ? abd_alloc+0x2cd/0x480 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207386] ? arc_read+0xa60/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207440] zfs_blkptr_verify+0xfc/0x3a0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207442] ? _cond_resched+0x19/0x40
May 11 13:12:38 pve-resc kernel: [ 5314.207497] zio_read+0x34/0xa0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207537] ? arc_read+0xa60/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207577] arc_read+0x5ff/0xa60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207617] ? arc_buf_destroy+0x140/0x140 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207668] dsl_scan_prefetch.isra.8+0xb7/0xd0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207718] dsl_scan_visitbp+0x3c6/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207769] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207820] dsl_scan_visitbp+0x7c5/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207871] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207922] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.207973] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208024] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208075] dsl_scan_visitbp+0x487/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208127] dsl_scan_visitbp+0x97b/0xd60 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208178] dsl_scan_visitds+0x10c/0x4e0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208231] dsl_scan_sync+0x2ef/0xb90 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208288] ? zio_destroy+0xbc/0xc0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208342] ? zio_wait+0x147/0x1b0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208397] spa_sync+0x49e/0xd30 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208455] txg_sync_thread+0x2cd/0x4a0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208458] ? __switch_to_asm+0x35/0x70
May 11 13:12:38 pve-resc kernel: [ 5314.208514] ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
May 11 13:12:38 pve-resc kernel: [ 5314.208521] thread_generic_wrapper+0x74/0x90 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.208525] kthread+0x121/0x140
May 11 13:12:38 pve-resc kernel: [ 5314.208532] ? __thread_exit+0x20/0x20 [spl]
May 11 13:12:38 pve-resc kernel: [ 5314.208534] ? kthread_park+0xb0/0xb0
May 11 13:12:38 pve-resc kernel: [ 5314.208537] ret_from_fork+0x22/0x40
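
The trace is easier to read once the frames prefixed with "?" are filtered out, since those are addresses found on the stack that the unwinder could not confirm. A minimal sketch, assuming the kernel log lines were saved to a file; the function name trace_frames and the file path are made up for illustration:

```shell
#!/bin/sh
# Reduce a kernel call trace to its confirmed frames.
# Lines such as "? _cond_resched+0x19/0x40" are skipped because the
# pattern requires the symbol name to follow the "] " of the timestamp directly.
trace_frames() {
    # $1: file containing the kernel log lines (hypothetical path)
    grep -E '\] [A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' "$1" \
        | sed -E 's/^.*\] ([A-Za-z_][A-Za-z0-9_.]*)\+0x.*$/\1/' \
        | uniq    # collapse consecutive repeats (recursive dsl_scan_visitbp calls)
}

# Example: trace_frames /tmp/panic.log
```

Applied to the trace above and read from bottom to top, this leaves the path txg_sync_thread, spa_sync, dsl_scan_sync, dsl_scan_visitds, dsl_scan_visitbp, dsl_scan_prefetch, arc_read, zio_read, zfs_blkptr_verify, zfs_panic_recover. That suggests the panic is raised while a scan (scrub or resilver) walks block pointers during a transaction group sync, right after the import.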


The following variant behaves no differently:
zpool import -f -F -m -N -o cachefile=none rpool

This was tested on different hardware under different kernel versions, most recently under the current PVE 6.2-1.

The following message is conspicuous:

PANIC: blkptr at 000000004ab2be1f has invalid TYPE 140

I have found absolutely no clues on the net in connection with ZFS.

Have we come across a new bug in the ZFS universe here?

Does anyone have a helpful idea for getting the server up and running again?

:eek:
 
> CPU: 2 PID: 24099 Comm: txg_sync Tainted: P O 5.0.0-32-generic #34~18.04.2-Ubuntu

Ubuntu kernel?

Try importing the pool from the Proxmox VE ISO (debug mode); I am not sure whether the Ubuntu 18.04 environment works for this.

See also https://pve.proxmox.com/wiki/Debugging_Installation

Besides that, you have missed a lot of important updates recently, and you are running quite old desktop hardware with cheap consumer SSDs. I only mention this because you talk about "business use".
 
Your CPU doesn't support buffered ECC RAM, FYI.

That definitely looks like a hardware problem.

Did you try booting with just a single disk? Try each of the two in turn.

Make sure the cables are OK.

Updating Proxmox could also help; there is a much newer ZFS version available.
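
If the pool still panics on import after that, a read-only import from a rescue system may be worth trying before anything else writes to the disks. The following is only a sketch under assumptions, not a verified recovery procedure: the /mnt altroot and the helper names run and rescue_import are made up for illustration, and whether a read-only import or the zfs_recover module parameter actually gets past this particular panic is untested here. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: attempt a read-only import of the pool from a rescue system.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-rpool}"        # pool name from the original post
ALTROOT="${ALTROOT:-/mnt}"   # assumed mount prefix on the rescue system

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"   # show the command instead of executing it
    else
        "$@"
    fi
}

rescue_import() {
    # zfs_recover=1 asks zfs_panic_recover() to log a warning instead of panicking.
    run sh -c 'echo 1 > /sys/module/zfs/parameters/zfs_recover'
    # readonly=on: do not write to the pool; -N: do not mount datasets;
    # -R: use an alternate root so nothing mounts over the rescue system.
    run zpool import -f -N -o readonly=on -R "$ALTROOT" "$POOL"
    run zpool status -v "$POOL"
}

rescue_import
```

If the read-only import succeeds, the datasets can be mounted and copied off before any repair is attempted on the pool itself.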
 
