Hello, I'm using Proxmox for the first time and installed version 6.4 a few weeks ago.
So far it's been completely stable, but in the last few days I've had two instances of a kernel panic in zio.c that hangs the only VM in use and renders the system unstable.
Here's the dmesg info:
[76382.347243] VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
[76382.349657] PANIC at zio.c:314:zio_data_buf_alloc()
[76382.350850] Showing stack for process 19185
[76382.352006] CPU: 3 PID: 19185 Comm: kvm Tainted: P O 5.4.106-1-pve #1
[76382.353129] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F33j 04/23/2021
[76382.354248] Call Trace:
[76382.354254] dump_stack+0x6d/0x8b
[76382.354259] spl_dumpstack+0x29/0x2b [spl]
[76382.354261] spl_panic+0xd3/0xfb [spl]
[76382.354263] ? ___slab_alloc+0x2ae/0x580
[76382.354265] ? _cond_resched+0x19/0x30
[76382.354265] ? kmem_cache_alloc+0x17e/0x240
[76382.354267] ? spl_kmem_cache_alloc+0x7c/0x770 [spl]
[76382.354268] ? spl_kmem_cache_alloc+0x14d/0x770 [spl]
[76382.354268] ? _cond_resched+0x19/0x30
[76382.354269] ? _cond_resched+0x19/0x30
[76382.354269] ? mutex_lock+0x12/0x30
[76382.354297] zio_data_buf_alloc+0x58/0x60 [zfs]
[76382.354307] abd_alloc_linear+0x88/0xc0 [zfs]
[76382.354318] abd_alloc+0x8e/0xd0 [zfs]
[76382.354329] arc_get_data_abd.isra.44+0x45/0x70 [zfs]
[76382.354341] arc_hdr_alloc_abd+0x5d/0xb0 [zfs]
[76382.354352] arc_hdr_alloc+0xec/0x160 [zfs]
[76382.354363] arc_alloc_buf+0x4c/0xd0 [zfs]
[76382.354375] dbuf_alloc_arcbuf_from_arcbuf+0xf6/0x180 [zfs]
[76382.354376] ? _cond_resched+0x19/0x30
[76382.354376] ? _cond_resched+0x19/0x30
[76382.354388] dbuf_hold_copy.isra.24+0x36/0xb0 [zfs]
[76382.354404] dbuf_hold_impl+0x43b/0x600 [zfs]
[76382.354416] dbuf_hold+0x33/0x60 [zfs]
[76382.354428] dmu_buf_hold_noread+0x8a/0x110 [zfs]
[76382.354440] dmu_buf_hold+0x3c/0x90 [zfs]
[76382.354469] zfs_get_data+0x197/0x340 [zfs]
[76382.354488] zil_commit_impl+0x9d6/0xdb0 [zfs]
[76382.354510] zil_commit+0x3d/0x60 [zfs]
[76382.354528] zfs_fsync+0x77/0x100 [zfs]
[76382.354544] zpl_fsync+0x6c/0xa0 [zfs]
[76382.354547] vfs_fsync_range+0x48/0x80
[76382.354548] ? __fget_light+0x59/0x70
[76382.354548] do_fsync+0x3d/0x70
[76382.354549] __x64_sys_fdatasync+0x17/0x20
[76382.354551] do_syscall_64+0x57/0x190
[76382.354552] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[76382.354553] RIP: 0033:0x7ffa667ce2e7
[76382.354554] Code: b8 4b 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3c c3 0f 1f 00 53 89 fb 48 83 ec 10 e8 74 54 01 00 89 df 89 c2 b8 4b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 b6 54 01 00 8b 44 24
[76382.354554] RSP: 002b:00007ff06dff7cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[76382.354555] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007ffa667ce2e7
[76382.354555] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
[76382.354556] RBP: 000055a5a2a52b92 R08: 0000000000000000 R09: 00000000ffffffff
[76382.354556] R10: 00007ff06dff7ce0 R11: 0000000000000293 R12: 000055a5a2dcd2e8
[76382.354556] R13: 000055a5a3bb4b58 R14: 000055a5a3bb4ae0 R15: 000055a5a3bdd100
[76608.205037] INFO: task z_wr_int:1620 blocked for more than 120 seconds.
[76608.206932] Tainted: P O 5.4.106-1-pve #1
[76608.208839] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[76608.210729] z_wr_int D 0 1620 2 0x80004000
[76608.212583] Call Trace:
[76608.214411] __schedule+0x2e6/0x700
[76608.216207] ? mutex_lock+0x12/0x30
[76608.217976] schedule+0x33/0xa0
[76608.219720] schedule_preempt_disabled+0xe/0x10
[76608.221408] __mutex_lock.isra.10+0x2c9/0x4c0
[76608.223032] __mutex_lock_slowpath+0x13/0x20
[76608.224639] mutex_lock+0x2c/0x30
[76608.226239] dbuf_write_done+0x43/0x220 [zfs]
[76608.227779] arc_write_done+0x8f/0x410 [zfs]
[76608.229274] zio_done+0x43f/0x1020 [zfs]
[76608.230750] zio_execute+0x99/0xf0 [zfs]
[76608.232188] taskq_thread+0x2f7/0x4e0 [spl]
[76608.233618] ? wake_up_q+0x80/0x80
[76608.235049] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[76608.236465] kthread+0x120/0x140
[76608.237875] ? task_done+0xb0/0xb0 [spl]
[76608.239285] ? kthread_park+0x90/0x90
[76608.240694] ret_from_fork+0x22/0x40
[76608.242104] INFO: task txg_sync:1787 blocked for more than 120 seconds.
[76608.243527] Tainted: P O 5.4.106-1-pve #1
[76608.244950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[76608.246379] txg_sync D 0 1787 2 0x80004000
[76608.247788] Call Trace:
[76608.249188] __schedule+0x2e6/0x700
[76608.250580] schedule+0x33/0xa0
[76608.251955] schedule_timeout+0x152/0x330
[76608.253329] ? __next_timer_interrupt+0xd0/0xd0
[76608.254702] io_schedule_timeout+0x1e/0x50
[76608.256068] __cv_timedwait_common+0x138/0x170 [spl]
[76608.257425] ? wait_woken+0x80/0x80
[76608.258758] __cv_timedwait_io+0x19/0x20 [spl]
[76608.260110] zio_wait+0x139/0x280 [zfs]
[76608.261429] ? _cond_resched+0x19/0x30
[76608.262742] dsl_pool_sync+0xdc/0x510 [zfs]
[76608.264043] spa_sync+0x5a4/0xfe0 [zfs]
[76608.265310] ? mutex_lock+0x12/0x30
[76608.266583] ? spa_txg_history_init_io+0x104/0x110 [zfs]
[76608.267858] txg_sync_thread+0x2e1/0x4a0 [zfs]
[76608.269133] ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[76608.270390] thread_generic_wrapper+0x74/0x90 [spl]
[76608.271651] kthread+0x120/0x140
[76608.272905] ? __thread_exit+0x20/0x20 [spl]
[76608.274157] ? kthread_park+0x90/0x90
[76608.275400] ret_from_fork+0x22/0x40
[76608.276677] INFO: task kvm:19167 blocked for more than 120 seconds.
[76608.277924] Tainted: P O 5.4.106-1-pve #1
[76608.279163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[76608.280411] kvm D 0 19167 1 0x00004000
[76608.281657] Call Trace:
[76608.282881] __schedule+0x2e6/0x700
[76608.284106] schedule+0x33/0xa0
[76608.285318] cv_wait_common+0x104/0x130 [spl]
[76608.286529] ? wait_woken+0x80/0x80
[76608.287719] __cv_wait+0x15/0x20 [spl]
[76608.288906] zfs_rangelock_enter_impl+0x16a/0x5c0 [zfs]
[76608.290088] zfs_rangelock_enter+0x11/0x20 [zfs]
[76608.291239] zfs_extend+0x44/0x220 [zfs]
[76608.292375] ? sa_lookup+0x71/0x90 [zfs]
[76608.293486] zfs_freesp+0x21d/0x480 [zfs]
[76608.294548] ? _cond_resched+0x19/0x30
[76608.295596] ? mutex_lock+0x12/0x30
[76608.296642] ? rrw_exit+0x6a/0x160 [zfs]
[76608.297647] ? rrm_exit+0x46/0x80 [zfs]
[76608.298610] ? zfs_statvfs+0x191/0x4e0 [zfs]
[76608.299558] ? rrw_exit+0x6a/0x160 [zfs]
[76608.300491] ? zfs_space+0xd3/0x210 [zfs]
[76608.301425] zpl_fallocate_common+0x255/0x290 [zfs]
[76608.302351] ? common_file_perm+0x5e/0x140
[76608.303292] zpl_fallocate+0x12/0x20 [zfs]
[76608.304220] vfs_fallocate+0x147/0x280
[76608.305145] ksys_fallocate+0x41/0x80
[76608.306063] __x64_sys_fallocate+0x1e/0x30
[76608.306986] do_syscall_64+0x57/0x190
[76608.307906] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[76608.308835] RIP: 0033:0x7ffa667cc46d
[76608.309767] Code: Bad RIP value.
[76608.310691] RSP: 002b:00007ff08d7f6c70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[76608.311649] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007ffa667cc46d
[76608.312615] RDX: 00000b1c05fd0000 RSI: 0000000000000000 RDI: 0000000000000018
[76608.313585] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[76608.314541] R10: 00000000000a0000 R11: 0000000000000293 R12: 00000b1c05fd0000
[76608.315486] R13: 00000000000a0000 R14: 000055a5a3bb4ae0 R15: 000055a5a51f18c0
The message above repeats. The VM is still running and somewhat responsive, but it can't be stopped. PVE shutdown takes half an hour, with a lot of "Unmounting <pool>" and "Failed unmounting <pool>" messages before "[ OK ] Reached target Unmount All Filesystems", and then a bunch of "systemd-shutdown[1]: Sending SIGKILL to PID xxxx" for the following:
sd-sync
sync
umount
Then it tries "Remounting '/mnt/POOLNAME' read-only in with options 'xattr,noacl'" for multiple pools, with the next message being:
"Failed to remount '/mnt/POOLNAME' read-only: Device or resource busy"
Eventually it gives up with 5 filesystems remaining that cannot be unmounted, "Syncing filesystems and block devices" times out, SIGKILL is issued, and shutdown completes.
The second instance just happened. After the first one, the system rebooted OK, a scrub showed no problems on the pools, and the VM eventually restarted.
I'm using Micron ECC RAM from the Gigabyte QVL for the board, and I ran memtest quite a bit before installing Proxmox.
The VM runs Ubuntu 21.04 with ZFS as its filesystem too. (The VM storage is a mix of NVMe, SSD and HDDs: SCSI qcow2 with write-back cache enabled and discard=on for the NVMe/SSD devices.)
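In case it matters, here is roughly what one of the disk lines in the VM config looks like (the storage name, VM ID and size below are placeholders, not my actual values):

scsi1: vmstore:100/vm-100-disk-1.qcow2,cache=writeback,discard=on,size=500G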
The only instances of this specific panic message I can find on Google have to do with importing pools, so they're not too helpful in my case. (Or at least I don't think so.)
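For reference, I looked up the assertion that's tripping. In the OpenZFS source, the function around zio.c:314 looks something like this (copied from the 2.0-era code as best I can tell, so treat it as approximate):

void *
zio_data_buf_alloc(size_t size)
{
        /* with SPA_MINBLOCKSHIFT = 9 and SPA_MAXBLOCKSIZE = 16M, the limit here is 32768, matching the log */
        size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;

        VERIFY3U(c, <, SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);

        return (kmem_cache_alloc(zio_data_buf_cache[c], KM_PUSHPAGE));
}

If I'm reading that right, the failing value 36028797018963967 is exactly what (size - 1) >> 9 gives for size = 0 with a 64-bit size_t, so it looks like something requested a zero-byte data buffer, though I have no idea what would cause that.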
If any other details are needed, let me know. Thank you.