ZFS-related kernel panics on Proxmox 7.0-13

Nov 5, 2021
Hi all,

I have been troubleshooting kernel panics on Proxmox 7.0-13 for the past week, but I was unsure of the cause because I had done a lot of configuration and setup before noticing the issues. Today I reinstalled Proxmox from scratch and was able to show that everything seems to work fine until I involve this mirrored ZFS pool. I thought my issues were with VM configuration, but I was just able to cause a kernel panic without a VM running at all on a nearly fresh install.

My specs:
  • Dell Optiplex 5060
  • Intel i7-8700 & 64 GB (4x16GB) Crucial DDR4 2666 RAM
  • SanDisk X600 M.2 2280 SATA 128GB (Proxmox ext4 disk)
  • 2x Intel S3700 (the HP branded version, device model# MK0400GCTZA) (ZFS pool disks for VM disk storage)
  • Intel E1G44ET NIC
Like I said, I reinstalled Proxmox fresh to the M.2 drive and only did very basic setup:
  • Added the no subscription repository
  • Fully updated the system & rebooted
  • Set up my networking bonds & bridges
  • Wiped the 2 S3700s, created a mirrored ZFS pool with lz4 compression and ashift=12, and confirmed the pool shows up as healthy
  • Created a single Debian 11 server VM using local-lvm for storage rather than the ZFS pool, with mostly default settings. Installed the OS, rebooted the VM, and confirmed it was fully working and able to mount an NFS share from my NAS and transfer data to and from it.
  • Within the Debian VM, I unmounted the NFS share and issued the command "shutdown now" as root
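For reference, a pool like the one described in the list above could be created from the command line roughly as follows. This is only a minimal sketch and not necessarily how the pool was actually created: the pool name `tank1` is assumed from the storage name that appears in the syslog, the two `/dev/disk/by-id/...` paths are hypothetical placeholders, and the helper `build_zpool_create` is an illustrative name. The script only builds and prints the command; it does not run it.

```python
import shlex


def build_zpool_create(pool, disks, ashift=12, compression="lz4"):
    """Return the argv for creating a mirrored ZFS pool."""
    if len(disks) < 2:
        raise ValueError("a mirror needs at least two disks")
    return [
        "zpool", "create",
        "-o", f"ashift={ashift}",            # pool-level property
        "-O", f"compression={compression}",  # property of the pool's root dataset
        pool,
        "mirror", *disks,
    ]


if __name__ == "__main__":
    # Hypothetical device paths; substitute the real by-id names of the two SSDs.
    disks = [
        "/dev/disk/by-id/ata-EXAMPLE_DISK_A",
        "/dev/disk/by-id/ata-EXAMPLE_DISK_B",
    ]
    # Only prints the command; nothing is executed.
    print(shlex.join(build_zpool_create("tank1", disks)))
```

Printing the command first makes it easy to double-check the device names before wiping anything.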
At this point I tried to move the VM disk from local-lvm to the ZFS pool to start testing with it further, but this time the transfer itself caused a kernel panic. I went to the hardware screen for the VM, selected the hard disk, pressed the Move disk button, and chose the ZFS pool as the target. The transfer began, but the UI froze after a few GB had been transferred. I checked the console and, sure enough, saw the kernel panic. The syslog shows this:

Code:
Nov 04 23:44:18 hyper pvedaemon[1392]: <root@pam> move disk VM 900: move --disk virtio0 --storage tank1
Nov 04 23:44:18 hyper pvedaemon[1392]: <root@pam> starting task UPID:hyper:00001C85:00031B2D:6184C4B2:qmmove:900:root@pam:
Nov 04 23:44:44 hyper kernel: BUG: kernel NULL pointer dereference, address: 0000000000000088
Nov 04 23:44:44 hyper kernel: #PF: supervisor read access in kernel mode
Nov 04 23:44:44 hyper kernel: #PF: error_code(0x0000) - not-present page
Nov 04 23:44:44 hyper kernel: PGD 0 P4D 0
Nov 04 23:44:44 hyper kernel: Oops: 0000 [#1] SMP PTI
Nov 04 23:44:44 hyper kernel: CPU: 0 PID: 7363 Comm: qemu-img Tainted: P           O      5.11.22-5-pve #1
Nov 04 23:44:44 hyper kernel: Hardware name: Dell Inc. OptiPlex 5060/0654JC, BIOS 1.14.0 07/22/2021
Nov 04 23:44:44 hyper kernel: RIP: 0010:workingset_activation+0x4d/0xa0
Nov 04 23:44:44 hyper kernel: Code: 14 c5 e0 0b 75 92 0f 1f 44 00 00 48 8b 47 38 48 83 e0 fc 48 0f 44 05 9a ed cc 01 48 63 8a 00 9d 02 00 4c 8b 84 c8 00 0b 00 00 <49> 3b 90 88 00 00 00 75 35 48 8b 07 4c 89 c7 48 c1 e8 10 83 e0 01
Nov 04 23:44:44 hyper kernel: RSP: 0018:ffffa246c3643c88 EFLAGS: 00010282
Nov 04 23:44:44 hyper kernel: RAX: ffff8924a19cf068 RBX: ffffc558595232c0 RCX: 0000000000000000
Nov 04 23:44:44 hyper kernel: RDX: ffff8933c07d6000 RSI: 0000000500000000 RDI: ffffc558595232c0
Nov 04 23:44:44 hyper kernel: RBP: ffffa246c3643c88 R08: 0000000000000000 R09: ffff8924dbd6c880
Nov 04 23:44:44 hyper kernel: R10: 0000000000000001 R11: ffff8924dbd6c9f8 R12: ffff893380228f40
Nov 04 23:44:44 hyper kernel: R13: ffff892606d04608 R14: ffffa246c3643e60 R15: ffffa246c3643e38
Nov 04 23:44:44 hyper kernel: FS:  00007f6264ae6700(0000) GS:ffff893380200000(0000) knlGS:0000000000000000
Nov 04 23:44:44 hyper kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 23:44:44 hyper kernel: CR2: 0000000000000088 CR3: 000000010c776004 CR4: 00000000003706f0
Nov 04 23:44:44 hyper kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 23:44:44 hyper kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 23:44:44 hyper kernel: Call Trace:
Nov 04 23:44:44 hyper kernel:  mark_page_accessed+0x181/0x1f0
Nov 04 23:44:44 hyper kernel:  generic_file_buffered_read+0x230/0x4a0
Nov 04 23:44:44 hyper kernel:  generic_file_read_iter+0xdf/0x140
Nov 04 23:44:44 hyper kernel:  blkdev_read_iter+0x4a/0x60
Nov 04 23:44:44 hyper kernel:  new_sync_read+0x10d/0x190
Nov 04 23:44:44 hyper kernel:  vfs_read+0x15a/0x1c0
Nov 04 23:44:44 hyper kernel:  __x64_sys_pread64+0x93/0xc0
Nov 04 23:44:44 hyper kernel:  do_syscall_64+0x38/0x90
Nov 04 23:44:44 hyper kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 04 23:44:44 hyper kernel: RIP: 0033:0x7f627777c917
Nov 04 23:44:44 hyper kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 05 f4 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 35 f4 ff ff 48 8b
Nov 04 23:44:44 hyper kernel: RSP: 002b:00007f6264ae1680 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
Nov 04 23:44:44 hyper kernel: RAX: ffffffffffffffda RBX: 00007f6266aec000 RCX: 00007f627777c917
Nov 04 23:44:44 hyper kernel: RDX: 0000000000200000 RSI: 00007f6266aec000 RDI: 0000000000000004
Nov 04 23:44:44 hyper kernel: RBP: 00007f6266ded840 R08: 0000000000000000 R09: 00000000ffffffff
Nov 04 23:44:44 hyper kernel: R10: 0000000353dff400 R11: 0000000000000293 R12: 0000000000000000
Nov 04 23:44:44 hyper kernel: R13: 00005618e9a364c8 R14: 00005618e9a4ed90 R15: 0000000000802000
Nov 04 23:44:44 hyper kernel: Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd cryptd glue_helper cec rc_core dell_wmi fb_sys_fops rapl syscopyarea dell_smbios mei_hdcp sysfillrect dcdbas intel_cstate sysimgblt mei_me pcspkr intel_pch_thermal dell_wmi_sysman efi_pstore ee1004 mei dell_wmi_descriptor dell_wmi_aio wmi_bmof sparse_keymap mac_hid acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Nov 04 23:44:44 hyper kernel:  libcrc32c crc32_pclmul intel_lpss_pci i2c_i801 intel_lpss xhci_pci igb xhci_pci_renesas ahci idma64 i2c_algo_bit e1000e i2c_smbus dca libahci xhci_hcd virt_dma wmi video pinctrl_cannonlake
Nov 04 23:44:44 hyper kernel: CR2: 0000000000000088
Nov 04 23:44:44 hyper kernel: ---[ end trace 7ffa6e57016e7357 ]---
Nov 04 23:44:44 hyper kernel: RIP: 0010:workingset_activation+0x4d/0xa0
Nov 04 23:44:44 hyper kernel: Code: 14 c5 e0 0b 75 92 0f 1f 44 00 00 48 8b 47 38 48 83 e0 fc 48 0f 44 05 9a ed cc 01 48 63 8a 00 9d 02 00 4c 8b 84 c8 00 0b 00 00 <49> 3b 90 88 00 00 00 75 35 48 8b 07 4c 89 c7 48 c1 e8 10 83 e0 01
Nov 04 23:44:44 hyper kernel: RSP: 0018:ffffa246c3643c88 EFLAGS: 00010282
Nov 04 23:44:44 hyper kernel: RAX: ffff8924a19cf068 RBX: ffffc558595232c0 RCX: 0000000000000000
Nov 04 23:44:44 hyper kernel: RDX: ffff8933c07d6000 RSI: 0000000500000000 RDI: ffffc558595232c0
Nov 04 23:44:44 hyper kernel: RBP: ffffa246c3643c88 R08: 0000000000000000 R09: ffff8924dbd6c880
Nov 04 23:44:44 hyper kernel: R10: 0000000000000001 R11: ffff8924dbd6c9f8 R12: ffff893380228f40
Nov 04 23:44:44 hyper kernel: R13: ffff892606d04608 R14: ffffa246c3643e60 R15: ffffa246c3643e38
Nov 04 23:44:44 hyper kernel: FS:  00007f6264ae6700(0000) GS:ffff893380200000(0000) knlGS:0000000000000000
Nov 04 23:44:44 hyper kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 23:44:44 hyper kernel: CR2: 0000000000000088 CR3: 000000010c776004 CR4: 00000000003706f0
Nov 04 23:44:44 hyper kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 23:44:44 hyper kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 23:44:46 hyper kernel: general protection fault, probably for non-canonical address 0xefa33951ffff892a: 0000 [#2] SMP PTI
Nov 04 23:44:46 hyper kernel: CPU: 3 PID: 513 Comm: dbuf_evict Tainted: P      D    O      5.11.22-5-pve #1
Nov 04 23:44:46 hyper kernel: Hardware name: Dell Inc. OptiPlex 5060/0654JC, BIOS 1.14.0 07/22/2021
Nov 04 23:44:46 hyper kernel: RIP: 0010:arc_buf_destroy+0x1c/0x110 [zfs]
Nov 04 23:44:46 hyper kernel: Code: 07 00 0f 1f 40 00 e9 f4 fe ff ff 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 4c 8b 27 48 8b 05 ec f9 28 00 <49> 39 84 24 f8 00 00 00 0f 84 95 00 00 00 49 8b 4c 24 10 49 8b 54
Nov 04 23:44:46 hyper kernel: RSP: 0018:ffffa246c1727e08 EFLAGS: 00010282
Nov 04 23:44:46 hyper kernel: RAX: ffffffffc0c8abe0 RBX: 28f5c28f5c28f5c3 RCX: 0000000000000000
Nov 04 23:44:46 hyper kernel: RDX: ffffffffffffe000 RSI: ffff892a344aa180 RDI: ffff892ac30ff204
Nov 04 23:44:46 hyper kernel: RBP: ffffa246c1727e30 R08: 000000000000000b R09: 788af778fc6207b2
Nov 04 23:44:46 hyper kernel: R10: 0000000000000000 R11: ffffa246c1727e60 R12: efa33951ffff892a
Nov 04 23:44:46 hyper kernel: R13: ffff892ac4e3be00 R14: ffff8924918b6300 R15: ffffffffc0d6b0a0
Nov 04 23:44:46 hyper kernel: FS:  0000000000000000(0000) GS:ffff8933802c0000(0000) knlGS:0000000000000000
Nov 04 23:44:46 hyper kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 23:44:46 hyper kernel: CR2: 0000561dff3dd398 CR3: 0000000d3cc10005 CR4: 00000000003706e0
Nov 04 23:44:46 hyper kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 23:44:46 hyper kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 23:44:46 hyper kernel: Call Trace:
Nov 04 23:44:46 hyper kernel:  dbuf_destroy+0x31/0x460 [zfs]
Nov 04 23:44:46 hyper kernel:  ? _cond_resched+0x1a/0x50
Nov 04 23:44:46 hyper kernel:  dbuf_evict_one+0x10a/0x140 [zfs]
Nov 04 23:44:46 hyper kernel:  dbuf_evict_thread+0x12d/0x1e0 [zfs]
Nov 04 23:44:46 hyper kernel:  ? dbuf_evict_one+0x140/0x140 [zfs]
Nov 04 23:44:46 hyper kernel:  thread_generic_wrapper+0x79/0x90 [spl]
Nov 04 23:44:46 hyper kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 04 23:44:46 hyper kernel:  kthread+0x12b/0x150
Nov 04 23:44:46 hyper kernel:  ? set_kthread_struct+0x50/0x50
Nov 04 23:44:46 hyper kernel:  ret_from_fork+0x22/0x30
Nov 04 23:44:46 hyper kernel: Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd cryptd glue_helper cec rc_core dell_wmi fb_sys_fops rapl syscopyarea dell_smbios mei_hdcp sysfillrect dcdbas intel_cstate sysimgblt mei_me pcspkr intel_pch_thermal dell_wmi_sysman efi_pstore ee1004 mei dell_wmi_descriptor dell_wmi_aio wmi_bmof sparse_keymap mac_hid acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Nov 04 23:44:46 hyper kernel:  libcrc32c crc32_pclmul intel_lpss_pci i2c_i801 intel_lpss xhci_pci igb xhci_pci_renesas ahci idma64 i2c_algo_bit e1000e i2c_smbus dca libahci xhci_hcd virt_dma wmi video pinctrl_cannonlake
Nov 04 23:44:46 hyper kernel: ---[ end trace 7ffa6e57016e7358 ]---
Nov 04 23:44:47 hyper kernel: RIP: 0010:workingset_activation+0x4d/0xa0
Nov 04 23:44:47 hyper kernel: Code: 14 c5 e0 0b 75 92 0f 1f 44 00 00 48 8b 47 38 48 83 e0 fc 48 0f 44 05 9a ed cc 01 48 63 8a 00 9d 02 00 4c 8b 84 c8 00 0b 00 00 <49> 3b 90 88 00 00 00 75 35 48 8b 07 4c 89 c7 48 c1 e8 10 83 e0 01
Nov 04 23:44:47 hyper kernel: RSP: 0018:ffffa246c3643c88 EFLAGS: 00010282
Nov 04 23:44:47 hyper kernel: RAX: ffff8924a19cf068 RBX: ffffc558595232c0 RCX: 0000000000000000
Nov 04 23:44:47 hyper kernel: RDX: ffff8933c07d6000 RSI: 0000000500000000 RDI: ffffc558595232c0
Nov 04 23:44:47 hyper kernel: RBP: ffffa246c3643c88 R08: 0000000000000000 R09: ffff8924dbd6c880
Nov 04 23:44:47 hyper kernel: R10: 0000000000000001 R11: ffff8924dbd6c9f8 R12: ffff893380228f40
Nov 04 23:44:47 hyper kernel: R13: ffff892606d04608 R14: ffffa246c3643e60 R15: ffffa246c3643e38
Nov 04 23:44:47 hyper kernel: FS:  0000000000000000(0000) GS:ffff8933802c0000(0000) knlGS:0000000000000000
Nov 04 23:44:47 hyper kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 23:44:47 hyper kernel: CR2: 0000561dff3dd398 CR3: 0000000121970004 CR4: 00000000003706e0
Nov 04 23:44:47 hyper kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 23:44:47 hyper kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 23:45:00 hyper systemd[1]: Starting Proxmox VE replication runner...
Nov 04 23:45:01 hyper systemd[1]: pvesr.service: Succeeded.
Nov 04 23:45:01 hyper systemd[1]: Finished Proxmox VE replication runner.
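To make the trace above easier to scan, here is a small sketch that pulls out, for each fault, the process it happened in and the kernel function (and module, if any) named on the `RIP:` line. The sample lines are copied from the log above; the function name `summarize_faults` and the dictionary keys are illustrative, and the regular expressions assume the log layout shown in this post.

```python
import re

# A few lines copied verbatim from the syslog excerpt above.
SAMPLE_LOG = """\
Nov 04 23:44:44 hyper kernel: BUG: kernel NULL pointer dereference, address: 0000000000000088
Nov 04 23:44:44 hyper kernel: Oops: 0000 [#1] SMP PTI
Nov 04 23:44:44 hyper kernel: CPU: 0 PID: 7363 Comm: qemu-img Tainted: P           O      5.11.22-5-pve #1
Nov 04 23:44:44 hyper kernel: RIP: 0010:workingset_activation+0x4d/0xa0
Nov 04 23:44:44 hyper kernel: RIP: 0033:0x7f627777c917
Nov 04 23:44:46 hyper kernel: general protection fault, probably for non-canonical address 0xefa33951ffff892a: 0000 [#2] SMP PTI
Nov 04 23:44:46 hyper kernel: CPU: 3 PID: 513 Comm: dbuf_evict Tainted: P      D    O      5.11.22-5-pve #1
Nov 04 23:44:46 hyper kernel: RIP: 0010:arc_buf_destroy+0x1c/0x110 [zfs]
Nov 04 23:44:47 hyper kernel: RIP: 0010:workingset_activation+0x4d/0xa0
"""

FAULT_RE = re.compile(r"\[#(\d+)\]")  # "[#1]", "[#2]" mark the start of a fault
COMM_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
# Kernel-mode RIP only (code segment 0010); the module in brackets is optional.
RIP_RE = re.compile(r"RIP: 0010:(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_faults(log_text):
    """Return one dict per fault with its process and faulting symbol."""
    faults = []
    current = None
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            current = {"index": int(m.group(1)), "pid": None,
                       "comm": None, "symbol": None, "module": None}
            faults.append(current)
            continue
        if current is None:
            continue
        m = COMM_RE.search(line)
        if m and current["comm"] is None:
            current["pid"] = int(m.group(2))
            current["comm"] = m.group(3)
            continue
        m = RIP_RE.search(line)
        # Keep only the first RIP per fault; later register dumps repeat it.
        if m and current["symbol"] is None:
            current["symbol"] = m.group(1)
            current["module"] = m.group(2)  # None for core kernel code
    return faults


if __name__ == "__main__":
    for fault in summarize_faults(SAMPLE_LOG):
        print(fault)
```

Run against the excerpt, this separates the two faults in the log: the first in `workingset_activation` (core kernel, no module listed) while `qemu-img` was reading, the second two seconds later in `arc_buf_destroy` inside the `zfs` module on the `dbuf_evict` thread.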

Troubleshooting steps I have taken so far:
  • Prior to installing Proxmox on this system with this exact hardware configuration, I did the following:
    • Fully updated the BIOS
    • Ran 24 hours of memtest, which it survived with no errors.
    • I installed both Ubuntu Server 20.04 and FreeBSD 13 on this machine to test it out. In both of those configurations, I formatted these S3700s with ZFS, filled them with data or used them as root disks, and used them for testing with no issues that I witnessed.
    • All 3 SSDs were data tested and had long SMART tests run, which they survived with no errors
  • After getting the kernel panic above, I did the following:
    • Ran long SMART tests on the drives again, which they passed again.
    • Checked the health of the zpool, which shows as online with no known data errors from "zpool status" and healthy in the Proxmox UI.
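The pool health check in the last step can be scripted along these lines. This is a rough sketch that assumes `zpool status -x` prints "all pools are healthy" (or "pool '<name>' is healthy" when a pool name is given) for a healthy pool; the function names are illustrative, and `check_pools` needs the `zpool` binary on the host.

```python
import subprocess


def pools_report_healthy(status_x_output):
    """Interpret the text printed by `zpool status -x`."""
    text = status_x_output.strip()
    return text == "all pools are healthy" or text.endswith("is healthy")


def check_pools(pool=None):
    """Run `zpool status -x [pool]` and return True if no problems are reported."""
    cmd = ["zpool", "status", "-x"]
    if pool:
        cmd.append(pool)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return pools_report_healthy(result.stdout)


if __name__ == "__main__":
    # "tank1" is assumed from the storage name in the syslog.
    print("healthy" if check_pools("tank1") else "problems reported")
```

Keeping the parsing in its own function means it can be checked against captured output without touching a real pool.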
At this point it really seems like there is some sort of issue between Proxmox and ZFS specifically. All of this hardware was tested with Linux, and with ZFS in particular, without a single error until Proxmox came into the equation. I am running out of troubleshooting steps I know to take. Can anyone recommend anything? I can provide any information requested.

Thank you and I appreciate your time if you read all this!
 
