Opt-in Linux 6.17 Kernel for Proxmox VE 9 available on test & no-subscription

I just did the 8-to-9 upgrade on 6 servers, and 1 of them refuses to boot with the 6.17 kernel. All 6 servers have the root fs on ZFS using a mirror. On the one bad server, the boot dies and just hangs when it tries to mount rpool. The 6.8 and 6.14 kernels boot on the same server without any issues.

I was able to boot into the initramfs shell and manually mount rpool read-only, but mounting it read-write produced the following in dmesg:

Code:
[  396.156210] spl: loading out-of-tree module taints kernel.
[  396.177606] zfs: module license 'CDDL' taints kernel.
[  396.177609] Disabling lock debugging due to kernel taint
[  396.177620] zfs: module license taints kernel.
[  396.557045] ZFS: Loaded module v2.3.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[  397.299048] Oops: general protection fault, probably for non-canonical address 0x6972775f73746e65: 0000 [#1] SMP NOPTI
[  397.299188] CPU: 14 UID: 0 PID: 761 Comm: txg_sync Tainted: P S         O        6.17.4-2-pve #1 PREEMPT(voluntary)
[  397.299258] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
[  397.299327] Hardware name: Gigabyte Technology Co., Ltd. Z790 S WIFI DDR4/Z790 S WIFI DDR4, BIOS F1 02/06/2024
[  397.299400] RIP: 0010:dmu_objset_userquota_get_ids+0x1a0/0x500 [zfs]
[  397.299645] Code: 45 b8 00 00 00 00 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 49 8b 44 24 18 48 8b 80 c0 02 00 00 48 8b 04 c5 00 f2 e4 c0 <ff> d0 0f 1f 00 41 89 c4 48 8b 45 b0 49 89 87 20 03 00 00 48 8b 45
[  397.299810] RSP: 0018:ffffd17bcc253b60 EFLAGS: 00010246
[  397.299897] RAX: 6972775f73746e65 RBX: 0000000000000001 RCX: 0000000000000000
[  397.299986] RDX: ffffd17bcc253b78 RSI: ffff8bba70ef9ec0 RDI: 0000000000000010
[  397.300077] RBP: ffffd17bcc253bc8 R08: 0000000000000000 R09: 0000000000000000
[  397.300169] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bba708f9800
[  397.300263] R13: 0000000000000000 R14: ffff8bba43078c00 R15: ffff8bba7101fab8
[  397.300359] FS:  0000000000000000(0000) GS:ffff8bd1f1c86000(0000) knlGS:0000000000000000
[  397.300458] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  397.300562] CR2: 0000752ff5797ff8 CR3: 000000099b83a004 CR4: 0000000000f70ef0
[  397.300666] PKRU: 55555554
[  397.300770] Call Trace:
[  397.300876]  <TASK>
[  397.300981]  dnode_setdirty+0x34/0x110 [zfs]
[  397.301235]  dbuf_dirty+0x824/0x950 [zfs]
[  397.301474]  ? zio_create+0x406/0x690 [zfs]
[  397.301727]  dmu_buf_will_dirty_impl+0xa0/0x270 [zfs]
[  397.301975]  dmu_buf_will_dirty+0x16/0x30 [zfs]
[  397.302218]  dsl_dataset_sync+0x26/0x200 [zfs]
[  397.302484]  dsl_pool_sync+0xa9/0x4e0 [zfs]
[  397.302755]  spa_sync+0x561/0x1070 [zfs]
[  397.303029]  ? spa_txg_history_init_io+0x11c/0x130 [zfs]
[  397.303296]  txg_sync_thread+0x209/0x3b0 [zfs]
[  397.303559]  ? try_to_wake_up+0x392/0x8a0
[  397.303696]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[  397.303961]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[  397.304106]  thread_generic_wrapper+0x5d/0x80 [spl]
[  397.304250]  kthread+0x108/0x220
[  397.304392]  ? __pfx_kthread+0x10/0x10
[  397.304533]  ret_from_fork+0x205/0x240
[  397.304676]  ? __pfx_kthread+0x10/0x10
[  397.304820]  ret_from_fork_asm+0x1a/0x30
[  397.304967]  </TASK>
[  397.305110] Modules linked in: zfs(PO) spl(O) uas usb_storage usbkbd raid1 dm_raid hid_generic raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx usbhid xor hid raid6_pq nvme xhci_pci spi_intel_pci atlantic nvme_core ahci i2c_i801 spi_intel xhci_hcd i2c_mux intel_lpss_pci libahci i2c_smbus macsec nvme_keyring intel_lpss nvme_auth idma64 video wmi pinctrl_alderlake
[  397.305770] ---[ end trace 0000000000000000 ]---
[  397.305945] RIP: 0010:dmu_objset_userquota_get_ids+0x1a0/0x500 [zfs]
[  397.306256] Code: 45 b8 00 00 00 00 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 49 8b 44 24 18 48 8b 80 c0 02 00 00 48 8b 04 c5 00 f2 e4 c0 <ff> d0 0f 1f 00 41 89 c4 48 8b 45 b0 49 89 87 20 03 00 00 48 8b 45
[  397.306619] RSP: 0018:ffffd17bcc253b60 EFLAGS: 00010246
[  397.306796] RAX: 6972775f73746e65 RBX: 0000000000000001 RCX: 0000000000000000
[  397.306972] RDX: ffffd17bcc253b78 RSI: ffff8bba70ef9ec0 RDI: 0000000000000010
[  397.307145] RBP: ffffd17bcc253bc8 R08: 0000000000000000 R09: 0000000000000000
[  397.307313] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bba708f9800
[  397.307477] R13: 0000000000000000 R14: ffff8bba43078c00 R15: ffff8bba7101fab8
[  397.307641] FS:  0000000000000000(0000) GS:ffff8bd1f1c86000(0000) knlGS:0000000000000000
[  397.307799] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  397.307956] CR2: 0000752ff5797ff8 CR3: 000000099b83a004 CR4: 0000000000f70ef0
[  397.308112] PKRU: 55555554
 
please try to get the full boot log for that node on the problematic kernel! please also try the 6.17.9 kernel (currently making its way through the repositories).
 
please try to get the full boot log for that node on the problematic kernel! please also try the 6.17.9 kernel (currently making its way through the repositories).

Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?
 
Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?
You need the test repo.
Fabian's initial recommendation to try the 6.17.9 kernel is a good one, because it comes with newer ZFS modules (highest chance of fixing your issue).

For the boot log: boot into the non-working kernel, then reboot into a working one, pull the previous boot's log with journalctl, redirect it into a file, and upload it.
(I'm writing from my phone, so you'll have to google the exact commands. It's easy.)
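A minimal sketch of the commands meant above, assuming the systemd journal is persistent (i.e. `Storage=persistent` in journald.conf, or `/var/log/journal` exists):

```shell
# After rebooting into a working kernel, list the boots the journal recorded
journalctl --list-boots

# Dump the kernel messages from the previous (crashed) boot into a file
journalctl -b -1 -k > previous-boot.log
```

One caveat: if the journal is volatile (the default keeps it in /run) or the crash happens before /var/log on rpool is writable, nothing from the crashed boot will survive the reboot, which matches what was described above.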

For the 6.17.9 kernel, you simply need to add the Proxmox pve-test repository. You can even do that from the Proxmox GUI, directly in the repository tab.
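If you prefer the CLI over the GUI, a sketch of adding the test repository on PVE 9 (Debian trixie) and installing the opt-in kernel; the deb822 component name `pve-test` and the meta-package name `proxmox-kernel-6.17` follow the current Proxmox naming conventions, but double-check them against the official repository documentation:

```shell
# Add the test repository in deb822 format (PVE 9 / Debian trixie)
cat > /etc/apt/sources.list.d/pve-test.sources <<'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-test
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF

# Refresh package lists and install the opt-in 6.17 kernel series
apt update
apt install proxmox-kernel-6.17
```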

Cheers
 
Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?

netconsole or a serial console are two commonly used options
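For the netconsole route, a sketch with hypothetical addresses (192.168.1.50 for the crashing node, 192.168.1.100 for the machine capturing the log; replace the NIC name and MAC with your own). On a Proxmox host with root on ZFS, the kernel command line usually lives in /etc/kernel/cmdline and is applied with proxmox-boot-tool:

```shell
# On the crashing node: append netconsole parameters to the kernel cmdline.
# Format: netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
# ignore_loglevel makes sure all kernel messages are sent, not just errors.
#   netconsole=6665@192.168.1.50/eth0,6666@192.168.1.100/aa:bb:cc:dd:ee:ff ignore_loglevel
# Then regenerate the boot entries:
proxmox-boot-tool refresh

# On the capturing machine: listen for the UDP log stream
nc -u -l 6666 | tee netconsole-boot.log
```

Note that netconsole has to be active before the crash: if it is built as a module (as in the Proxmox kernels), the cmdline parameters only take effect once the module loads, so you may need to configure it via module options and make sure it is included early enough in the initramfs.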