Opt-in Linux 6.17 Kernel for Proxmox VE 9 available on test & no-subscription

try the 6.17.9 kernel and please report back!

I booted with the 6.17.9-1 kernel and it looks like it is working, thanks!

I did earlier try nvme_core.default_ps_max_latency_us=0, but I had no apparent change from it.
 
I just did the 8 to 9 upgrade on 6 servers and 1 of them refused to boot with the 6.17 kernel. All 6 servers have the root fs on ZFS using a mirror. On this one bad server, the boot dies and just hangs when it tries to mount rpool. The 6.8 and 6.14 kernels boot on the same server without any issues.

I was able to boot into the initramfs shell and manually mount rpool read-only, but manually mounting it read-write produced the following in dmesg:

Code:
[  396.156210] spl: loading out-of-tree module taints kernel.
[  396.177606] zfs: module license 'CDDL' taints kernel.
[  396.177609] Disabling lock debugging due to kernel taint
[  396.177620] zfs: module license taints kernel.
[  396.557045] ZFS: Loaded module v2.3.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[  397.299048] Oops: general protection fault, probably for non-canonical address 0x6972775f73746e65: 0000 [#1] SMP NOPTI
[  397.299188] CPU: 14 UID: 0 PID: 761 Comm: txg_sync Tainted: P S         O        6.17.4-2-pve #1 PREEMPT(voluntary)
[  397.299258] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
[  397.299327] Hardware name: Gigabyte Technology Co., Ltd. Z790 S WIFI DDR4/Z790 S WIFI DDR4, BIOS F1 02/06/2024
[  397.299400] RIP: 0010:dmu_objset_userquota_get_ids+0x1a0/0x500 [zfs]
[  397.299645] Code: 45 b8 00 00 00 00 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 49 8b 44 24 18 48 8b 80 c0 02 00 00 48 8b 04 c5 00 f2 e4 c0 <ff> d0 0f 1f 00 41 89 c4 48 8b 45 b0 49 89 87 20 03 00 00 48 8b 45
[  397.299810] RSP: 0018:ffffd17bcc253b60 EFLAGS: 00010246
[  397.299897] RAX: 6972775f73746e65 RBX: 0000000000000001 RCX: 0000000000000000
[  397.299986] RDX: ffffd17bcc253b78 RSI: ffff8bba70ef9ec0 RDI: 0000000000000010
[  397.300077] RBP: ffffd17bcc253bc8 R08: 0000000000000000 R09: 0000000000000000
[  397.300169] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bba708f9800
[  397.300263] R13: 0000000000000000 R14: ffff8bba43078c00 R15: ffff8bba7101fab8
[  397.300359] FS:  0000000000000000(0000) GS:ffff8bd1f1c86000(0000) knlGS:0000000000000000
[  397.300458] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  397.300562] CR2: 0000752ff5797ff8 CR3: 000000099b83a004 CR4: 0000000000f70ef0
[  397.300666] PKRU: 55555554
[  397.300770] Call Trace:
[  397.300876]  <TASK>
[  397.300981]  dnode_setdirty+0x34/0x110 [zfs]
[  397.301235]  dbuf_dirty+0x824/0x950 [zfs]
[  397.301474]  ? zio_create+0x406/0x690 [zfs]
[  397.301727]  dmu_buf_will_dirty_impl+0xa0/0x270 [zfs]
[  397.301975]  dmu_buf_will_dirty+0x16/0x30 [zfs]
[  397.302218]  dsl_dataset_sync+0x26/0x200 [zfs]
[  397.302484]  dsl_pool_sync+0xa9/0x4e0 [zfs]
[  397.302755]  spa_sync+0x561/0x1070 [zfs]
[  397.303029]  ? spa_txg_history_init_io+0x11c/0x130 [zfs]
[  397.303296]  txg_sync_thread+0x209/0x3b0 [zfs]
[  397.303559]  ? try_to_wake_up+0x392/0x8a0
[  397.303696]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[  397.303961]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[  397.304106]  thread_generic_wrapper+0x5d/0x80 [spl]
[  397.304250]  kthread+0x108/0x220
[  397.304392]  ? __pfx_kthread+0x10/0x10
[  397.304533]  ret_from_fork+0x205/0x240
[  397.304676]  ? __pfx_kthread+0x10/0x10
[  397.304820]  ret_from_fork_asm+0x1a/0x30
[  397.304967]  </TASK>
[  397.305110] Modules linked in: zfs(PO) spl(O) uas usb_storage usbkbd raid1 dm_raid hid_generic raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx usbhid xor hid raid6_pq nvme xhci_pci spi_intel_pci atlantic nvme_core ahci i2c_i801 spi_intel xhci_hcd i2c_mux intel_lpss_pci libahci i2c_smbus macsec nvme_keyring intel_lpss nvme_auth idma64 video wmi pinctrl_alderlake
[  397.305770] ---[ end trace 0000000000000000 ]---
[  397.305945] RIP: 0010:dmu_objset_userquota_get_ids+0x1a0/0x500 [zfs]
[  397.306256] Code: 45 b8 00 00 00 00 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 49 8b 44 24 18 48 8b 80 c0 02 00 00 48 8b 04 c5 00 f2 e4 c0 <ff> d0 0f 1f 00 41 89 c4 48 8b 45 b0 49 89 87 20 03 00 00 48 8b 45
[  397.306619] RSP: 0018:ffffd17bcc253b60 EFLAGS: 00010246
[  397.306796] RAX: 6972775f73746e65 RBX: 0000000000000001 RCX: 0000000000000000
[  397.306972] RDX: ffffd17bcc253b78 RSI: ffff8bba70ef9ec0 RDI: 0000000000000010
[  397.307145] RBP: ffffd17bcc253bc8 R08: 0000000000000000 R09: 0000000000000000
[  397.307313] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bba708f9800
[  397.307477] R13: 0000000000000000 R14: ffff8bba43078c00 R15: ffff8bba7101fab8
[  397.307641] FS:  0000000000000000(0000) GS:ffff8bd1f1c86000(0000) knlGS:0000000000000000
[  397.307799] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  397.307956] CR2: 0000752ff5797ff8 CR3: 000000099b83a004 CR4: 0000000000f70ef0
[  397.308112] PKRU: 55555554
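
As an aside, the faulting address in the oops (RAX = 0x6972775f73746e65) is all printable ASCII rather than a plausible pointer, which usually means a function pointer was overwritten with string data. A quick sketch to decode it (the memory-corruption interpretation is my reading, not something the trace states):

```python
import struct

# Non-canonical address from the general protection fault above (also in RAX).
addr = 0x6972775F73746E65

# x86-64 stores values little-endian, so pack LSB-first to recover
# the bytes as they sat in memory.
raw = struct.pack("<Q", addr)
print(raw.decode("ascii"))  # → ents_wri
```

Eight printable bytes sitting where a function pointer should be points at corruption rather than a plain NULL dereference.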
 
please try to get the full boot log for that node on the problematic kernel! please also try the 6.17.9 kernel (currently making its way through the repositories).
 
please try to get the full boot log for that node on the problematic kernel! please also try the 6.17.9 kernel (currently making its way through the repositories).

Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?
 
Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?
You need the test repo.
Fabian's initial recommendation to try the 6.17.9 kernel is spot on, because it comes with newer ZFS modules (the highest chance of fixing your issue).

For the boot log: boot into the non-working kernel, then reboot into a working one, grab the previous boot's log from the journal, redirect it into a file, and upload it.
(I'm writing from my phone, so you'll have to look up the exact commands. It's easy.)
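
A sketch of the journal commands meant here. Note that the kernel log from a crashed boot only survives if persistent journaling is enabled (i.e. /var/log/journal exists), and since this machine locks up before rpool is mounted read-write, even that may capture nothing, so treat this as best-effort:

```shell
# After rebooting into a working kernel, dump the previous boot's log:
journalctl -b -1 > previous-boot.log            # everything from the boot before this one
journalctl -b -1 -k > previous-boot-kernel.log  # kernel messages only

# If journalctl reports no earlier boots, enable persistent journaling
# and reproduce the failing boot first:
mkdir -p /var/log/journal
systemctl restart systemd-journald
```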

For the 6.17.9 repo, simply add the Proxmox pve-test repository. You can even do that directly from the Proxmox GUI in the Repositories tab.
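
If you prefer the shell over the GUI, a sketch for a stock PVE 9 / Debian trixie install (component name and file path to the best of my knowledge; double-check against the official repository docs):

```
# /etc/apt/sources.list.d/pve-test.sources
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-test
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
```

Then `apt update` followed by `apt install proxmox-kernel-6.17` (assuming the usual opt-in kernel meta-package naming) should pull in 6.17.9 once it lands in the test repo.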

Cheers
 
Any suggestions on getting the full boot log? Since it locks up before rpool is mounted nothing is permanently logged and the server is headless. I do have it hooked up to PiKVM and tried screen capturing the boot process, but when the lockup happens there are too many lines logged at once so I think the low frame rate misses some stuff.

And what repos do I need to add to install the 6.17.9?

netconsole or serial console are two commonly used options
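
netconsole can be configured entirely on the kernel command line, so it starts streaming kernel messages over UDP before the root pool is ever touched. A sketch, where every address, port, interface, and MAC below is a placeholder for your network:

```
# On a second machine on the same L2 segment, listen for the UDP stream:
#   nc -u -l 6666 | tee netconsole.log
#
# On the failing node, append to the bad kernel's command line:
#   netconsole=<src-port>@<src-ip>/<src-dev>,<dst-port>@<dst-ip>/<dst-mac>
netconsole=6665@192.168.1.10/eno1,6666@192.168.1.20/aa:bb:cc:dd:ee:ff loglevel=7
```

`loglevel=7` ensures all kernel messages reach the console; for logging this early, netconsole has to be built into the kernel or loaded from the initramfs.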
 
No luck on 6.17.9, seems like it's the same issue. I'll try and set up netconsole or dig up some USB serial cables to capture the errors.
 
thanks! that seems very odd. can you run a zpool scrub using one of the kernel versions that boot?
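
For reference, a minimal sketch of that (pool name taken from the earlier posts):

```shell
zpool scrub rpool        # starts the scrub in the background
zpool status -v rpool    # progress, plus any checksum/read/write errors found
```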
 
Hardware name: Gigabyte Technology Co., Ltd. Z790 S WIFI DDR4/Z790 S WIFI DDR4, BIOS F1 02/06/2024
Maybe try updating to a more recent BIOS version - as shown here?
Six newer updates seem to have been released since your F1 version of Feb 16, 2024, which is the initial BIOS release for that board.
 
The zpool scrub and BIOS update did not help.

Although the BIOS update seemed to mess up my Secure Boot, so I had to disable it for now (I'm pretty sure I had it on before). I'll go back later and try to reset the keys and turn it back on.
 
I don't think it's an ABI issue then. You could allow access with:
Code:
/run/systemd/journal/dev-log r,
inside the profile. EDIT: the file for the profile is inside the container, namely /etc/apparmor.d/usr.sbin.rsyslogd
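
For completeness: after adding the line, the profile has to be reloaded before it takes effect. A sketch, run inside the container (assuming the container is allowed to load AppArmor policy; otherwise restarting the container re-applies it):

```shell
apparmor_parser -r /etc/apparmor.d/usr.sbin.rsyslogd   # reload the edited profile
systemctl restart rsyslog                              # restart the confined service
```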
Worked for me with an Ubuntu 24.04 LXC! Thanks!
 
I'm having an issue with SR-IOV and the 6.17.9-1 kernel. This issue is not present in 6.17.4-2. I've attached dmesg output showing initialization and VF setup (success on 6.17.4 and failure on 6.17.9) for my Arc Pro B50, as well as lspci output for the device. I don't know if this is an issue with the xe driver in the kernel or some other subsystem but there's definitely some kind of regression.
 

I'm having an issue with SR-IOV and the 6.17.9-1 kernel. This issue is not present in 6.17.4-2. I've attached dmesg output showing initialization and VF setup (success on 6.17.4 and failure on 6.17.9) for my Arc Pro B50, as well as lspci output for the device. I don't know if this is an issue with the xe driver in the kernel or some other subsystem but there's definitely some kind of regression.
Apparently, in the newer kernel you need to enable Resizable BAR in your BIOS.

In 6.17.4-2 the driver attempts to resize BAR 2 from 256 MiB to 16 GiB but fails (due to lack of ReBAR support) and falls back gracefully to the original assignment: BAR 2 at 0x3fe00000000-0x3fe0fffffff (256 MiB).

That is why it works on the older kernel.

In 6.17.9-1, the driver releases not only BAR 2 but also BAR 0, VF BAR 0, and VF BAR 2 during the resize attempt. It fails to assign larger regions due to "no space" (-ENOSPC), then assigns fragmented/smaller regions (e.g., BAR 0 at 0x3fa00000000-0x3fa00ffffff (16 MiB), BAR 2 at 0x3fa10000000-0x3fa1fffffff (256 MiB), VF BAR 0 at 0x3fa01000000-0x3fa02ffffff (32 MiB)). VF BAR 2 fails to assign a large enough region. SR-IOV provisioning fails with "not enough MMIO resources for SR-IOV" and "-ENOMEM", preventing VF enablement.

The users in https://forum.level1techs.com/t/intel-arc-pro-b50-sr-iov-and-me/236473/13 seem to imply that Resizable BAR in your BIOS needs to be enabled.


Edit: It seems there is a thread that is all about the Arc Pro B50 on Proxmox: https://forum.level1techs.com/t/pro...y-its-almost-here-early-adopters-guide/238107
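
To see what the platform and card actually expose, lspci can show both the Resizable BAR capability and the current BAR assignments. A sketch (the PCI address is an example; substitute your card's):

```shell
# Resizable BAR capability and supported sizes for the GPU:
lspci -vvs 85:00.0 | grep -A4 'Resizable BAR'
# Current BAR assignments:
lspci -vs 85:00.0 | grep 'Memory at'
```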
 
Apparently, in the newer kernel you need to enable Resizable BAR in your BIOS.

Alas, this system (Dell R730) does not have reBAR support.

In 6.17.4-2 the driver attempts to resize BAR 2 from 256 MiB to 16 GiB but fails (due to lack of ReBAR support) and falls back gracefully to the original assignment: BAR 2 at 0x3fe00000000-0x3fe0fffffff (256 MiB).

That is why it works on the older kernel.

In 6.17.9-1, the driver releases not only BAR 2 but also BAR 0, VF BAR 0, and VF BAR 2 during the resize attempt. It fails to assign larger regions due to "no space" (-ENOSPC), then assigns fragmented/smaller regions (e.g., BAR 0 at 0x3fa00000000-0x3fa00ffffff (16 MiB), BAR 2 at 0x3fa10000000-0x3fa1fffffff (256 MiB), VF BAR 0 at 0x3fa01000000-0x3fa02ffffff (32 MiB)). VF BAR 2 fails to assign a large enough region. SR-IOV provisioning fails with "not enough MMIO resources for SR-IOV" and "-ENOMEM", preventing VF enablement.

Ah, thank you for the explanation there. That makes sense. Hm, I'll have to see what parameters are available for the xe driver. Maybe this behavior can be altered. Otherwise I'll hunt down where to file bug reports with Intel (I don't have high hopes for that though).

The users in https://forum.level1techs.com/t/intel-arc-pro-b50-sr-iov-and-me/236473/13 seem to imply that Resizable BAR in your BIOS needs to be enabled.


Edit: It seems there is a thread that is all about the Arc Pro B50 on Proxmox: https://forum.level1techs.com/t/pro...y-its-almost-here-early-adopters-guide/238107

Thanks, yeah, I've been following those threads. Some have had success without reBAR support. Since the driver was previously able to fail gracefully and still provision the VFs, reBAR evidently isn't strictly necessary for SR-IOV. If you want decent graphical performance from the card, it definitely is, though. I have another Arc card in this system to accelerate video decoding and it works fine (PCI passthrough).
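
Regarding hunting for xe driver knobs: listing the module's parameters is straightforward, though whether any of them affect the BAR-resize path is an open question:

```shell
modinfo -p xe                                      # parameters with their descriptions
grep -r . /sys/module/xe/parameters/ 2>/dev/null   # current values on a running system
```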
 
Comparing the dmesg output a bit more I see:
6.17.4:
Code:
[    9.791439] xe 0000:85:00.0: [drm] Failed to resize BAR2 to 16384M (-ENOENT). Consider enabling 'Resizable BAR' support in your BIOS

6.17.9:
Code:
[    9.781156] xe 0000:85:00.0: [drm] Failed to resize BAR2 to 16384M (-ENOSPC). Consider enabling 'Resizable BAR' support in your BIOS

This is interesting. ENOENT vs ENOSPC. They both fail the BAR resize but with different errors. I don't know the significance of that, but it is a possible data point.
 
I've been using SR-IOV with the StrongTZ patched i915 driver on a 12th gen (12700T) HP Elite Mini G9.

It's been working great, but my limited experimentation trying to switch to the Xe driver has failed. I haven't checked dmesg with Xe enabled, but I'm guessing the lack of ReBAR support in my BIOS is a problem.

Unfortunately, HP has been ignoring requests to add an option to turn on ReBAR in the BIOS on these machines for at least a year, probably longer.