Opt-in Linux 7.0 Kernel for Proxmox VE 9 available

I've seen similar issues, but with a Realtek NIC. I just figured it was Realtek being Realtek and turned off checksum offloading.
See the bug. I can reproduce this on Broadcom and Intel NICs as well, so I think it's a wider problem. It appears when newer Linux kernel features in 6.16+ interact with newer QEMU versions (11.0+), which introduce new offloading scenarios. I don't know why these break, or whether the problem is in QEMU or in the host kernel.
 
Does choosing a lower machine version prevent these issues? 11.0 was only released in late April, so it seems a bit too new.
 
Hi,

with QEMU 10.2, there was a switch to using io_uring for the IO thread event loops and the IO pressure/wait accounting is set via the io_uring subsystem now. It's a different kernel subsystem from before, so it's not unexpected if it's different.

So, if I understood correctly, this is just a different way the graph is calculated after the update, and it does not necessarily mean that performance is affected, right?
 
Yes. To be precise: a different way the IO wait metric is calculated.
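
To compare numbers before and after the update independently of the graph, one option is to read the kernel's pressure stall information directly. This is only a sketch, and it assumes the graph in question is derived from `/proc/pressure/io`, which the thread does not confirm. The function names are made up for illustration.

```python
from pathlib import Path


def parse_pressure(text: str) -> dict:
    """Parse PSI text such as the contents of /proc/pressure/io.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Every field is key=value; convert the value to a float.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def read_io_pressure(path: str = "/proc/pressure/io") -> dict:
    """Read and parse the IO pressure file (assumed location, see above)."""
    return parse_pressure(Path(path).read_text())
```

Calling `read_io_pressure()` on the host under the old and the new QEMU version under the same load gives two sets of raw values to compare, which may help separate a change in accounting from a change in actual behaviour.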
 
I am trying to install the NVidia host grid drivers on 7.0.2-7-pve and I am getting this error:

fatal error: os-interface.h: No such file or directory

I have these installed:

proxmox-headers-6.17.13-12-pve
proxmox-headers-7.0.2-7-pve

What am I missing?
 
Which driver version? This could indicate incompatibility between driver and kernel version.
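
Before looking at the driver itself, it can be worth ruling out a simple header mismatch. The sketch below assumes header packages are named `proxmox-headers-<kernel release>`, as in the list posted above. The helper names are hypothetical.

```python
import platform


def headers_package(release: str) -> str:
    """Package name expected for a kernel release (naming taken from the post)."""
    return f"proxmox-headers-{release}"


def has_matching_headers(release: str, installed: list) -> bool:
    """True if the headers package for this kernel release is in the list."""
    return headers_package(release) in set(installed)


def running_kernel() -> str:
    """Release string of the running kernel, the same value `uname -r` prints."""
    return platform.release()
```

With the two packages listed above and a running kernel of 7.0.2-7-pve, `has_matching_headers` returns `True`, which would point away from missing headers and towards the driver version.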
 
Thanks for that, it helped some, but it's still not working. Looks like I will need to reinstall and configure Proxmox back to version 8.
 
Is this the correct way to set it to force TSC?

nano /etc/default/grub

and then change the line to: GRUB_CMDLINE_LINUX_DEFAULT="quiet clocksource=tsc tsc=reliable"

Is there any risk to set this? Do I risk the host not booting at all?
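
For anyone who wants to preview the change before touching the file, here is a minimal sketch of the edit as a text transformation. It only works on a string and writes nothing to disk. The function name is made up for illustration, and it assumes the variable is written with double quotes as in the line above.

```python
import re


def add_kernel_params(grub_text: str, params: list) -> str:
    """Append params to GRUB_CMDLINE_LINUX_DEFAULT, skipping ones already there."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def repl(match):
        current = match.group(2).split()
        for param in params:
            if param not in current:
                current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        # Refuse to guess if the variable is missing or quoted differently.
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text
```

Passing the contents of /etc/default/grub together with `["clocksource=tsc", "tsc=reliable"]` returns the modified text, which can be diffed against the original. Running it a second time changes nothing, so the parameters are never duplicated.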

Forcing my PVE host to use TSC instead of HPET did resolve the issue with the idle power usage. Host has been running now for 58 hours without any issues.

What could be the reason the newer kernels no longer accept the TSC from certain AMD CPUs? Are there certain UEFI settings we need to set for the kernel to accept the TSC by itself? Previous kernels did not have this issue and I did not need to force TSC. No firmware settings have been changed in the meantime.

My CPU is an AMD Ryzen 5 5560U.

Steps I followed to force the kernel to use TSC:

1) To check whether your host is using TSC or HPET, run command: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
2) Run command: nano /etc/default/grub
3) Change the line: GRUB_CMDLINE_LINUX_DEFAULT="quiet" to: GRUB_CMDLINE_LINUX_DEFAULT="quiet clocksource=tsc tsc=reliable"
4) Press Ctrl+O to save, then Ctrl+X to close the GRUB file.
5) Run command: update-grub
6) Run command: update-initramfs -u -k all
7) Reboot host
8) To confirm the host is now using TSC, run command: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
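
The check before and after the reboot can also be scripted. The sketch below reads `current_clocksource` from the path used in the steps above, plus the neighbouring `available_clocksource` file in the same sysfs directory. The function names are hypothetical.

```python
from pathlib import Path

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def summarize_clocksource(current_text: str, available_text: str) -> dict:
    """Turn the raw file contents into a small status report."""
    current = current_text.strip()
    available = available_text.split()
    return {
        "current": current,
        "available": available,
        "tsc_active": current == "tsc",
        # If tsc is not even listed, the kernel has not registered it as usable.
        "tsc_offered": "tsc" in available,
    }


def read_clocksource(base: str = CLOCKSOURCE_DIR) -> dict:
    """Read both sysfs files and summarize them."""
    base_path = Path(base)
    return summarize_clocksource(
        (base_path / "current_clocksource").read_text(),
        (base_path / "available_clocksource").read_text(),
    )
```

On a host where the kernel fell back to HPET, `read_clocksource()` would report `tsc_active` as `False`. After the reboot with the forced parameters it should report `True`.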
 
Kernel 7.0.x regression: ACS violation on PCIe port 80:1b.4 causes VM freeze — Arrow Lake + Thunderbolt 5 + Intel Arc B50 passthrough

Hi all, posting to report a clear regression on kernel 7.0.x affecting my Windows 11 VM with PCIe passthrough. Works perfectly on 6.17.13-13-pve, broken on all 7.0.x kernels tested.

Hardware
  • CPU: Intel Arrow Lake-S
  • Chipset: Intel 800 Series PCH
  • GPU (passthrough): Intel Arc Pro B50 (Battlemage G21, 04:00.0)
  • Thunderbolt: Intel JHL9580 Thunderbolt 5 Barlow Ridge (84:00.0 / 97:00.0)
  • USB controller (passthrough): ASMedia ASM3242 USB 3.2 (06:00.0)
  • NVMe (passthrough): Samsung S4LV008 Pascal (81:00.0)

Proxmox version: 9.2.3

Kernel boot parameters: intel_iommu=on iommu=pt split_lock_detect=off

Symptom
The Windows 11 VM freezes shortly after the Arc GPU driver initialises on boot. Display output is lost, CPU pegs at 100%, and the VM becomes completely unresponsive. The host immediately enters a continuous AER error loop on PCIe root port 0000:80:1b.4 which persists even after force-stopping the VM and requires a full host reboot to clear. Keyboard and mouse (connected via the passed-through ASMedia USB controller) also stop responding at the point of freeze.

Root cause
The AER loop is triggered by an ACS violation on PCIe root port 80:1b.4 (device 8086:7f44), which is the upstream port for the Thunderbolt 5 subsystem. The violation appears to be triggered by DMA activity when the Arc GPU driver initialises, crossing an ACS boundary between sibling root ports 80:1b.0 and 80:1b.4.
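
How the kernel has isolated the two root ports shows up in the IOMMU group assignment, so comparing the groups on a working and a broken kernel might help narrow this down. The following is a rough sketch that walks `/sys/kernel/iommu_groups`. The function names are made up for illustration, and the script only reads sysfs.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    for group_dir in Path(root).iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(groups: dict, address: str):
    """Return the group number containing a PCI address, or None if absent."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None
```

For example, `group_of(iommu_groups(), "0000:80:1b.4")` and the same call for `"0000:80:1b.0"` show whether the sibling root ports share a group. Saving the full mapping on 6.17.13-13-pve and on a 7.0.x kernel makes any difference easy to spot.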

dmesg (kernel 7.0.6-2-pve)

Code:
[  436.209061] pcieport 0000:80:1b.4: AER: Correctable error message received from 0000:80:1b.4
[  436.209134] pcieport 0000:80:1b.4:   device [8086:7f44] error status/mask=00300000/00000000
[  436.209138] pcieport 0000:80:1b.4:    [20] UnsupReq
[  436.209140] pcieport 0000:80:1b.4:    [21] ACSViol (First)
[  437.238805] thunderbolt 0000:84:00.0: AER: can't recover (no error_detected callback)
[  437.238815] xhci_hcd 0000:97:00.0: AER: can't recover (no error_detected callback)
[  437.238832] pcieport 0000:80:1b.4: AER: device recovery failed
... (repeats continuously until host reboot)
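
The status word in the second line can be decoded mechanically. This is a small sketch that pulls the status and mask out of a dmesg line and lists the set bits. The bit names are limited to the two that the log itself prints (`[20] UnsupReq`, `[21] ACSViol`), and any other bit is reported by number only. The function name is hypothetical.

```python
import re

# Bit names taken from the dmesg excerpt above; other bits are left unnamed.
KNOWN_BITS = {20: "UnsupReq", 21: "ACSViol"}

STATUS_RE = re.compile(
    r"(\d{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+device \[([0-9a-f]{4}:[0-9a-f]{4})\] "
    r"error status/mask=([0-9a-f]{8})/([0-9a-f]{8})"
)


def decode_status(line: str):
    """Return the reporting device and the set status bits, or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    address, device_id, status_hex, mask_hex = match.groups()
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    bits = [b for b in range(32) if (status >> b) & 1]
    return {
        "address": address,
        "device": device_id,
        "bits": bits,
        "names": [KNOWN_BITS.get(b, f"bit {b}") for b in bits],
        # Bits that are set in both status and mask.
        "masked": [b for b in bits if (mask >> b) & 1],
    }
```

Applied to the `error status/mask=00300000/00000000` line, this yields bits 20 and 21 with nothing masked, which matches the two lines the kernel prints underneath it.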

Kernels tested
  • 6.17.13-13-pve — VM boots and runs normally
  • 7.0.2-6-pve — freeze, ACS violation loop
  • 7.0.2-7-pve — freeze, ACS violation loop
  • 7.0.6-2-pve — freeze, ACS violation loop

Workaround
Pinned to 6.17.13-13-pve which resolves the issue completely.

Happy to provide any additional diagnostic output if helpful.
 
Could you open a new thread and post the data you posted here, plus the output of "lspci -v"?