I'm a relative novice with this stuff, so please excuse this noise if I am wrong, but that ^ smells a lot like: https://forum.proxmox.com/threads/a...-with-kernel-7-and-multiple-nic-types.183574/
Could be related, but Tailscale is specifically UDP and not TCP, so it would be similar problems with offloading but a different datapath. It could just be that the whole offloading feature set is somewhat broken for some NIC combinations.
Leaving a link to this here as well so others looking for problems can see it: https://bugzilla.proxmox.com/show_bug.cgi?id=7627
TCP checksum offloading for virtio is broken for at least some NIC types on the latest kernel + QEMU, for newer Linux guests and Windows guests.
I've seen similar issues, but with a Realtek NIC. I just figured it was Realtek being Realtek and turned off checksum offloading.
See the bug; I can reproduce this on Broadcom and Intel NICs as well, so I think it's a wider problem. The problem appears when newer Linux kernel features in 6.16+ interact with newer QEMU versions (11.0+), which introduce new offloading scenarios. I don't know why these break, or whether the problem is in QEMU or the host kernel.
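For anyone wanting to try the same workaround, here is a minimal sketch of turning checksum offloading off on one interface with `ethtool`. The interface name `ens18` is an assumption (check `ip link` for the real one), and the `DRY_RUN` switch and `offload_off` helper are illustrative names only. By default it just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: disable RX/TX checksum offloading on one interface.
# IFACE is an assumption; replace it with the name shown by `ip link`.
IFACE="${IFACE:-ens18}"
# DRY_RUN=1 (default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

offload_off() {
    # -k shows the current offload settings, -K changes them.
    run ethtool -k "$1"
    run ethtool -K "$1" rx off tx off
}

offload_off "$IFACE"
```

Settings changed with `ethtool -K` do not survive a reboot, so this is only suitable for testing whether offloading is the culprit.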
Hi,
with QEMU 10.2, there was a switch to using io_uring for the IO thread event loops and the IO pressure/wait accounting is set via the io_uring subsystem now. It's a different kernel subsystem from before, so it's not unexpected if it's different.
So, if I understood correctly, this is just a different way the graph is calculated after the update, and it does not necessarily mean that performance is affected, right?
Yes. To be precise: a different way the IO wait metric is calculated.
I am trying to install the NVIDIA host GRID drivers on 7.0.2-7-pve and I am getting this error:
fatal error: os-interface.h: No such file or directory
I have these installed:
proxmox-headers-6.17.13-12-pve
proxmox-headers-7.0.2-7-pve
What am I missing?
Which driver version? This could indicate an incompatibility between the driver and the kernel version.
550.144.02 looks like the latest version that the P100 and V100 support.
Linux 6.15 or newer has no support for the EXTRA_CFLAGS variable in out-of-tree module Kbuild files, which 550.144.02 needs.
Is this the correct way to set it to force TSC?
nano /etc/default/grub
and then change the line to: GRUB_CMDLINE_LINUX_DEFAULT="quiet clocksource=tsc tsc=reliable"
Is there any risk to set this? Do I risk the host not booting at all?
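Before forcing anything on the kernel command line, it may help to check what the running kernel currently uses and offers. Below is a minimal sketch that reads the standard clocksource sysfs files; the `CS_DIR` variable and the `tsc_status` helper are illustrative names, not part of any tool.

```shell
#!/bin/sh
# Sketch: report the current and available clocksources.
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

tsc_status() {
    dir="$1"
    current=$(cat "$dir/current_clocksource")
    available=$(cat "$dir/available_clocksource")
    echo "current: $current"
    echo "available: $available"
    # Pad with spaces so "tsc" only matches as a whole word.
    case " $available " in
        *" tsc "*) echo "tsc is offered by this kernel" ;;
        *)         echo "tsc is not offered; forcing it may not take effect" ;;
    esac
}

if [ -r "$CS_DIR/current_clocksource" ]; then
    tsc_status "$CS_DIR"
fi
```

As far as I know, an edit to /etc/default/grub only takes effect after `update-grub`, and hosts that boot via systemd-boot read the command line from /etc/kernel/cmdline instead, applied with `proxmox-boot-tool refresh`.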
[ 436.209061] pcieport 0000:80:1b.4: AER: Correctable error message received from 0000:80:1b.4
[ 436.209134] pcieport 0000:80:1b.4: device [8086:7f44] error status/mask=00300000/00000000
[ 436.209138] pcieport 0000:80:1b.4: [20] UnsupReq
[ 436.209140] pcieport 0000:80:1b.4: [21] ACSViol (First)
[ 437.238805] thunderbolt 0000:84:00.0: AER: can't recover (no error_detected callback)
[ 437.238815] xhci_hcd 0000:97:00.0: AER: can't recover (no error_detected callback)
[ 437.238832] pcieport 0000:80:1b.4: AER: device recovery failed
... (repeats continuously until host reboot)
I can confirm similar behavior on a 4-node cluster running Proxmox VE 9.2.2.
Cluster hardware:
- 2x Intel Xeon E3-1220L v2
- 1x AMD EPYC 7551P
- 1x AMD EPYC 3251
Only the EPYC 3251 node is affected.
Symptoms:
- Progressive performance degradation after ~2 days uptime on kernel 7.0.2-6-pve
- CPU usage gradually rises until the host reaches nearly 100% system CPU usage
- High load average with almost no IO wait
- All KVM guests are affected equally
- Host becomes nearly unusable
Important observations:
- The EPYC 7551P node running the same Proxmox/kernel version does NOT show the issue
- Changing CPU governor from ondemand to performance did not solve the problem
- Issue seems specific to the EPYC 3251 embedded platform
Additional notes:
- Current clocksource is already tsc
- read_hpet usage is minimal (~1%)
- RAM, swap and IO usage remain normal
- The issue appears related to virtualization syscalls / context switching / scheduler activity
- powertop shows very high tick_nohz_handler, sched(softirq) and APIC timer activity
- dbs_work_handler activity is also unusually high
Downgrading back to 6.8.12-15-pve restores normal behavior.
Do you have any new updates or solutions other than reverting to the previous kernel version?
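To put numbers on the gradual rise in system CPU time, here is a minimal sketch that compares two snapshots of `/proc/stat` and prints the share of time spent in kernel (system) mode between them. The helper name `sys_share` and the default 2-second interval are illustrative assumptions; running it periodically over a couple of days would show whether the share creeps up.

```shell
#!/bin/sh
# Sketch: percentage of CPU time spent in system mode between two samples.
# Each argument is a file holding a copy of /proc/stat.
sys_share() {
    awk '
        # The aggregate line starts with "cpu" and lists jiffies:
        # user nice system idle iowait irq softirq steal ...
        $1 == "cpu" {
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            if (FILENAME == ARGV[1]) { t0 = total; s0 = $4 }
            else                     { t1 = total; s1 = $4 }
        }
        END {
            if (t1 > t0) printf "%.1f\n", 100 * (s1 - s0) / (t1 - t0)
            else print "n/a"
        }
    ' "$1" "$2"
}

if [ -r /proc/stat ]; then
    a=$(mktemp); b=$(mktemp)
    cat /proc/stat > "$a"
    sleep "${INTERVAL:-2}"
    cat /proc/stat > "$b"
    echo "system CPU share: $(sys_share "$a" "$b")%"
    rm -f "$a" "$b"
fi
```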
echo scan-time > /sys/kernel/mm/ksm/advisor_mode
Hit a hard regression on 7.0.12-1-pve on a Ryzen 9 5950X box - after a routine reboot, no VM or container would start. `qm start` returned without error but guests stayed `stopped`, and load sat at ~8 with nothing actually running.
Root cause is a WARNING storm in the execmem cache during module loading. The first hit is on the NFS/sunrpc modules at boot, then it repeats on the firewall modules (nf_tables/ip_set/iptable_filter), leaving each modprobe stuck in uninterruptible D-state. That cascades into:
- pve-firewall: can't lock file '/run/lock/pvefw.lck' - got timeout
- ha-manager status: lrm <node> (old timestamp - dead?), all HA services in `freeze`
- net result: nothing will start
The WARNING fired 85,535 times on that single boot. The kernel was already tainted D (DIE) / W (WARN).
Code:
CPA: called for zero pte. vaddr = ffffffffc1e4c000 cpa->vaddr = ffffffffc1e4c000
WARNING: arch/x86/mm/pat/set_memory.c:1821 at __cpa_process_fault+0x6a4/0x6f0, CPU#16: modprobe/1249
CPU: 16 PID: 1249 Comm: modprobe Tainted: P D O 7.0.12-1-pve #1
Hardware name: ASUS ROG CROSSHAIR VIII FORMULA, BIOS 5002 01/13/2025
RIP: 0010:__cpa_process_fault+0x6ae/0x6f0
Call Trace:
__change_page_attr_set_clr+0xaca/0x1000
change_page_attr_set_clr+0x106/0x1b0
set_memory_nx+0x4e/0x70
execmem_alloc_rw+0x31/0x70
load_module+0x7a1/0x2150
init_module_from_file+0xfd/0x160
idempotent_init_module+0x110/0x300
__x64_sys_finit_module+0x73/0xf0
do_syscall_64+0x10b/0x14e0
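For anyone checking whether they are hitting the same thing, here is a minimal sketch that counts the warning in the current boot's kernel log and lists tasks stuck in uninterruptible sleep. The helper names `count_cpa` and `list_dstate` are illustrative; the match string is taken from the trace in this post.

```shell
#!/bin/sh
# Count "CPA: called for zero pte" lines in a kernel log read from stdin.
count_cpa() {
    grep -c 'CPA: called for zero pte'
}

# Print tasks whose state starts with D (uninterruptible sleep).
# Expects `ps -eo pid,stat,comm` style input on stdin; skips the header row.
list_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}

if command -v journalctl >/dev/null 2>&1; then
    # -k limits output to kernel messages, -b to the current boot.
    n=$(journalctl -k -b --no-pager 2>/dev/null | count_cpa)
    echo "CPA warnings this boot: $n"
fi
ps -eo pid,stat,comm 2>/dev/null | list_dstate
```

A non-zero count together with `modprobe` entries in the D-state list would match the symptoms described here.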
Environment
- proxmox-ve: 9.2.0, pve-manager: 9.2.3, kernel 7.0.12-1-pve
- pve-firewall: 6.0.4, pve-ha-manager: 5.2.4, qemu-server: 9.1.17, zfs 2.4.2-pve1
- AMD Ryzen 9 5950X, ASUS ROG Crosshair VIII Formula, BIOS 5002
- Repo: pve-no-subscription
Workaround
Pinned a 6.17 kernel and rebooted. A clean `systemctl reboot` hangs on the D-state tasks, so I used SysRq (s, u, b) - all guests were already down, so it was safe:
Code:
proxmox-boot-tool kernel pin 6.17.13-13-pve
6.17.13-13-pve boots clean and every guest starts normally.
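The workaround above, written out as a minimal sketch. It defaults to printing the commands instead of running them, because the SysRq part reboots the host immediately. The `DRY_RUN` switch and the helper names are illustrative; the kernel version is the one from this post, and `proxmox-boot-tool kernel list` shows what is actually installed.

```shell
#!/bin/sh
# Sketch: pin a known-good kernel, then force a reboot via SysRq.
KVER="${KVER:-6.17.13-13-pve}"   # version from this post; adjust as needed
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print what would happen

pin_kernel() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: proxmox-boot-tool kernel pin $1"
    else
        proxmox-boot-tool kernel pin "$1"
    fi
}

# s = sync disks, u = remount filesystems read-only, b = reboot immediately.
sysrq_reboot() {
    for key in s u b; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "would write '$key' to /proc/sysrq-trigger"
        else
            echo "$key" > /proc/sysrq-trigger
            sleep 2
        fi
    done
}

pin_kernel "$KVER"
sysrq_reboot
```

Once a fixed kernel is available, `proxmox-boot-tool kernel unpin` removes the pin again.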
The trace lines up with the known upstream execmem cache rework in x86/module (WARNINGs in arch/x86/mm/pat/set_memory.c). Is 7.0.12-1-pve expected to carry an execmem fix, or should it be held back for now? Happy to file on Bugzilla and attach the full `journalctl -b` if useful.