Opt-in Linux 7.0 Kernel for Proxmox VE 9 available on test and no-subscription

Just updated PVE from 9.1.6 to 9.1.9 and to kernel 7.0.0-3 and got a kernel panic.

I pinned the kernel to 6.17.13-2-pve for now and I'm back in business.
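
For anyone needing to do the same, pinning is a one-liner with proxmox-boot-tool (adjust the version string to whatever "kernel list" reports on your host):

Code:
# List the kernels known to the bootloader
proxmox-boot-tool kernel list
# Pin the known-good kernel so it stays the default across updates
proxmox-boot-tool kernel pin 6.17.13-2-pve
# Remove the pin once a fixed 7.0 kernel lands
proxmox-boot-tool kernel unpin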

I'm happy to provide details to help debug. Just tell me what you need.
Please open a new thread and provide the full log, thanks!
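
For the report: the kernel messages from the crashed boot can usually be recovered with journalctl, assuming persistent journaling is enabled (if the panic happens before the journal is flushed to disk, a serial console or netconsole capture is needed instead):

Code:
# Kernel messages from the previous boot (the one that panicked)
journalctl -k -b -1 > panic-boot.log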
 
After updating to kernel 7.0.0-3 I have the problem that when 2 VMs with UEFI and mapped PCIe devices are started, the 2nd VM kills the first one. Meanwhile, other VMs can be started and run without problems. I reverted to 6.17 and everything was fine again. I attached the log from the 7.0.0-3 kernel - it seems to be a repeating pattern.

I see some memory limit error, but even under 6.17 I still have over 30 GB of RAM free and 700 GB free on the SSD.

I am using:
AMD Epyc 8434P
ASRock Mainboard SIENAD8-2L2T
Broadcom HBA 9500-8i
So I tried kernel 7.0.2-2, but the problem still persists.
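
In case it helps the debugging, this is how I pulled the errors out of the journal (generic commands; the grep pattern is just my guess at what is relevant to the memory limit / passthrough angle):

Code:
# All warnings and errors from the current boot, with timestamps
journalctl -b 0 -p warning -o short-iso > kernel-7.0-vm-kill.log
# Check whether the "memory limit" comes from the OOM killer or from VFIO pinning
journalctl -k -b 0 | grep -iE 'oom|vfio|memory'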
 
I think I hit the same issue as yours: all VMs run at maximum CPU, which drives the host to maximum CPU and makes the PVE machine unstable. I had to pin 6.17 as the boot kernel.

And I have another PVE machine with a 3090 GPU; the GPU is not detected on kernel 7.0. With the same driver installation, after switching back to 6.17 the GPU runs fine.

I have 5 PVE machines upgraded to kernel 7.0; the 2 above hit those issues, the other 3 are fine.
An update on my 2 issues:
Both of those machines were running a 2021/22 BIOS version, so I upgraded both to the latest BIOS.

Now the high CPU usage issue on kernel 7.0 is gone. I also updated to kernel 7.0.2-2-pve, and the performance is good.

But the "kernel 7.0 does not detect the GPU" issue is still there. Is it a driver issue? Maybe not: on another PVE host, also on kernel 7.0, I pass an RTX 2000 through to an Ubuntu 26.04 VM, and the RTX 2000 works well.
The difference is how the driver is installed:
the PVE host uses NVIDIA's "*.run" installer, while the Ubuntu VM uses the "ubuntu-drivers" installer.
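
One difference that may matter here: the "*.run" installer builds the kernel module only for the kernel it was run under (unless installed with --dkms), while ubuntu-drivers sets up a DKMS package that rebuilds automatically on kernel updates. A quick check under kernel 7.0 (stock commands, nothing specific to my setup):

Code:
# Is the GPU visible on the PCI bus at all, and which driver (if any) is bound?
lspci -nnk | grep -A3 -i nvidia
# If the driver was installed via DKMS, its module status shows up here
dkms status
# Otherwise, re-running the NVIDIA installer under the new kernel
# (ideally with --dkms) should rebuild the module:
# ./NVIDIA-Linux-x86_64-<version>.run --dkms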
 
A follow-up to my and @Gnosh's reports about the asymmetric Tailscale performance issue on kernel 7.

The issue is easily tested and reproducible with iperf3 across Tailscale. I have reproduced it both within my local network and across interstate connections.

The specific tests below are run with exactly the same hardware, same network, same Tailscale version - i.e. the _only_ change is the kernel version.

Kernel 7.0.2-2-pve - clear, massive regression:

Code:
10:20 user@samba:~ > iperf3 -c 100.93.240.99 -t 30
Connecting to host 100.93.240.99, port 5201
[  5] local 100.125.133.11 port 50050 connected to 100.93.240.99 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   384 KBytes  3.14 Mbits/sec   52   2.40 KBytes
[  5]   1.00-2.00   sec   640 KBytes  5.24 Mbits/sec   40   2.40 KBytes
[  5]   2.00-3.00   sec   640 KBytes  5.24 Mbits/sec   46   2.40 KBytes
[  5]   3.00-4.00   sec   256 KBytes  2.10 Mbits/sec   29   1.20 KBytes
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec   29   2.40 KBytes
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  16.0 MBytes  4.47 Mbits/sec  1200            sender
[  5]   0.00-30.00  sec  16.0 MBytes  4.47 Mbits/sec                  receiver
iperf Done.
10:20 user@samba:~ > iperf3 -c 100.93.240.99 -t 30 -R
Connecting to host 100.93.240.99, port 5201
Reverse mode, remote host 100.93.240.99 is sending
[  5] local 100.125.133.11 port 45092 connected to 100.93.240.99 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   150 MBytes  1.26 Gbits/sec
[  5]   1.00-2.00   sec   148 MBytes  1.24 Gbits/sec
[  5]   2.00-3.00   sec   153 MBytes  1.28 Gbits/sec
[  5]   3.00-4.00   sec   146 MBytes  1.22 Gbits/sec
[  5]   4.00-5.00   sec   155 MBytes  1.30 Gbits/sec
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  4.27 GBytes  1.22 Gbits/sec   16            sender
[  5]   0.00-30.00  sec  4.26 GBytes  1.22 Gbits/sec                  receiver
iperf Done.

Vs. Kernel 6.17.13-7-pve results:

Code:
10:49 user@samba:~ > iperf3 -c 100.93.240.99 -t 5
Connecting to host 100.93.240.99, port 5201
[  5] local 100.125.133.11 port 57468 connected to 100.93.240.99 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  64.2 MBytes   539 Mbits/sec  152    562 KBytes
[  5]   1.00-2.00   sec  64.0 MBytes   537 Mbits/sec    1    631 KBytes
[  5]   2.00-3.00   sec  58.9 MBytes   494 Mbits/sec    0    690 KBytes
[  5]   3.00-4.00   sec  61.4 MBytes   515 Mbits/sec   61    534 KBytes
[  5]   4.00-5.00   sec  51.6 MBytes   432 Mbits/sec    2    597 KBytes
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   300 MBytes   503 Mbits/sec  216            sender
[  5]   0.00-5.01   sec   298 MBytes   498 Mbits/sec                  receiver

iperf Done.
10:49 user@samba:~ > iperf3 -c 100.93.240.99 -t 5 -R
Connecting to host 100.93.240.99, port 5201
Reverse mode, remote host 100.93.240.99 is sending
[  5] local 100.125.133.11 port 54136 connected to 100.93.240.99 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   150 MBytes  1.26 Gbits/sec
[  5]   1.00-2.00   sec   150 MBytes  1.26 Gbits/sec
[  5]   2.00-3.00   sec   138 MBytes  1.16 Gbits/sec
[  5]   3.00-4.00   sec   104 MBytes   870 Mbits/sec
[  5]   4.00-5.00   sec   153 MBytes  1.28 Gbits/sec
..
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   698 MBytes  1.17 Gbits/sec   50            sender
[  5]   0.00-5.00   sec   695 MBytes  1.17 Gbits/sec                  receiver

iperf Done.

The issue affects multiple NICs; at least these have been reported so far:
AQtion AQC113CS
Aquantia Corp. AQC113C NBase-T/IEEE 802.3an Ethernet Controller [Marvell Scalable mGig] (rev 03)

& mine:
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034

Network configuration:
  • Proxmox host uses a standard Linux bridge (vmbr0) — no SR-IOV, no VLAN-aware bridge, etc
  • Physical NIC (nic2, Mellanox ConnectX-4 Lx, mlx5_core) is a bridge member of vmbr0
  • LXC containers connect via veth pairs through the bridge, with Proxmox firewall enabled (fwbr/fwln/fwpr chain)
  • Tailscale runs inside the LXC (not on the Proxmox host), so WireGuard UDP packets egress via: tailscale0 (LXC) → veth → fwbr → vmbr0 → nic2 (mlx5_core) → physical (see the capture sketch after this list)
  • lxc.mount.entry: /dev/net/tun passthrough (required for Tailscale in LXC)
  • Proxmox firewall enabled on the LXC interface (firewall=1 in LXC config) - issue still occurs with firewall disabled, though
  • LXC is unprivileged, Debian 13, running basically nothing but Samba and Tailscale (the issue also affects my Jellyfin container).
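
To localise where along that path the throughput collapses, the WireGuard UDP flow can be captured hop by hop (a sketch; 41641 is Tailscale's default UDP port and nic2/vmbr0 are the names from my config, so substitute yours if they differ):

Code:
# Watch the tunnel traffic at the bridge, then at the physical NIC
tcpdump -ni vmbr0 'udp port 41641'
tcpdump -ni nic2 'udp port 41641'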

The issue is present and reproducible with at least these kernels:
7.0.0-3-pve
7.0.2-2-pve

The issue is NOT present on prior kernels, including the two below (i.e. the workaround is to pin an older kernel):
6.17.13-6-pve
6.17.13-7-pve

I have done extensive testing and pretty much ruled out:
ECN, congestion control, TSO/GSO/GRO, tunnel offloads, conntrack, router and NIC firmware, ISP issues
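
For anyone wanting to repeat the elimination tests, these are the kinds of toggles that cover the list above (a sketch only; interface name nic2 as in my config, with defaults restored between runs):

Code:
# Offloads on the physical NIC (repeat for vmbr0 / the veth as needed)
ethtool -K nic2 tso off gso off gro off
# ECN and congestion control
sysctl -w net.ipv4.tcp_ecn=0
sysctl -w net.ipv4.tcp_congestion_control=cubic
# None of these changed the kernel-7.0 numbers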


It would be great to get some eyes on this, and I am happy to run tests/supply logs etc.
 