Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test & no-subscription

sockets: 4
Have you tried setting Sockets: 1 and cores to your total core count for that VM?
If it achieves throughput and reliability goals, the reasons could be investigated later.
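If it helps, something like this should set that from the CLI (just a sketch; <vmid> and the core count are placeholders, replace them with your own values):
Code:
# hypothetical example: single socket, all cores on it
qm set <vmid> --sockets 1 --cores <total-core-count>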
 
I am noticing poor performance with an NVMe disk:

Random read slowdown of NVMe disk with kernel version 6.2.16-19-pve vs 6.2.16-3-pve

fio --ioengine=io_uring --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4K --numjobs=1 --iodepth=8 --runtime=10 --time_based --name=fio
6.2.16-19-pve :
Run status group 0 (all jobs): READ: bw=996MiB/s (1044MB/s), 996MiB/s-996MiB/s (1044MB/s-1044MB/s), io=9960MiB (10.4GB), run=10001-10001msec
6.2.16-3-pve :
Run status group 0 (all jobs): READ: bw=1573MiB/s (1650MB/s), 1573MiB/s-1573MiB/s (1650MB/s-1650MB/s), io=15.4GiB (16.5GB), run=10001-10001msec
 
Hi,
I am noticing poor performance with an NVMe disk:

Random read slowdown of NVMe disk with kernel version 6.2.16-19-pve vs 6.2.16-3-pve

fio --ioengine=io_uring --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4K --numjobs=1 --iodepth=8 --runtime=10 --time_based --name=fio
6.2.16-19-pve :
Run status group 0 (all jobs): READ: bw=996MiB/s (1044MB/s), 996MiB/s-996MiB/s (1044MB/s-1044MB/s), io=9960MiB (10.4GB), run=10001-10001msec
6.2.16-3-pve :
Run status group 0 (all jobs): READ: bw=1573MiB/s (1650MB/s), 1573MiB/s-1573MiB/s (1650MB/s-1650MB/s), io=15.4GiB (16.5GB), run=10001-10001msec
Please open a new thread for this and include more information about your setup there (e.g. disk model). This is a thread about kernel 6.5.
 
Bummer.

Tested 6.5 on one of my new SuperMicro front ends with 4x Intel Xeon Gold 6448H. VMs lock up under load with CPUs stuck. I do run ZFS on root with 2 Micron 5400 Pros.

Server:
https://www.supermicro.com/en/products/system/mp/2u/sys-241e-tnrttp

VM storage is on HPE Alletra NVMe.

Back to 5.15.x and no issues.

I will be looking to test KSM and other performance issues on older hardware next.
That's disappointing. This has been an issue for a long time. If I were a paying subscriber, I'd be livid.
 
6.5.3-1-pve prevents the CPU from reaching the C6 package state. It's stuck at C3.

This bug/change is also in proxmox 8.0.5 kernel.

My i3-13100 CPU can reach the C6 state up to Proxmox version 8.0.4, but it breaks in 8.0.5 with 6.2.16-19-pve and does not work with the 6.5 kernel either.


I have done testing with a fresh install, upgrading the kernel only one version at a time.
Verified the CPU package C-states using the latest powertop compiled from source, and monitored CPU package power consumption with s-tui.
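For reference, the same check can be approximated without compiling anything by listing the idle states the kernel exposes via sysfs (a rough sketch; powertop or turbostat are still needed for actual package C-state residency):
Code:
# idle states the kernel exposes for CPU 0 (C1, C1E, C6, ...)
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# interactive view of package C-state residency ("Idle stats" tab)
powertop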
 
This bug/change is also in proxmox 8.0.5 kernel.
There's no such kernel version currently; do you mean the 6.2 one?
See the node's summary panel in the web-interface or use uname -a to get the currently booted kernel version. For all installed ones you can check pveversion -v.
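For example:
Code:
uname -a        # currently booted kernel
pveversion -v   # all installed Proxmox VE packages, including kernels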
 
There's no such kernel version currently; do you mean the 6.2 one?
See the node's summary panel in the web-interface or use uname -a to get the currently booted kernel version. For all installed ones you can check pveversion -v.
I mean the package version, which I assume is the Proxmox release. So it breaks with kernel 6.2.16-19-pve in Proxmox version 8.0.5, and it's present in the 6.5.3-1-pve kernel also.
Code:
apt list -a  pve-kernel-6.2
Listing... Done
pve-kernel-6.2/stable,now 8.0.5 all [installed,automatic]
pve-kernel-6.2/stable 8.0.4 all
pve-kernel-6.2/stable 8.0.3 all
pve-kernel-6.2/stable 8.0.2 all
pve-kernel-6.2/stable 8.0.1 all
pve-kernel-6.2/stable 8.0.0 all
 
I mean the package version, which I assume is the Proxmox release.
Proxmox VE's main version is not tightly related to the kernel; you can boot any 6.2 or 6.5 kernel, and possibly also older ones, with pve-manager 8.0.5, so it's not telling us much w.r.t. kernel issues.
So it breaks with kernel 6.2.16-19-pve in Proxmox version 8.0.5, and it's present in the 6.5.3-1-pve kernel also.
Ok, do you also know the latest version (6.2?) where this still worked for you?
 
Proxmox VE's main version is not tightly related to the kernel; you can boot any 6.2 or 6.5 kernel, and possibly also older ones, with pve-manager 8.0.5, so it's not telling us much w.r.t. kernel issues.

Ok, do you also know the latest version (6.2?) where this still worked for you?
Yes, it worked with kernel 6.2.16-5-pve from package version 8.0.4.

I used apt to move between kernel versions by specifying the exact package version.
Code:
apt-get install --only-upgrade pve-kernel-6.2=8.0.4
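For reference, pinning which installed kernel actually gets booted should also work, e.g. via proxmox-boot-tool (a sketch; use the exact version string shown by the list command):
Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.2.16-5-pve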
 
Tested through the 6.2 kernels, and the change which breaks the CPU package state was introduced in proxmox-kernel-6.2.16-16-pve:


proxmox-kernel-6.2.16-6-pve works
proxmox-kernel-6.2.16-8-pve works
proxmox-kernel-6.2.16-9-pve works
proxmox-kernel-6.2.16-10-pve works
proxmox-kernel-6.2.16-11-pve works
proxmox-kernel-6.2.16-12-pve works
proxmox-kernel-6.2.16-13-pve works
proxmox-kernel-6.2.16-14-pve works
proxmox-kernel-6.2.16-15-pve works
proxmox-kernel-6.2.16-16-pve Fails to reach C6
proxmox-kernel-6.2.16-17-pve Fails to reach C6
proxmox-kernel-6.2.16-18-pve Fails to reach C6

Sorry for talking about the 6.2 kernel in the 6.5 thread, but the same problem exists in 6.5.

EDIT: The change updates the sources to Ubuntu-6.2.0-36.36, so would I be right to assume this is an Ubuntu kernel problem?
I don't see anything obvious in the changelog: https://www.ubuntuupdates.org/package/canonical_kernel_team/lunar/main/base/linux
 
proxmox-kernel-6.2.16-15-pve works
proxmox-kernel-6.2.16-16-pve Fails to reach C6
Ah great, that narrows it down quite a bit, though there are still over 1600 commits between those two releases – can you give me some information about the HW in use here? Mainly the CPU and motherboard model would be good to know.
Sorry for talking about the 6.2 kernel in the 6.5 thread, but the same problem exists in 6.5.
Yeah, let's move this to a new, separate thread. Could you please create one with the above information and tag my username with @?
 
SuperMicro H12DSI-N6. AMD EPYC 7642.

While everything seems to be working correctly, I caught a few messages in dmesg.
kvm[7381]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
This pops up once at boot time.
Sounds like KVM needs to be updated? https://lwn.net/Articles/918106/

workqueue: drm_fb_helper_damage_work [drm_kms_helper] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
This pops up multiple times during bootup. I've got an NVIDIA GPU set up for passthrough.
 
This pops up once at boot time.
We already added a stop-gap, reverting this warning to a single-time one with the next version of this kernel, just like upstream did.
There's not much to actually fix here, only a few things to improve, and those have to be done very carefully to not break existing use cases and people migrating from older kernels.

workqueue: drm_fb_helper_damage_work [drm_kms_helper] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
This pops up multiple times during bootup. I've got an NVIDIA GPU set up for passthrough.
This is the work of a new detection mechanism for workloads that should be handled differently in the kernel, but cannot be easily determined upfront (or were just wrongly classified when added).
You could report this to the respective kernel devs, just like:
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg452840.html

We'll possibly disable this for the next release of v6.5, as it seems to confuse users more than it helps, and they often do not have the knowledge to report such things to the correct kernel maintainers, so it's possibly better suited for testing kernels booted in our test lab.
 
Excellent. In that case, considering these were only warnings, I've had no serious problems from 6.5 on my server/workloads so far.
 
Ah great, that narrows it down quite a bit, though there are still over 1600 commits between those two releases – can you give me some information about the HW in use here? Mainly the CPU and motherboard model would be good to know.

Yeah, let's move this to a new, separate thread. Could you please create one with the above information and tag my username with @?
To finish this off for 6.5 as well: the bug most likely was introduced in an Ubuntu kernel patch for the Realtek driver. Without looking, I assume this patch is in the 6.5 kernel as well. At least I do not have any issues when using my Intel I226-V NIC, which became the solution.



I guess the saying "don't use Realtek NICs" still holds true.
 
To finish this off for 6.5 as well: the bug most likely was introduced in an Ubuntu kernel patch for the Realtek driver. Without looking, I assume this patch is in the 6.5 kernel as well.
I re-checked the patches and it seems this is not really an unexpected regression but rather a trade-off.
Most (all, but possibly hardware-revision and/or firmware dependent) of those models have issues if Active State Power Management (ASPM) is enabled, i.e., after a while they crash completely, requiring a reboot to get the network working again.
That is (understandably) considered quite a bit worse than not having power management, and on the kernel side this is the only thing that could be controlled, so it was disabled again.

Ideally the HW vendor would fix this for real, and deliver that, e.g., via a firmware update.
So we won't consider this a fixable bug for the time being; it's by design, to make broken HW a bit less broken, or at least basically usable.
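For anyone wanting to check whether ASPM is currently active on their NIC, lspci can show it (a sketch; <pci-address> is a placeholder for the NIC's bus address from the plain lspci output):
Code:
lspci | grep -i ethernet                    # find the NIC's PCI address
lspci -vv -s <pci-address> | grep -i aspm   # LnkCap/LnkCtl lines show ASPM support and current state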
 
Kernel 6.5.3-1-pve (like the latest 6.2) does not boot reliably from a Lexar NM790 4TB SSD. Sometimes the kernel boots, sometimes the boot process aborts with "nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0", dropping to BusyBox. A patch has been available since 6.5.5 in the mainline kernel. If upstream does not upgrade or backport the patch in the near future, please consider adding it to the PVE build. Multiple users have already encountered this issue. Responses in the thread claiming that this build fixed the issue are not correct (tested on a NUC7i5).
FYI, the current version of the package, i.e. proxmox-kernel-6.5.11-1-pve, includes a backport of that patch.
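To double-check on a given box, confirming the booted kernel and that the controller now probes cleanly is usually enough (a sketch):
Code:
uname -r              # should report 6.5.11-1-pve or newer
dmesg | grep -i nvme  # the "Device not ready; aborting initialisation" error should be gone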
 
I have a Dell R340 system that won't boot any of the testing 6.5 kernels. It previously booted fine with all of the 5.15 series and boots fine with 6.2.16-19-pve and previous 6.2 kernels. The install is a ZFS root on a mirrored pair of SATA drives behind an LSI SAS controller. It halts at "EFI Stub: Loaded initrd from command line option".
 
I have a Dell R340 system that won't boot any of the testing 6.5 kernels. It previously booted fine with all of the 5.15 series and boots fine with 6.2.16-19-pve and previous 6.2 kernels. The install is a ZFS root on a mirrored pair of SATA drives behind an LSI SAS controller. It halts at "EFI Stub: Loaded initrd from command line option".

I had similar issues. Eventually it boots past the EFI stub, but then scrolls endless mpt3sas errors, presumably related to the LSI SAS HBA. Rock solid on ZFS 2.1.13 and kernel 6.2. I rolled everything back and was fine.
 
