sockets: 4
Have you tried setting sockets to 1 and cores to your total core count for that VM? If it achieves throughput and reliability goals, the reasons could be investigated later.
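On the PVE CLI that topology change would look roughly like this (a sketch only; VMID 100 and the core count of 16 are placeholders for your VM and host):
# set a single socket and give the VM all host cores (adjust 16 to your host)
qm set 100 --sockets 1 --cores 16
# confirm the topology in the VM config
qm config 100 | grep -E '^(sockets|cores):'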
I am noticing poor performance with an NVMe disk:
Random read slowdown of the NVMe disk with kernel 6.2.16-19-pve vs 6.2.16-3-pve
fio --ioengine=io_uring --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4K --numjobs=1 --iodepth=8 --runtime=10 --time_based --name=fio
6.2.16-19-pve:
Run status group 0 (all jobs): READ: bw=996MiB/s (1044MB/s), 996MiB/s-996MiB/s (1044MB/s-1044MB/s), io=9960MiB (10.4GB), run=10001-10001msec
6.2.16-3-pve:
Run status group 0 (all jobs): READ: bw=1573MiB/s (1650MB/s), 1573MiB/s-1573MiB/s (1650MB/s-1650MB/s), io=15.4GiB (16.5GB), run=10001-10001msec
Please open a new thread for this and include more information about your setup there (e.g. disk model); this is a thread about kernel 6.5.
Bummer.
That's disappointing. This has been an issue for a long time. If I were a paying subscriber, I'd be livid.
Tested 6.5 on one of my new SuperMicro front ends with 4x Intel Xeon Gold 6448H. The VM locks up under load with CPUs stuck. I do run ZFS on root with two Micron 5400 Pros.
Server: https://www.supermicro.com/en/products/system/mp/2u/sys-241e-tnrttp
VM storage is on HPE Alletra NVMe.
Back to 5.15.x and no issues.
I will be looking to test KSM and other performance issues on older hardware next.
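For anyone who also needs to fall back to 5.15 while this is investigated, something along these lines should keep the older kernel booted (sketch; the exact version string will differ per system):
# list kernels known to the boot tool
proxmox-boot-tool kernel list
# pin a known-good 5.15 kernel as the boot default (version string illustrative)
proxmox-boot-tool kernel pin 5.15.126-1-pve
# later: proxmox-boot-tool kernel unpin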
This bug/change is also in the proxmox 8.0.5 kernel.
There's no such kernel version currently, do you mean the 6.2 one? See the node's summary panel in the web-interface or use uname -a to get the currently booted kernel version. For all installed ones you can check pveversion -v.
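Put differently, a quick check on the node would be (nothing beyond the commands above):
# kernel that is actually booted right now
uname -r
# versions of all installed Proxmox packages, kernels included
pveversion -v | grep -i kernel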
I mean the package version, which I assume is the Proxmox release. So it breaks with kernel 6.2.16-19-pve in Proxmox version 8.0.5, and it's also present in the 6.5.3-1-pve kernel.
apt list -a pve-kernel-6.2
Listing... Done
pve-kernel-6.2/stable,now 8.0.5 all [installed,automatic]
pve-kernel-6.2/stable 8.0.4 all
pve-kernel-6.2/stable 8.0.3 all
pve-kernel-6.2/stable 8.0.2 all
pve-kernel-6.2/stable 8.0.1 all
pve-kernel-6.2/stable 8.0.0 all
The Proxmox VE main version is not tightly related to the kernel; you can boot any 6.2 and 6.5, and possibly also older kernels, with pve-manager 8.0.5, so it's not telling us much w.r.t. kernel issues.
Ok, do you also know the latest version (6.2?) where this still worked for you?
Yes, it worked in kernel 6.2.16-5-pve from package version 8.0.4:
apt-get install --only-upgrade pve-kernel-6.2=8.0.4
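In case apt balks at the step from 8.0.5 back to 8.0.4 (it is a downgrade), a hedged alternative sketch, with version strings taken from the apt listing above:
# explicitly allow installing the older package revision
apt-get install --allow-downgrades pve-kernel-6.2=8.0.4
# and keep apt from pulling the newer revision back in
apt-mark hold pve-kernel-6.2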
proxmox-kernel-6.2.16-15-pve works
proxmox-kernel-6.2.16-16-pve fails to reach C6
Ah great, that narrows it down quite a bit, still over 1600 commits between those two releases – can you give me some information about the HW in use here? Mainly the CPU and motherboard model would be good to know.
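For what it's worth, whether the CPUs ever enter C6 can be checked from the cpuidle counters in sysfs (a sketch; tools like turbostat show the same thing as residency percentages):
# idle states the kernel exposes for CPU 0
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# entry counters: a C6 state whose usage never increases is never being entered
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/usage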
Sorry for talking about the 6.2 kernel in the 6.5 thread, but the same problem exists in 6.5.
Yeah, let's move this to a new, separate thread. Could you please create one with the above information and tag my username with @?
This pops up at boot time once:
kvm[7381]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
This pops up multiple times during bootup. I've got an NVIDIA GPU set up for passthrough:
workqueue: drm_fb_helper_damage_work [drm_kms_helper] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
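To see how often those two messages actually fire on a given boot, something like this should be enough (sketch):
# kernel messages from the current boot only
journalctl -b -k | grep -E 'memfd_create|hogged CPU'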
We already added a stop-gap, reverting this warning to a single-time one with the next version of this kernel, just like upstream did too.
This is the work of a new detection mechanism for workloads that should be handled differently in the kernel, but cannot be easily determined upfront (or were just wrongly classified when added).
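If I read the upstream change correctly (treat this as an assumption), the detection threshold is exposed as a workqueue module parameter, so it can be inspected and, if needed, raised at the cost of weaker detection:
# current threshold in microseconds (the warning above triggers at >10000us)
cat /sys/module/workqueue/parameters/cpu_intensive_thresh_us
# to raise it persistently, add e.g. workqueue.cpu_intensive_thresh_us=20000 to the kernel command line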
To finish this off for 6.5 as well: the bug most likely was introduced in an Ubuntu kernel patch for the Realtek driver. Without looking, I assume this patch is in the 6.5 kernel as well. At least I do not have any issues when using my Intel i226-V NIC, which became the solution.
I re-checked the patches and it seems this is not really an unexpected regression but rather a trade-off.
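For anyone trying to tell whether their NIC is affected, checking which driver the interface is bound to is quick (sketch; the interface name is illustrative):
# PCI devices with their bound kernel drivers
lspci -nnk | grep -A3 -i ethernet
# driver and firmware version for a specific interface
ethtool -i enp1s0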
Kernel 6.5.3-1-pve (like the latest 6.2) does not boot reliably from a Lexar NM790 4TB SSD. Sometimes the kernel boots, sometimes the boot process aborts with "nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0", dropping to BusyBox. A patch has been available since 6.5.5 in the mainline kernel. If upstream does not upgrade or backport the patch in the near future, please consider adding it to the PVE build. Multiple users have already encountered this issue. Responses in the thread claiming that this build fixed the issue are not correct (tested on a NUC7i5).
FYI, the current version of the package, i.e. proxmox-kernel-6.5.11-1-pve, includes a backport of that patch.
I have a Dell R340 system that won't boot any of the testing 6.5 kernels. It previously booted fine with all of the 5.15 series and boots fine with 6.2.16-19-pve and previous 6.2 kernels. The install is a ZFS root on a mirrored pair of SATA drives behind an LSI SAS controller. It halts at "EFI stub: Loaded initrd from command line option".
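In case it helps narrow down the hang after the EFI stub message: on a ZFS-root/UEFI install the kernel command line lives in /etc/kernel/cmdline, so one option (a debugging sketch, not a fix) is to boot more verbosely:
# current command line used by proxmox-boot-tool
cat /etc/kernel/cmdline
# after editing it (e.g. removing 'quiet'), rewrite the boot entries on the ESPs
proxmox-boot-tool refresh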