Could this be the cause of my "Failed to import pool 'rpool'" error, or is my system not repairable?

In general it is repairable, yes. You'd only need to take care, when using ZFS, not to upgrade any pool to the newer ZFS features, as the older ZFS in older kernels won't understand those pools anymore.
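For anyone in the same spot: a quick, non-destructive way to see whether a pool already has newer feature flags active before deciding anything (pool name rpool as in this thread; nothing below changes the pool) is:

# List the pool's feature flags and their state (disabled/enabled/active)
zpool get all rpool | grep feature@
# Show pools that don't yet have all supported features enabled.
# Note: 'zpool upgrade rpool' would enable them and can make the pool
# unreadable for older kernels, so skip it if you may still boot an older kernel.
zpool upgrade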
Install here didn't go smoothly; I'm getting: Failed to import pool "rpool."
Booting is stuck... how do I resolve this?
Solved... after I did the install and rebooted, I entered the BIOS and turned on three drives... that confused the new kernel on its first boot. All is well.
We have had a three-node dual Intel(R) Xeon(R) Gold 6426Y cluster here for about a month, running various tests and preparing for our next benchmark paper; most of the latter was done on our 6.5-based kernel with great success so far. The same HW was also tested by our solution partner Thomas Krenn before being delivered to us; they sell those systems with Proxmox VE pre-installed and did not find any issue either (albeit their tests back then probably used 6.2).
FWIW, we also had access to 4th Gen Scalable before its official release through a HW vendor's early-access program, but that was about a year ago, and back then we tested with our (then quite fresh) 6.2 kernel, which worked fine there too.
But yes, we did not run tests with proprietary DBs. If you have more specific details about a use case that can be reproduced without opaque software, we could try to see whether we can reproduce any of that. As it sounds like you do a lot of tinkering, there might also be some guest OS tunables involved; in other words, the more details we get, the more likely we are to find something, if it is an underlying issue and not a misconfiguration.
What kernel is running within the VM?

Running on the 4th Gen Intel with any of the 6.5.x kernels results in the following within a matter of an hour or so.
[Attachment 57625: screenshot of the CPU lockup messages]
I can drop core counts, reduce memory, enable NUMA, etc.; none of it matters and we still hit CPU lockups.
Move the VM back to a 5.15.x kernel and the VM is rock solid. Move the VM to a 2nd or 3rd gen Intel with a 6.x.x kernel and it's solid too.
Also worth mentioning that the VM throws an NMI right on boot with any 6.x.x kernel on the 4th gen Intel.
[Attachment 57626: screenshot of the NMI messages at boot]
It's a Debian 12 VM.
I haven't been able to reproduce the issue locally here even after a few hours, matching your VM configuration as closely as I could (it's only dual socket on my end) and using stress-ng to generate a lot of CPU load and some memory and IO load within the VM.
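For reference, a stress-ng invocation along those lines (worker counts and duration here are illustrative placeholders, not the exact parameters used in the test above) could be:

# CPU load on all vCPUs, some memory pressure and mixed I/O, for a few hours
stress-ng --cpu 0 --vm 4 --vm-bytes 2G --iomix 2 --metrics-brief --timeout 4h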
Could you describe the workload inside the VM in a bit more detail? Is there anything else running on the host system around the time the issue occurs?
How much did you drop the core count? I noticed the configuration has CPU hotplug enabled. Just asking to be sure: was it used during the test?
It's just a shot in the dark, but you could try turning off the numa_balancer as suggested here for a different issue that's also happening with 6.x kernels and not 5.15: https://forum.proxmox.com/threads/p...th-windows-server-2019-vms.130727/post-601617
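If you want to try that, the NUMA balancer is toggled via the standard kernel.numa_balancing sysctl on the host (assuming the linked post refers to the same knob); it can be changed at runtime and only persists if you also add it to the sysctl config:

# Check the current state (1 = automatic NUMA balancing enabled)
cat /proc/sys/kernel/numa_balancing
# Disable it at runtime
sysctl -w kernel.numa_balancing=0
# Optionally make it persistent across reboots
echo 'kernel.numa_balancing = 0' > /etc/sysctl.d/90-numa-balancing.conf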
Why am I getting these on kernel 6.5?
Nov 06 18:42:55 nolliprivatecloud login[1948]: ROOT LOGIN on '/dev/pts/0'
Nov 06 18:44:47 nolliprivatecloud kernel: evict_inodes inode 00000000dc7b1645, i_count = 1, was skipped!
Nov 06 18:44:47 nolliprivatecloud kernel: evict_inodes inode 0000000074b7fd0c, i_count = 1, was skipped!
Nov 06 18:45:03 nolliprivatecloud kernel: evict_inodes inode 00000000165fefb7, i_count = 1, was skipped!
Nov 06 18:45:03 nolliprivatecloud kernel: evict_inodes inode 00000000f7a2b3b6, i_count = 1, was skipped!
Nov 06 18:45:18 nolliprivatecloud kernel: evict_inodes inode 00000000e22e196e, i_count = 1, was skipped!
Nov 06 18:45:18 nolliprivatecloud kernel: evict_inodes inode 00000000997c6818, i_count = 1, was skipped!
Nov 06 18:45:33 nolliprivatecloud kernel: evict_inodes inode 000000006e9fe81d, i_count = 1, was skipped!
Nov 06 18:45:33 nolliprivatecloud kernel: evict_inodes inode 000000005281d993, i_count = 1, was skipped!
Nov 06 18:51:28 nolliprivatecloud pvedaemon[1447]: <root@pam> successful auth for user 'root@pam'
And:
Nov 06 18:39:27 nolliprivatecloud smartd[974]: Device: /dev/nvme1, number of Error Log entries increased from 110 to 124
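As an aside, that smartd message only reports that the drive's error-log entry count increased; to see what the entries actually are, you can dump the NVMe error information log (device name taken from the message above; nvme-cli only if it's installed):

# Full SMART/health report, includes the NVMe error information log
smartctl -a /dev/nvme1
# Or read the error log directly with nvme-cli
nvme error-log /dev/nvme1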
Heavy compile load; cc1plus is the vast majority of it.

While I didn't test with that this time, I did kernel compilations when the issues were first reported here for 6.2, but I couldn't trigger any soft lockups either. Do you have the latest BIOS update and microcode installed?

I will do some testing shortly and report back.
Thanks for the report; this is due to an Ubuntu-specific backport. There was already a report at their bug tracker, but I posted the info there about which patch causes this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2037214
EDIT: Questions still left from the last post:
How much did you drop the core count? I noticed the configuration has CPU hotplug enabled. Just asking to be sure: was it used during the test?
Just checked Supermicro's site and the host is on the latest BIOS. They don't seem to offer any microcode updates as of yet for this model.

Doing some testing without CPU hotplug enabled, I found the following.
CPU hotplug enabled:
- VM boots fine with all the cores from the host
CPU hotplug disabled:
- VM freezes with core counts past 192
- Makes it through 50% of its boot process, then locks up
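For anyone following along, CPU hotplug is part of the per-VM hotplug option, so a test like the one above could be set up roughly as follows (VMID 100 is just a placeholder):

# Keep disk/network/USB hotplug but drop 'cpu' (and 'memory') from the list
qm set 100 --hotplug network,disk,usb
# Give the VM a fixed vCPU count instead of hotplugging cores
qm set 100 --cores 192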
What about the intel-microcode package?

Did you run a benchmark (on a working kernel) to compare how much you would actually lose from limiting the core count to the number of actual CPUs (and not just hyperthreads)? I know you said you didn't have issues with it in the last decade, but if we can't reproduce the issue, we'll have a really hard time tracking it down, so having a workaround would at least be something.
EDIT: Or even better, compare performance of 5.15 kernel with high CPU count to 6.5 kernel with reduced CPU count.
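Since the real workload is compiles, one rough way to get that comparison without a dedicated benchmark tool would be to time the same parallel build once per kernel/core-count combination (the -j value below is just a placeholder):

# Run once on the 5.15 kernel with the full vCPU count,
# then on 6.5 with the reduced count, and compare wall-clock times
make clean
time make -j128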
We hit soft lockups again with the VM set to 192 cores and our compiles set to use 128 of those cores (NUMA on and CPU hotplug disabled).
No benchmarks; these are production systems, we don't have time to do that kind of stuff. Hence the reason we pay for the enterprise repos.
Just updated the microcode.
root@ccsprogmiscrit1:~# journalctl -k --grep="microcode"
-- Journal begins at Tue 2023-07-25 08:42:43 EDT, ends at Tue 2023-11-07 13:18:20 EST. --
Nov 07 13:17:33 ccsprogmiscrit1 kernel: microcode: updated early: 0x2b0001b0 -> 0x2b0004b1, date = 2023-05-09
Nov 07 13:17:33 ccsprogmiscrit1 kernel: microcode: Microcode Update Driver: v2.2.
Re-running our compile now; the last attempt resulted in CPU lockups. Maybe the microcode update will help.