Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test & no-subscription

3. The CPUs must have the same number of physical cores to work together.
4. The CPUs must have the same number of logical cores to work together.

Oh, I think I understand what you mean. I misunderstood your assertion initially.
We have been doing this for a decade with no issues; it's not worth the debate.
Curious: to what end (what's the use case)? Such a configuration would necessarily be SLOWER than a 1:1 virtual:physical mapping (itself slower than multiple smaller sub-mappings) because of host contention.
 
Tried it on my homelab (Zen 2), no problem; CPU usage is lower but power consumption (that's where my hopes were) remains the same.
 
Oh, and I just noticed this: if you're overprovisioning AND not using NUMA mapping, the guest would have no idea how to pin memory properly. This would have pretty dire performance consequences. I'm really curious what the use case is now...
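(For reference, a quick way to check what the guest actually sees, assuming a Linux guest with the numactl package installed:)

# inside the guest: how many NUMA nodes are visible, and how is memory spread across them?
numactl --hardware
# a VM larger than one host socket that reports only a single node
# has no way to place its memory sensibly
lscpu | grep -i numa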
The use cases are large VMs, from LAMP to very proprietary databases.

NUMA on or off doesn't matter; it just happened to be the option that was left after testing. We typically have NUMA on, but we wanted to try every option.

Like I said, this is an Intel Xeon 4th Gen CPU issue.

I can't stress it enough, we have no issues on 2nd and 3rd gen quad socket setups.

Anyone in here actually running new gen hardware?
 
Anyone in here actually running new gen hardware?

112 x Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz (2 Sockets)

These are dual socket Dell R750 systems, about a year old. Currently running PVE 8, fully updated to the pve-no-subscription repo as of today. I haven't seen any issues on the VMs or the hypervisors yet. 512GB RAM per host, three physical hosts in the cluster currently.
 
112 x Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz (2 Sockets)

These are dual socket Dell R750 systems, about a year old. Currently running PVE 8, fully updated to the pve-no-subscription repo as of today. I haven't seen any issues on the VMs or the hypervisors yet. 512GB RAM per host, three physical hosts in the cluster currently.

Appreciate the input, but that's not new hardware. That is 3rd gen Intel.

We have a lot of 3rd gens with 0 issues.
 
Appreciate the input, but that's not new hardware. That is 3rd gen Intel.

We have a lot of 3rd gens with 0 issues.

Ah, how did I miss the 4th gen bit. I had assumed these would be fairly up to date given their age. Looks like we ordered these right before the first 4th gen Xeons were released.

Looking back through the thread, my systems seem to be in a fairly parallel environment to yours. We have Fibre Channel storage to a Dell PowerStore 3200T NVMe SAN, etc.

I'll keep watching this thread to see how the 4th gen issues turn out. Good luck!
 
Anyone in here actually running new gen hardware?
We have had a three-node dual Intel(R) Xeon(R) Gold 6426Y cluster here for about a month, running various tests and preparing for our next benchmark paper; most of the latter was done on our 6.5-based kernel with great success so far. The same HW was also tested by our solution partner Thomas Krenn before being delivered to us, and they sell those systems with Proxmox VE pre-installed and did not find any issue either (albeit back then their tests probably used 6.2).

FWIW, we also had access to 4th gen scalable before its official release, from a HW vendor's early access programs, but that was about a year ago and back then we tested with our (then quite fresh) 6.2 kernel, which worked fine there too.
But yes, we did not run tests with proprietary DBs. Maybe you have some more specific details about a use case that can be reproduced without opaque software; then we could try to see whether we can reproduce any of that. As it sounds like you do a lot of tinkering, there might also be some guest OS tunables involved. IOW, the more details we get, the more likely we can find something, if it's an underlying issue and not a misconfiguration.
 
I can't stress it enough, we have no issues on 2nd and 3rd gen quad socket setups.
The only "issues" you'd run into is pulling memory over qpi, which wouldnt result in errors or any behavior other then being slow. If leaving numa off yields no appreciable difference, I suggest your vm's arent actually doing anything with the resources provisioned.
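(Side note: a quick way to check that on the host, assuming the numactl package is installed, is to watch the per-node counters; growing numa_miss / other_node values mean memory really is being pulled across the interconnect.)

apt install numactl     # provides numastat on Debian/Proxmox hosts
numastat                # per-node numa_hit / numa_miss / other_node counters
numastat -c qemu        # per-node memory breakdown of the running QEMU guests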
 
Anyone in here actually running new gen hardware?
I have a few customers with Intel servers and 4th Gen CPUs, but none of them have such a large VM.
NUMA=1 is mandatory in such a setup; if you don't notice any difference in performance, then the VM is already running suboptimally anyway.
I have similarly large VMs on vSphere at customers, where NUMA is always clearly noticeable and the VMs are never allocated more cores than the host has real cores.
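For reference, a minimal sketch of what that looks like on the Proxmox side (the VM ID, core counts and memory sizes below are made up; a dual-socket host with 32 cores per socket is assumed):

# expose two virtual sockets and enable NUMA for the guest
qm set 101 --sockets 2 --cores 32 --numa 1 --memory 262144
# optionally pin each virtual NUMA node to one host node
# (vCPU ranges and per-node memory must add up to the totals above)
qm set 101 --numa0 cpus=0-31,hostnodes=0,memory=131072,policy=bind
qm set 101 --numa1 cpus=32-63,hostnodes=1,memory=131072,policy=bind

With that in place the guest sees two NUMA nodes matching the host layout instead of one flat node spanning both sockets.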
 
Kernel 6.5.3-1-pve (like the latest 6.2) does not boot reliably from a Lexar NM790 4TB SSD. Sometimes the kernel boots, sometimes the boot process aborts with "nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0", dropping to BusyBox. A patch is available since 6.5.5 in the mainline kernel. If upstream does not upgrade or backport the patch in the near future, please consider adding it to the PVE build. Multiple users already encountered this issue. Responses in the thread that this build fixed the issue are not correct (tested on NUC7i5).
 
Unable to build Google Coral Gasket driver:

DKMS make.log for gasket-1.0 for kernel 6.5.3-1-pve (x86_64)
Fri Nov 3 06:21:01 PM PDT 2023
make: Entering directory '/usr/src/linux-headers-6.5.3-1-pve'
CC [M] /var/lib/dkms/gasket/1.0/build/gasket_core.o
CC [M] /var/lib/dkms/gasket/1.0/build/gasket_ioctl.o
CC [M] /var/lib/dkms/gasket/1.0/build/gasket_interrupt.o
CC [M] /var/lib/dkms/gasket/1.0/build/gasket_page_table.o
CC [M] /var/lib/dkms/gasket/1.0/build/gasket_sysfs.o
CC [M] /var/lib/dkms/gasket/1.0/build/apex_driver.o
/var/lib/dkms/gasket/1.0/build/gasket_core.c: In function ‘gasket_register_device’:
/var/lib/dkms/gasket/1.0/build/gasket_core.c:1841:41: error: passing argument 1 of ‘class_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
1841 | class_create(driver_desc->module, driver_desc->name);
| ~~~~~~~~~~~^~~~~~~~
| |
| struct module *
In file included from ./include/linux/device.h:31,
from ./include/linux/cdev.h:8,
from /var/lib/dkms/gasket/1.0/build/gasket_core.h:11,
from /var/lib/dkms/gasket/1.0/build/gasket_core.c:12:
./include/linux/device/class.h:230:54: note: expected ‘const char *’ but argument is of type ‘struct module *’
230 | struct class * __must_check class_create(const char *name);
| ~~~~~~~~~~~~^~~~
/var/lib/dkms/gasket/1.0/build/gasket_core.c:1841:17: error: too many arguments to function ‘class_create’
1841 | class_create(driver_desc->module, driver_desc->name);
| ^~~~~~~~~~~~
./include/linux/device/class.h:230:29: note: declared here
230 | struct class * __must_check class_create(const char *name);
| ^~~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:251: /var/lib/dkms/gasket/1.0/build/gasket_core.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/usr/src/linux-headers-6.5.3-1-pve/Makefile:2037: /var/lib/dkms/gasket/1.0/build] Error 2
make: *** [Makefile:234: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.5.3-1-pve'
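For what it's worth, that failure is not specific to the PVE build: mainline dropped the struct module * parameter from class_create() in 6.4, so the out-of-tree gasket driver needs an update (or a local patch) to build on any 6.4+ kernel. A rough, untested sketch of the kind of guard needed around the call shown at gasket_core.c:1841 (the surrounding assignment is left out here):

#include <linux/version.h>

/* class_create() lost its owner argument in mainline 6.4 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 4, 0)
	class_create(driver_desc->name);
#else
	class_create(driver_desc->module, driver_desc->name);
#endif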
 
Kernel 6.5.3-1-pve (like the latest 6.2) does not boot reliably from a Lexar NM790 4TB SSD. Sometimes the kernel boots, sometimes the boot process aborts with "nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0", dropping to BusyBox. A patch is available since 6.5.5 in the mainline kernel. If upstream does not upgrade or backport the patch in the near future, please consider adding it to the PVE build. Multiple users already encountered this issue. Responses in the thread that this build fixed the issue are not correct (tested on NUC7i5).
Yes, still having the problem here too. For now I use the 6.1.10-1-pve kernel and it seems to work well.
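In case it helps as a stop-gap until the fix lands: booting a known-good kernel can be made sticky with proxmox-boot-tool (the version below is taken from the post above; adjust to whatever you have installed):

proxmox-boot-tool kernel list               # show installed and pinned kernels
proxmox-boot-tool kernel pin 6.1.10-1-pve   # always boot this one
# later, once a fixed 6.5 kernel is available:
proxmox-boot-tool kernel unpin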
 

Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test


Two questions:
  1. Does this mean that ZFS support will be built into the kernel going forward (as I believe it is with Ubuntu's kernels)? And if so, is this laying the groundwork for Secure Boot support, or at least better compatibility?
  2. Once this kernel makes its way out of the testing repositories and into the no-subscription repo, what is the process for switching back to the no-sub repos?
 
Does this mean that ZFS support will be built into the kernel going forward (as I believe it is with Ubuntu's kernels)?
That's been the case for many years (since 2014, IIRC).
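(Easy to verify on any installed system; the ZFS module ships inside the kernel package itself, e.g.:)

# ZFS version bundled with the currently running pve kernel
modinfo zfs | grep -w version
# and where the module comes from
modinfo zfs | grep filename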

And if so, is this laying the groundwork for Secure Boot support, or at least better compatibility?
That's rather unrelated to how ZFS is shipped; booting there is always done via an ESP (EFI System Partition), which uses FAT in any case.
But sure, once Secure Boot for Proxmox VE is available, it will also work with ZFS.
Once this kernel makes its way out of the testing repositories and into the no-subscription repo, what is the process for switching back to the no-sub repos?
Simply disable the test repository and enable the no-subscription one again; the latter imports directly from the former.
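Concretely, for a stock Proxmox VE 8 on Debian 12 (bookworm) that means something along these lines (the exact file names depend on how the repos were added):

# in your APT sources, e.g. /etc/apt/sources.list.d/pve.list:
# comment out (disable) the test repo:
# deb http://download.proxmox.com/debian/pve bookworm pvetest
# keep the no-subscription repo enabled:
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

# then refresh the package lists:
apt update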
 
If we deploy the new kernel, is it possible to roll back to the previous version (6.2) if we have issues?
 
If we deploy the new kernel, is it possible to roll back to the previous version (6.2) if we have issues?
In general, yes. You'd only need to take care, when you're using ZFS, not to upgrade any pool to the newer ZFS features, as the older ZFS from older kernels won't understand those pools anymore.
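In practice that just means not running zpool upgrade while you still want the option to roll back; a short sketch (rpool stands for whatever your pool is called):

zpool status rpool    # a note about some supported features not being enabled is fine and can stay
# do NOT run the following until you are sure you'll stay on the newer kernel/ZFS:
# zpool upgrade rpool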
 
In general, yes. You'd only need to take care, when you're using ZFS, not to upgrade any pool to the newer ZFS features, as the older ZFS from older kernels won't understand those pools anymore.
OK, is there a prompt when I start the update?
 
