Opt-in Linux 6.8 Kernel for Proxmox VE 8 available on test & no-subscription

yes - exactly! - this is why it's really odd that the service vanished. I'd suggest checking your journal (a few boots before the 6.8 kernel) to see if the service used to exist, and maybe also checking /var/log/apt/term.log for any indication that the service got removed during the upgrade (it should not be)
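For example, something along these lines should show both (service name and boot offset taken from this thread; adjust as needed):
Code:
# journal of the boot three boots ago, filtered for the zfs-import service
journalctl -b -3 -u zfs-import@HDD-SATA.service
# check whether apt touched the service during an upgrade
grep -i zfs-import /var/log/apt/term.log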

but glad your system is running fine!
Thanks for the hints, I checked the last 10 boots.
The zfs-import@HDD-SATA service was already missing in all of the last 10 boots, so it was actually never there.
So I would say that something weird happened, or that I deleted that pool and recreated it, but even then the service normally won't get deleted on zpool destroy.

Doesn't matter, the service is my fault.
But what I don't get is: the pool was definitely still imported automatically somehow, because I would have noticed instantly if it weren't. The question then is why it no longer got imported after the update.

However, it doesn't matter anyway, because it's definitely something weird I did, so I have to figure it out myself.
Thanks for the hints @Stoiko Ivanov
Cheers
 
I managed to reproduce the UBSAN warning on a machine with Broadcom NICs here, but did not get the firmware hangs:

(but that might be due to different firmwares, or because I did not send loads of traffic over the interfaces)

To get the UBSAN warnings (undefined behavior sanitizer - to my knowledge these are warnings printed by the kernel to notify driver maintainers about potentially problematic constructs in their code, but they do not by themselves cause problems), I needed to enable RDMA/IB for the NIC using Broadcom's niccli [0] utility (you can download it from the Broadcom website; you need to install the utility and the dkms package for the `sliff` driver).

Sadly I don't have much experience with Broadcom InfiniBand NICs, or with whether changing their settings (or loading the third-party sliff module) at runtime can cause problems - so please be careful, and don't do this in production!

On a hunch - maybe the BCM57416 in general, or at least as on-board NICs on the Supermicro board, has the RDMA setting enabled by default, while most other Broadcom NICs do not (we would see a larger number of problem reports if many Broadcom NICs were affected).

You could try disabling the RDMA support on the NICs:
Code:
# disable the support_rdma NVM option on the first interface, then reset it so the change takes effect
niccli -i 1 nvm -setoption support_rdma -scope 0 -value 0
niccli -i 1 reset

For the interface index and the scope setting, and for general information, please consult the Broadcom documentation - I also found the following article from the Thomas-Krenn wiki helpful (in German):
https://www.thomas-krenn.com/de/wiki/Broadcom_NICCLI_Configuration_Utility


Alternatively, you could try unloading or blocklisting the bnxt_re module so it does not get loaded (AFAICT it is the module that provides RDMA/IB functionality for Broadcom NICs).
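A minimal sketch of that (the config file name is arbitrary; the blocklist only takes effect for future boots after regenerating the initramfs):
Code:
# unload the module right away (fails if it is still in use)
rmmod bnxt_re
# prevent it from being loaded automatically on future boots
echo "blacklist bnxt_re" > /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u -k all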

If this does not help, I'd suggest opening a new thread here (feel free to mention me, @Stoiko Ivanov, so I do not overlook it), to keep the general thread for the 6.8 kernel less noisy.

[0] https://techdocs.broadcom.com/us/en...-software-installation-and-configuration.html
Thanks, blocklisting the module helped.
 
Have you tried NTFS3, the in-kernel implementation (developed by Paragon)? It is much newer than the older NTFS-3G, which is a FUSE layer.

I know that NTFS3 had its teething problems when introduced - but I imagine it must have matured by now, although it may still have issues, IDK.

Disclosure: I don't personally use NTFS at all for interoperability between Windows & Linux, I use exFAT for that exclusively & have had zero issues.
OK, I switched it over to NTFS3. That helped a little, but it is still super slow on the newer kernels.

After doing some google-fu I found out that Raspberry Pi 4s were having trouble with the same USB chipset: https://jamesachambers.com/fixing-storage-adapters-for-raspberry-pi-via-firmware-updates/

The fix for me was to put "options usb-storage quirks=174c:55aa:u" in /etc/modprobe.d/usbstoragequirks.conf, then run "update-initramfs -ck all" to make it take effect.
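As shell commands, that fix looks roughly like this (174c:55aa is the ASMedia bridge's USB ID from the linked article; the trailing :u flag tells usb-storage to ignore UAS for that device):
Code:
echo "options usb-storage quirks=174c:55aa:u" > /etc/modprobe.d/usbstoragequirks.conf
update-initramfs -ck all
reboot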
 
But how can 6.8.4-2 be in 6.8.1.1?
This is the only thing that appeared here...
Back then 6.8.1 was available on both the test and the no-subscription repository, while 6.8.4-2 was only available on the test repository.
As for all package updates, the flow is [internal repo] -> [test] -> [no-subscription] -> [enterprise].

It seems like King Tiger had the test repo enabled and thus already got the update to 6.8.4-2, while you do not have the test repo enabled and so did not yet see that update.

Anyhow, now the slightly newer 6.8.4-2 version is also available on the no-subscription repos.
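If you want to check which repository provides which kernel version, standard apt tooling shows it, e.g.:
Code:
apt update
apt policy proxmox-kernel-6.8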
 
But what I don't get is: the pool was definitely still imported automatically somehow, because I would have noticed instantly if it weren't. The question then is why it no longer got imported after the update.
There are a few things that automatically import zpools in PVE (as long as the pool is also defined in storage.cfg); a quick manual check is sketched after this list:
* during boot, pvestatd runs activate_storage on all storages; for ZFS pools this runs `zpool import`
* starting a guest with a disk on a pool will also call activate_storage
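To verify by hand that the pool is both configured and importable (pool name HDD-SATA taken from the posts above):
Code:
# is the pool defined as a PVE storage?
grep -B1 -A3 'HDD-SATA' /etc/pve/storage.cfg
# list pools available for import, then import by name
zpool import
zpool import HDD-SATA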

Do you see any errors in the journal that might indicate why it's not imported?
 
Do you see any errors in the journal that might indicate why it's not imported?
Not really, I compared boot logs and everything looks normal, even in the boot log where the pool didn't get imported.
However, it must be some obscure niche issue that no one else has, so I won't spend more time on it.
Thanks Stoiko for the help!
 
We recently uploaded a 6.8 kernel into our repositories. It will be used as the new default kernel in the next Proxmox VE 8.2 point release (Q2'2024).
This follows our tradition of upgrading the Proxmox VE kernel to match the current Ubuntu version until we reach an (Ubuntu) LTS release. This kernel is based on the upcoming Ubuntu 24.04 Noble release.

We have run this kernel on some parts of our test setups over the last few days without any notable issues.

How to install:
  1. Ensure that either the pve-no-subscription or pvetest repository is set up correctly.
    You can do so via a CLI text editor or using the web UI under Node -> Repositories.
  2. Open a shell as root, e.g. through SSH or using the integrated shell on the web UI.
  3. apt update
  4. apt install proxmox-kernel-6.8
  5. reboot
Future updates to the 6.8 kernel will now be installed automatically when upgrading a node.

Please note:
  • The current 6.5 kernel is still supported and will still receive updates until the 6.8 becomes the new default.
  • There were many changes, including improved hardware support and performance improvements all over the place.
    Examples include adding the EEVDF (Earliest Eligible Virtual Deadline First) task scheduler, improving latencies, the new shadow stacks to prevent exploits, and a new advisor for automated tuning of the KSM (Kernel Same-page Merging) subsystem. For a more complete list of changes we recommend checking out the kernel-newbies site for 6.6, 6.7, and the LWN's 6.8 merge window part 1 and part 2.
  • The kernel is also available on the test and no-subscription repositories of Proxmox Backup Server and Proxmox Mail Gateway.
  • If you're unsure, we recommend continuing to use the 6.5-based kernel for now.

Feedback about how the new kernel performs in any of your setups is welcome!
Please provide basic details like CPU model, storage types used, whether ZFS is used as the root file system, and the like, both for positive feedback and if you ran into issues where the 6.8 kernel seems to be the likely cause.
With the current (525) Debian NVIDIA driver packages installed, the dkms module won't compile, which causes the proxmox-kernel-6.8 package to fail to install as well.

# cmd_gen_symversions_c /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm/uvm_page_tree_test.o
if nm /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm/uvm_page_tree_test.o 2>/dev/null | grep -q ' __export_symbol_'; then gcc -E -D__GENKSYMS__ -Wp,-MMD,/var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm/.uvm_page_tree_test.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -I./ubuntu/include -include ./include/linux/compiler_types.h -D__KERNEL__ -fmacro-prefix-map=./= -std=gnu11 -fshort-wchar -funsigned-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -mharden-sls=all -fpatchable-function-entry=16,16 -fno-delete-null-pointer-checks -O2 -fno-allow-store-data-races -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -ftrivial-auto-var-init=zero -fno-stack-clash-protection -fzero-call-used-regs=used-gpr -pg -mrecord-mcount -mfentry -DCC_USING_FENTRY -falign-functions=16 -fno-strict-overflow -fno-stack-check -fconserve-stack -Wall -Wundef -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Werror=strict-prototypes -Wno-format-security -Wno-trigraphs -Wno-frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissing-prototypes -Wframe-larger-than=1024 -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer -Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-overflow -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wenum-conversion -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-restrict -Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation -Wno-stringop-truncation -Wno-missing-field-initializers -Wno-type-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-compare -g -gdwarf-5 -I/var/lib/dkms/nvidia-current/525.147.05/build/common/inc -I/var/lib/dkms/nvidia-current/525.147.05/build -Wall -MD -Wno-cast-qual -Wno-error -Wno-format-extra-args -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"525.147.05\" -Wno-unused-function -Wuninitialized -fno-strict-aliasing -ffreestanding -mno-red-zone -mcmodel=kernel -DNV_UVM_ENABLE -Werror=undef -DNV_SPECTRE_V2=0 -DNV_KERNEL_INTERFACE_LAYER -O2 -DNVIDIA_UVM_ENABLED -DNVIDIA_UNDEF_LEGACY_BIT_MACROS -DLinux -D__linux__ -I/var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm -fsanitize=shift -fsanitize=bool -fsanitize=enum -DMODULE -DKBUILD_BASENAME='"uvm_page_tree_test"' -DKBUILD_MODNAME='"nvidia_uvm"' -D__KBUILD_MODNAME=kmod_nvidia_uvm /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm/uvm_page_tree_test.c | scripts/genksyms/genksyms -r /dev/null >> /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm/.uvm_page_tree_test.o.cmd; fi
make[2]: *** [/usr/src/linux-headers-6.8.4-2-pve/Makefile:1926: /var/lib/dkms/nvidia-current/525.147.05/build] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.8.4-2-pve'
make: *** [Makefile:82: modules] Error 2

Editing to note that the current upstream driver 550.67 works fine. You have to:
Pull down the NVIDIA driver from download.nvidia.com
# telinit 3
# dpkg -l | awk '/nvidia/ {print $2}' | xargs apt purge -y # remove all the nvidia packages
# ./NVIDIA-Linux-x86_64-550.67.run
# systemctl reboot
 
With the current (525) Debian NVIDIA driver packages installed, the dkms module won't compile, which causes the proxmox-kernel-6.8 package to fail to install as well.
What's the exact package version, as output by e.g. apt show nvidia-driver (Version field)?

The changelog of the slightly newer 525.147.05-7~deb12u1 from the bookworm-updates repo mentions a fix for building with the 6.8 kernel, which would not yet be included in the 525.147.05-4~deb12u1 version (-7 vs. -4) from the base repo.

If you still have the older version, you could try adding the updates repo:
deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware
Refresh through apt update and then try installing the newer one.
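For example (the sources file name below is arbitrary; explicitly selecting the target release is shown in case apt does not pick the newer revision on its own):
Code:
echo "deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware" > /etc/apt/sources.list.d/bookworm-updates.list
apt update
apt install -t bookworm-updates nvidia-driver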
 
Does anyone know how to fix HDR tonemapping support on the 6.8 kernel when passing through to an Ubuntu LXC for Plex?

On 6.5.13-5-pve my /dev/dri looks like this:

[root@nuc13 ~]$ ls -l /dev/dri
total 0
drwxr-xr-x 2 root root 80 Apr 15 18:15 by-path
crw-rw---- 1 root video 226, 0 Apr 15 18:15 card0
crw-rw---- 1 root render 226, 128 Apr 15 18:15 renderD128

While on 6.8.4-2-pve, card0 changes to card1 with device numbers 226, 1, while renderD128 looks the same.

I tried keeping the 226,0 mapping and also tried changing it to 226,1 as shown, but no matter what I do, it won't hardware-transcode HDR content when I have the tonemap option ticked, although that works fine under the 6.5 kernel.

Folks on the Plex forum seem to believe it's an Intel driver/firmware/software(?) issue that will require Intel to make changes for the 6.8 kernel line, and that the card0 device is what handles the tonemapping piece.

Not sure if anyone has any experience or could confirm some of these findings.

I have an i7-1360P Raptor Lake CPU with Iris Xe graphics (i915 driver).
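For context, the /dev/dri passthrough in the container config typically looks something like the excerpt below (container ID 101 is a made-up example); on 6.8 the card line has to follow the rename to card1 / minor 1:
Code:
# /etc/pve/lxc/101.conf (excerpt, hypothetical container ID)
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file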
 
Does anyone know how to fix HDR tonemapping support on the 6.8 kernel when passing through to an Ubuntu LXC for Plex? …
Did you see the post from the plex team addressing this? https://forums.plex.tv/t/ubuntu-24-04-hw-transcoding/873765
This is either a kernel issue or a plex compatibility issue with changes in the kernel. I would suggest staying on 6.5 until this is resolved.
 
I have tried kernels from 6.1 to 6.8.4-2-pve, but I see the same problem with all of them on my ASRock J4105M:
the system shuts down without any visible errors, but does not power off.
I have to switch it off by hand, so WOL is unusable.
 
I have been running on the new 6.8 kernel now for about 2 weeks. However, ever since I opted in to the new kernel I have been experiencing stability issues on one of my VMs.

My TrueNAS-SCALE VM kept freezing. Initially it was once every few days, and I first thought it was an anomaly. However, for the last couple of days it has frozen once a day, or even multiple times a day. The VM just becomes unresponsive: I can't log in to the TrueNAS-SCALE site anymore and can't access the shares anymore. Rebooting the VM from Proxmox fixes the issue initially, but a few hours later it freezes again.

It is not my ZFS pool that freezes. What I suspect is that the VM froze because of slow access times to the boot drive, which sits on an NVMe drive, the same one that runs Proxmox VE and the other VMs and containers. Or it freezes because of a compatibility issue with the USB drive bay that I am using for my ZFS pool. I know USB passthrough is not recommended and that I should use HBA passthrough, but I am running this on an AMD NUC and there is no such option. As I only use the NAS VM for backup purposes I deemed it "safe" enough; it is not my primary storage, just backup storage for what is on my PC, laptop, and phone. The drives in this bay do not get recognised as USB drives but as proper SATA drives, both by Proxmox VE (if the VM is off) and by TrueNAS-SCALE (with the USB passthrough).

Yesterday I decided to lock the bootloader to the older 6.5 kernel, and so far the TrueNAS-SCALE VM has remained stable. So for me this regression in stability appears to be caused by the new 6.8 kernel. Before, on the 6.5 kernel, it would be stable for weeks or months in a row; the VM would never freeze.

The other VM on the server is a Home Assistant instance, also using the NAS as backup storage, plus an Ubuntu container running Docker/Portainer with Nextcloud AIO, NGINX Proxy Manager, etc.

The host is an AMD NUC with an AMD Ryzen 5 5560U and 64 GB of RAM.

I used proxmox-boot-tool to lock the bootloader to the older kernel (output below). Now I was wondering: is there a way to opt out of the new 6.8 kernel?

root@pve:~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.5.13-5-pve
6.8.4-2-pve

Pinned kernel:
6.5.13-5-pve
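For reference, a pin like the one shown in this output is set (and later removed) with proxmox-boot-tool's kernel pin/unpin subcommands:
Code:
# pin the 6.5 kernel as the boot default
proxmox-boot-tool kernel pin 6.5.13-5-pve
# later, to go back to the automatically selected kernel
proxmox-boot-tool kernel unpin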
 
Just uninstall all 6.8 kernels with apt purge proxmox-kernel-6.8* ?
Thank you very much, this command worked to uninstall the 6.8 kernel.

I also unpinned the 6.5 kernel, as it is now the automatically selected kernel again.
 
I have been running on the new 6.8 kernel now for about 2 weeks. However, ever since I opted in to the new kernel I have been experiencing stability issues on one of my VMs. …
Any specific errors in the host or guest kernel/system logs?

Please also note that the new 6.8 kernel will be the default relatively soon, so watch out on updates pulling it in again.
 
Any specific errors in the host or guest kernel/system logs?
I'll try to look for errors in the logs; so far I have not found anything, really.

On the guest (TrueNAS-SCALE VM) I found this; not sure if it is the cause:

Apr 19 18:27:39 truenas kernel: Unstable clock detected, switching default tracing clock to "global"
If you want to keep using the local clock, then add:
"trace_clock=local"
on the kernel command line

On the Proxmox host I have not yet found any errors, at least not when searching for the word "error" in the system log in the Proxmox UI.

Anything in particular I need to look for?
 
