Opt-in Linux 6.11 Kernel for Proxmox VE 8 available on test & no-subscription

Reporting a new issue with 6.11.11-1: Proxmox is not reliable on this version as a GlusterFS client. Intermittently, records are not written and files are not transferred via rsync between the client and the server. No errors are logged on either side, so this is a "silent failure". Reverting the client Proxmox server to 6.8.12-7 eliminates the problem.
I think it's better to wait for 6.12. This looks like a mainline 6.11 kernel issue (check other distros' problems with iDRAC and kernel 6.11); I think the problem comes from the video memory being shared with the iDRAC (on my R420 server, at least).
 
Upgraded today; the only issue was on a Dell T5500 dual Xeon (yeah, kinda old HW), where I had to do a power-down after the restart. The load after reboot was over 100 and VMs wouldn't boot. Since the power-down and power-up it's been running amazingly well. The T7810 had zero issues, so I put that down to the older HW/BIOS. Amazing work, folks! Thank you.
 
Do the T5500 and T7810 have iDRAC available? If so, which iDRAC version?

The final nail in the coffin for us was unreliable GlusterFS. We rely on Gluster for data, VM, and container storage.
 
See this post from somebody who has been running the 6.11 kernel on two different hosts for three months with apparently no issues.
 
Reading through the comments, a clear pattern emerges: small home-lab setups don't have issues, while large deployments, particularly those with advanced storage and management solutions, have numerous issues. Deciding which group you are in will likely make the decision about whether to try 6.11 easy for you.
 
Yeah, home-grade hardware is a bit basic compared to enterprise class.
 
enterprise class
Correct, but I think most production/commercial instances would not really be "testing" kernels anyway. This is one of the conundrums of newer-kernel testing: for the most part it is not being done on the real equipment it will eventually run on. Hence the "surprises" that come later.
 
I'm running the 6.11 kernel in my PVE testing environment (three nodes, hyperconverged) as well as non-production (currently eleven nodes), on Dell R730 and R740 hardware with Ceph and SAN-backed storage (iSCSI). The only issue I've seen with 6.11 is that the console via iDRAC is unusable, and I haven't found the right grub options to fix it so far. Not a huge concern, and I haven't dedicated a lot of time to nailing this one down. Something in the kernel is setting a mode on the console that the iDRAC cannot handle, or disconnecting the iDRAC virtual console altogether. Plugging in a monitor works OK, though, so it doesn't make a lot of sense.
 
Adding nomodeset video=1024x768 to the kernel command line in /etc/default/grub should resolve the issue.
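For example (a sketch for a GRUB-booted host; your existing default options may differ, and hosts that boot via systemd-boot edit /etc/kernel/cmdline instead):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset video=1024x768"

# apply the change and reboot
update-grub
reboot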
 
I upgraded to proxmox-kernel-6.11.11-1 yesterday and everything seems to be working. I have USB devices connected to VMs, PCIe passthrough to my firewall VM, and ZFS with special devices, SLOG, and L2ARC. RAM usage is like before, CPU usage may have dropped marginally, and overall responsiveness may have improved as well. I am running an AMD Ryzen 9 7950X with 128 GB ECC RAM on an ASUS motherboard.
 
FWIW, I've moved back down to 6.8.12-8-pve. For some reason, 6.11.11-1-pve had my CPU frequencies idling much higher and more erratically on my N100, hovering mostly around 1200 MHz and bouncing around.

With the 6.8 kernel, all cores idle at the minimum frequency I've set, 700 MHz. I'm using the intel_pstate driver with the powersave governor.

I'm setting these on reboot via crontab:
Code:
# crontab -l
# set the powersave governor on all cores shortly after boot
@reboot (sleep 60 && echo powersave | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor)
# then bias the energy/performance preference toward power saving
@reboot (sleep 65 && echo balance_power | tee /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference)
@reboot /usr/bin/iperf3 -sD  # just for funsies
 
This looks like the conservative governor, not powersave :)
Handy packages for CPU mode management are cpufrequtils and tuned-utils.
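For instance, with cpufrequtils installed, the governor can be inspected and set per core (a sketch; exact output depends on the driver):
Code:
# show driver, available governors, and current policy per core
cpufreq-info
# set the powersave governor on every core
for c in $(seq 0 $(( $(nproc) - 1 ))); do
    cpufreq-set -c "$c" -g powersave
done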
Sounds plausible... though the N100's pstate driver doesn't have that:
Code:
# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_governors  
performance powersave
performance powersave
performance powersave
performance powersave

Yep, I run cpufreq-info regularly, as well as powercap-utils.
 
Hello,
After some digging, it turned out that the N100 only has the powersave and performance governors with intel_pstate. I'm not surprised; some time ago I saw similarly strange behavior from intel_pstate on another OS, and I had to disable intel_pstate and switch to acpi-cpufreq.
I'm just sharing my experience :)
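If you want to try the same on PVE, intel_pstate can be turned off with a kernel command-line parameter, after which the kernel falls back to acpi-cpufreq (a sketch for a GRUB-booted host; your existing options may differ):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable"

# apply and reboot
update-grub
reboot

# verify which driver is active after the reboot
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver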
 
Also, with proxmox-kernel-6.11.11-1 it would load the Bluetooth kernel modules a few minutes after booting, for no apparent reason. From journalctl:
Code:
[   20.446785] x86/split lock detection: #AC: CPU 0/KVM/1134 took a split_lock trap at address: 0x2079
[ 3749.169604] Bluetooth: Core ver 2.22
[ 3749.169691] NET: Registered PF_BLUETOOTH protocol family
[ 3749.169697] Bluetooth: HCI device and connection manager initialized
[ 3749.169710] Bluetooth: HCI socket layer initialized
[ 3749.169718] Bluetooth: L2CAP socket layer initialized
[ 3749.169735] Bluetooth: SCO socket layer initialized
[ 3851.710977] NET: Unregistered PF_BLUETOOTH protocol family

FWIW, I have Bluetooth disabled in the BIOS. After booting back into 6.8.12-8-pve, I haven't seen the module load itself anymore.

Code:
# uname -r ; dmesg | grep -ci blue ; uptime
6.8.12-8-pve
0
 18:23:11 up  6:07,  2 users,  load average: 0.58, 0.60, 0.55
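If you stay on 6.11 and want to keep the module from loading at all, one option is a modprobe blacklist (a sketch; the file name is arbitrary):
Code:
# /etc/modprobe.d/blacklist-bluetooth.conf (hypothetical file name)
# keep udev from auto-loading the Bluetooth stack
blacklist bluetooth
blacklist btusb
# also refuse explicit module load requests
install bluetooth /bin/false

# then rebuild the initramfs so the blacklist applies at early boot:
update-initramfs -u -k all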