Kernel 5.11

For those with dkms troubles in this kernel: a newer 5.11 build is available, at least on the pve-no-subscription repository; its pve-headers-5.11.17-1-pve package should improve the modules.lds linker script situation required for external module builds.
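In practice that boils down to pulling the new headers and letting dkms rebuild, roughly like this (a sketch; the exact kernel version and the modules that get rebuilt depend on what's installed on your system):
Code:
apt update
apt install pve-headers-5.11.17-1-pve
dkms autoinstall -k 5.11.17-1-pve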
 
For those with dkms troubles in this kernel: a newer 5.11 build is available, at least on the pve-no-subscription repository; its pve-headers-5.11.17-1-pve package should improve the modules.lds linker script situation required for external module builds.

Works flawlessly for me with the most recent nvidia-driver from buster-backports.
Thank you very much! :)
 
For those with dkms troubles in this kernel: a newer 5.11 build is available, at least on the pve-no-subscription repository; its pve-headers-5.11.17-1-pve package should improve the modules.lds linker script situation required for external module builds.
What about the integration of the "aufs" driver?
 
What about the integration of the "aufs" driver?
Ubuntu sunset their natively integrated aufs support with their 21.04 release, which the 5.11 kernel comes from, and only enable it for backports. So, enabling it for the 5.11 backport to the Proxmox VE 6.4 release could be done, but it would be a very temporary measure. We do not currently plan to re-add support for aufs in future releases, so alternatives for you would be:
  • building aufs yourself
  • using another FS as the base for the applications requiring overlays, either through a loop device or by passing a zvol through into the CT and formatting it with ext4, xfs, ... (see the sketch below)
  • using a VM to manage those applications
Ideally, the kernel's built-in overlayfs and ZFS would support each other, avoiding further external modules completely, but that does not seem to be close to happening.
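For the zvol variant, one way to do it is to format the zvol on the host and bind-mount the resulting directory into the CT, roughly like this (a sketch; pool name, size, CT ID and mount path are just examples):
Code:
zfs create -V 32G rpool/ct-overlay
mkfs.ext4 /dev/zvol/rpool/ct-overlay
mkdir -p /mnt/ct-overlay
mount /dev/zvol/rpool/ct-overlay /mnt/ct-overlay
pct set 101 -mp0 /mnt/ct-overlay,mp=/var/lib/docker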
 
Hello,

Last Edit ... nothing to do with kernel 5.11?
The problem was there before (SATA SSD connected at UDMA/33, i.e. slow speed), but kernel 5.4 never told me anything.
Kernel 5.11 gave me errors with enough info to find an alternative (modded) BIOS for my NL40 that accepts my SSD at the correct speed.

To keep this thread clean I'll "spoiler" all the rest of my text because it's not relevant here.

The error was:
Code:
May 18 10:04:53 pve-backup kernel: ata1.01: ATA-9: SanDisk Ultra II 240GB, X41100RL, max UDMA/133
May 18 10:04:53 pve-backup kernel: ata1.01: 468862128 sectors, multi 1: LBA48 NCQ (depth 0/32)
May 18 10:04:53 pve-backup kernel: ata1.01: limited to UDMA/33 due to 40-wire cable
May 18 10:19:44 pve-backup kernel: ata1: lost interrupt (Status 0x58)
May 18 10:19:44 pve-backup kernel: ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 18 10:19:44 pve-backup kernel: ata1.01: failed command: READ DMA
May 18 10:19:44 pve-backup kernel: ata1.01: cmd c8/00:01:7f:c0:55/00:00:00:00:00/f8 tag 0 dma 512 in
         res 40/00:00:01:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
May 18 10:19:44 pve-backup kernel: ata1.01: status: { DRDY }
May 18 10:19:44 pve-backup kernel: ata1: soft resetting link
May 18 10:19:44 pve-backup kernel: ata1.01: configured for UDMA/33
May 18 10:19:44 pve-backup kernel: ata1: EH complete

I just updated one of my Proxmox servers (on old hardware) to the 5.11 kernel and I have strange messages in the logs, apparently regarding my disks.

My server specs:
HP Proliant Microserver NL40
CPU(s): AMD Turion(tm) II Neo (1 core/2 threads)
8 GB RAM
1x SATA SSD
4x SATA HDD
1621325487928.png

The update went well. The server boots without any errors.
On this server I run 2 VMs: one pfSense VM and one PBS VM.
If I keep the server up with only the pfSense VM, there are no error messages and everything seems fine.

The errors come when I start the PBS VM. After 1 or 2 minutes I get strange log messages.
So first, here is my PBS VM config:
1621270188981.png
While passed through to the VM, those 4 disks are excluded from pvestatd (that way I can spin them down).
When I run the VM the disks are spun up and running.

And now the error messages.
I have 2 kinds of error messages.
They may appear 1 to 4 times after the boot of the VM.

Some appear directly on the screen of the PVE host:
1621326644493.png

Some are in the logs that I can see in the PVE WebGUI:
Code:
May 18 10:04:53 pve-backup kernel: ata1.01: ATA-9: SanDisk Ultra II 240GB, X41100RL, max UDMA/133
May 18 10:04:53 pve-backup kernel: ata1.01: 468862128 sectors, multi 1: LBA48 NCQ (depth 0/32)
May 18 10:04:53 pve-backup kernel: ata1.01: limited to UDMA/33 due to 40-wire cable
May 18 10:19:44 pve-backup kernel: ata1: lost interrupt (Status 0x58)
May 18 10:19:44 pve-backup kernel: ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 18 10:19:44 pve-backup kernel: ata1.01: failed command: READ DMA
May 18 10:19:44 pve-backup kernel: ata1.01: cmd c8/00:01:7f:c0:55/00:00:00:00:00/f8 tag 0 dma 512 in
res 40/00:00:01:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
May 18 10:19:44 pve-backup kernel: ata1.01: status: { DRDY }
May 18 10:19:44 pve-backup kernel: ata1: soft resetting link
May 18 10:19:44 pve-backup kernel: ata1.01: configured for UDMA/33
May 18 10:19:44 pve-backup kernel: ata1: EH complete

What intrigues me is this "limited to UDMA/33 due to 40-wire cable" and the fact that the error seems to be on my SSD and not on the HDDs that I pass through to my PBS VM.



Things I tested:
- I booted back into the 5.4 kernel and everything went fine.
- I stopped the PBS VM from starting automatically at boot and started it manually 20 minutes after the server boot (with the 5.11 kernel). Same behavior at start.
- When the PBS VM is offline: no error messages (even after about 1h).
- /etc/multipath.conf is not present on the system (to answer a question).

Other info:
- I don't think it's hardware/cable related, because those disks are used to back up my system ... and backups take about 3h to 4h, so I think that if I had an intermittent SATA cable problem, backups/restores would have shown problems.
- One really strange thing: when I start my PBS VM, the graphs in the PVE WebGUI freeze, and I cannot access the console of the PBS VM via the WebGUI either.
- Error messages are always on ata1.01, which is my SSD where Proxmox is installed ... (see the check below)
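A quick way to double-check which disk sits on which ATA port (a sketch; the sysfs symlink target contains the ataX port the device hangs off):
Code:
ls -l /sys/block/sda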

@t.lamprecht, to keep this thread "clean" I updated this post. If you prefer, I can delete the post and create a new thread. Just tell me.

Edit:
So it seems that the problem is on my SSD and not on my storage disks. The logs from the PVE WebGUI seem to show it.
That's really strange because ... well ... why does it only happen when I run my PBS VM ... the pfSense VM is also on the SSD ... I'm lost.

Edit 2:
OK, so apparently it's linked to my motherboard/BIOS. In fact my SSD is seen as udma2 and not udma6 on both kernel 5.4 and 5.11.
The only difference is that kernel 5.11 apparently logs messages about it.
I'll continue my search to confirm that.

Kernel 5.11
sudo hdparm -I /dev/sda | grep -i udma
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6

Kernel 5.4
sudo hdparm -I /dev/sda | grep -i udma
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6

Thanks @t.lamprecht for your help :)
 
on old hardware
Which hardware exactly (CPU, mainboard/server vendor and model)?

I have strange messages in the logs, apparently regarding my disks.

In general, new messages do not necessarily mean that the new kernel is at fault; the issue, if there even is one, could have been there forever and the newer kernel just exposes (logs) it...

First thing I want to know is how to revert to the 5.4 kernel as the default kernel and second is
For permanently switching back: remove the installed pve-kernel-5.11-.... kernel package with apt remove PKG; that should remove the meta package too, and thus only the 5.4 kernel, which is still the default in PVE 6.4, should be available. Then reboot.
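As a concrete sketch (package names taken from this thread; check what dpkg actually lists on your system first):
Code:
dpkg -l 'pve-kernel-5.11*'
apt remove pve-kernel-5.11.17-1-pve pve-kernel-5.11
reboot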

For a single boot to confirm if this is some effect of the new kernel: choose the old one on boot (e.g. "Advanced" menu in GRUB boot loader).

do you have any idea what those messages are (disks seem to be working even with those messages)
Is there an iSCSI/multipath setup on the system? If so, then maybe the device paths changed due to the newer kernel and the multipathd config no longer excludes those local devices, so it tries to access them, which fails for local devices. In that case, check whether any device path changed and also add the new one to the blacklist in /etc/multipath.conf.
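A blacklist stanza in /etc/multipath.conf could look roughly like this (a sketch; the wwid/devnode values are placeholders for your local disks):
Code:
blacklist {
    wwid "<wwid-of-local-disk>"
    devnode "^sda$"
}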

Another potential source of such link resets is a (slightly/semi-) broken SATA cable, e.g. a loose connection, though that would be a bit of a weird coincidence.

If none of the above seems to apply, and you still want to investigate this, I'd suggest opening a new thread with more info on your setup.
 
Hello,
So after my first server migration from 5.4 to 5.11, I decided to migrate the second.
Everything worked fine until I looked at the CPU usage.

On the host (and all the VMs/LXCs) I see a big CPU increase with exactly the same VMs/LXCs started (and mostly idle).
1621377951345.png

I even cross-checked another way (telegraf inside one of the VMs) and I see the same increase:
1621378123707.png
VM config, for example:
1621378444778.png
1621378533978.png

Current config:
1621378307737.png

Installed package versions:
Code:
proxmox-ve: 6.4-1 (running kernel: 5.11.17-1-pve)
pve-manager: 6.4-6 (running version: 6.4-6/be2fa32c)
pve-kernel-5.11: 7.0-1~bpo10
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.11.17-1-pve: 5.11.17-1~bpo10
pve-kernel-5.4.114-1-pve: 5.4.114-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-2
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.6-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-4
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-3
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

Is this expected?
 

Why 5.11 and not 5.12 or 5.10 LTS?
It's strange to use a non-LTS kernel.
I was just wondering that too, since it was mentioned 5.11 would be used in PVE 7.0 "later this year".

5.10 is an LTS, which would seem to make it a sensible choice. I assumed that's why 5.4 was the previous default, being the last LTS.

5.12 is already out, and by later in the year we'll have 5.13 or maybe even 5.14. What made 5.11 the choice for 7.0?
 
Why 5.11 and not 5.12 or 5.10 LTS?
It's strange to use a non-LTS kernel.
I think it's because Proxmox is basically reusing the Ubuntu 21.04 kernel, which is 5.11, rather than rolling their own completely, but I would be interested in their answer.
 
For my part, I would have preferred a 5.10 LTS kernel rather than the Ubuntu one. Ubuntu does everything against the grain and their kernels are never LTS ones; that's why I left Ubuntu a long time ago.
 
For my part, I would have preferred a 5.10 LTS kernel rather than the Ubuntu one
Why?

One can port stable kernel patches to other kernel release trees; that's needed in general anyway, as not everything can be immediately upstreamed and made available in an LTS.

FWIW, Proxmox VE has based its kernel on the Ubuntu kernel since Proxmox VE 4.0, and that has worked out well in general; at least I do not remember a time when a real issue popped up in our/Ubuntu's kernel that would have been avoided in the respective LTS tree.
 
I just moved to 5.11 and noticed that the CPU counter for "guest" has stopped working. I have a telegraf + grafana setup, which is where I first saw this, but looking at /proc/stat shows all the values for guest usage at 0. Does anyone else see the same behavior?
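For reference, the guest counters are the 10th and 11th fields of the aggregate cpu line in /proc/stat; this is roughly how I checked them:
Code:
awk '/^cpu /{print "guest:", $10, "guest_nice:", $11}' /proc/stat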
 
Ignore my comment about KSM. I didn't wait long enough for it to kick in and scan through the pages. It seems it needed an hour or so.

I still can't find the reason for guest CPU usage not being reported though...
 
For those with dkms troubles in this kernel: a newer 5.11 build is available, at least on the pve-no-subscription repository; its pve-headers-5.11.17-1-pve package should improve the modules.lds linker script situation required for external module builds.
vendor-reset builds without problems with pve-enterprise (after uninstalling wireguard-dkms, which becomes unnecessary in 5.11). Thanks!
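Roughly the steps, in case anyone needs them (a sketch; it assumes vendor-reset is already registered with dkms, and the kernel version is the one from this thread):
Code:
apt remove wireguard-dkms
dkms autoinstall -k 5.11.17-1-pve
dkms status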
 
