Opt-in Linux 6.1 Kernel for Proxmox VE 7.x available

Anyone else notice some major performance changes going from 5.15.x -> 6.1.x?

Upgraded one of my heavy-hitter front ends (quad-socket DL560 Gen10) and now we are seeing a major load increase. CPU load has almost doubled.

Going back to 5.15.x has corrected the issue, but with live migration broken in 5.15 it makes things tough.


Does your Gen10 have Scalable Gen 1 or Gen 2 chips? Retbleed was patched in 5.19.
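
If you want to verify, both kernels expose the mitigation state in sysfs; a quick check could look roughly like this (plain sysfs paths, nothing Proxmox-specific assumed):

Bash:
# Retbleed mitigation applied by the running kernel (file only exists on kernels
# that know about the issue):
cat /sys/devices/system/cpu/vulnerabilities/retbleed

# Compare the full mitigation picture between a 5.15 boot and a 6.1 boot:
grep . /sys/devices/system/cpu/vulnerabilities/*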
 
I'm being asked to update the 5.15 kernel even though I am running:
Linux 6.1.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.1.6-1 (2023-01-28T00:00Z)

Will it update and still boot with 6.1.6 or will I have to manually reselect it as default?

 
Will it update and still boot with 6.1.6 or will I have to manually reselect it as default?
The default booted kernel is the newest installed one by version number, so if you did not set an override (e.g., via proxmox-boot-tool kernel pin), it will use the newest 6.1.
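
If in doubt, something along these lines should show what is installed and what is pinned (the pin/unpin subcommands are available in recent pve-kernel-helper versions; the version string below is just an example):

Bash:
# List the kernels proxmox-boot-tool manages and the currently selected one:
proxmox-boot-tool kernel list
# Pin a specific version as the default boot entry (example version, adjust as needed):
proxmox-boot-tool kernel pin 6.1.6-1-pve
# Remove the pin again, falling back to "newest version wins":
proxmox-boot-tool kernel unpin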
 
Not sure if this was supposed to fix the block-device passthrough issue, but with 6.1.10-1-pve everything works again. ;)
Great to hear & thanks for the feedback!

TBH, I postponed checking that out until after uploading the 6.1.10 release in the hope that it would already fix it (there were quite a few commits addressing things that might just be related), as 6.1.10 was planned to be released quickly anyway and it would then possibly save some debugging time.
 
Hi. I have been having kernel panics with all the 5.19 and newer kernels. Runs great for about a day. Fine with the 5.15 kernels; they run until the next new kernel requires a reboot, or until I try the optional kernel. LOL. I tried the latest one in testing, with and without the other testing packages, and it still only lasted about a day. The hardware is an older Dell R715 and Proxmox is updated as of now.
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u2.1
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

It has been a long time since I have had kernel panics and I can't remember what to look at. If you would like to see any logs, just ask.

Thanks
 
I have been having kernel panics with all the 5.19 and newer kernels. Runs great for about a day. Fine with the 5.15 kernels; they run until the next new kernel requires a reboot, or until I try the optional kernel. LOL. I tried the latest one in testing, with and without the other testing packages, and it still only lasted about a day. The hardware is an older Dell R715
That's a 13-year-old CPU, and yeah, HW that is ~10+ years old tends to become rarer and gets much less testing from kernel devs and QA farms.

Anyhow, can you please open a new thread and post the full kernel oops/panic log there? Check journalctl for older boots; if the server freezes completely and isn't syncing logs to disk, you might get away with extracting the message via ssh, i.e., connect from another server/PC/Raspberry Pi and run journalctl -f (sometimes the network survives long enough to get something out).
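
For example, something like this (hostname is a placeholder):

Bash:
# On the affected node, after the next crash: kernel messages from the previous boot
# (only useful if the journal is persistent and got synced before the freeze):
journalctl -k -b -1
# Live-follow from another machine, so the trace is captured even if the host locks up:
ssh root@pve-node 'journalctl -f -k'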
 
I have a problem that just started with the updates pushed to the no-subscription repo overnight. QEMU won't start.

Code:
task started by HA resource agent
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of_buffer'
  what():  End of buffer
TASK ERROR: start failed: QEMU exited with code 1

Here is a subsequent error:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 110 -name 'macOS-caching,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -smbios 'type=1,uuid=28a447b1-bc85-4e0b-8319-2415292e33e6' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,cache=writeback,format=raw,file=rbd:CephRBD/vm-110-disk-0:conf=/etc/pve/ceph.conf:rbd_cache_policy=writeback,size=540672' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/110.vnc,password=on' -cpu 'Skylake-Server-IBRS,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,vendor=GenuineIntel' -m 4096 -object 'iothread,id=iothread-virtio0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=14235fd0-aef4-4e83-bc55-8f7dfadf2d59' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vmware-svga,id=vga,vgamem_mb=32,bus=pcie.0,addr=0x1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2a9bd575566d' -drive 'file=rbd:CephRBD/vm-110-disk-1:conf=/etc/pve/ceph.conf,if=none,id=drive-virtio0,cache=none,aio=io_uring,discard=on,format=raw,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=62:65:47:7B:BF:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024' -machine 'type=pc-q35-7.1+pve0' -device 'isa-applesmc,osk=ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc' -smbios 'type=2' -smp '4,sockets=1,cores=2,threads=2,maxcpus=4' -cpu 'Skylake-Server-IBRS,vendor=GenuineIntel,+avx2,+avx512f,+avx512dq,+avx512cd,+avx512bw,+avx512vl,+vmx,+pclmulqdq,+pdcm,+bmi1,+hle,+smep,+bmi2,+erms,+xsaveopt,+xsavec,+xsaves,+xgetbv1,+smap,+rtm,+mpx,+rdseed,+adx,+clflushopt,+clwb,+pku,+stibp,+aes'' failed: got timeout

proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Edit: The problem somehow lies with virtio-blk. The issue doesn't happen with virtio-scsi.

Edit 2: Rolled back to 6.1.6. The problem persists. Issue isn't caused exclusively by the kernel update.
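
For reference, switching an existing disk from virtio-blk to virtio-scsi should be possible roughly like this (untested sketch; VM ID and volume name taken from the error output above, adjust to your setup):

Bash:
# Detach the virtio-blk disk; the volume is kept and shows up as unusedN:
qm set 110 --delete virtio0
# Re-attach the same volume on the SCSI bus with a virtio-scsi controller:
qm set 110 --scsihw virtio-scsi-single --scsi0 CephRBD:vm-110-disk-1,discard=on,iothread=1
# Point the boot order at the new disk:
qm set 110 --boot order=scsi0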
 
First boot with 6.1.10-1-pve led to a little hiccup while trying to autostart my TrueNAS VM with PCIe passthrough and ended with: Starting VM 101 failed: unable to read tail (got 0 bytes).
I have never had anything like that before, and on the following three or four reboots of the host (still with 6.1.10-1-pve) that I did for testing, it did not happen again and all is up and running, afaict.
Suspicious... o_O

Syslog snippet is in the attached hiccup.txt, because of the post limit.
Bash:
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00.0,pcie=1
hostpci1: 0000:01:00.0,pcie=1
hostpci2: 0000:0a:00.0,pcie=1
hugepages: 1024
ide2: none,media=cdrom
machine: q35
memory: 32768
meta: creation-qemu=7.0.0,ctime=1665967836
name: TrueNAS
net0: virtio=[...],bridge=vmbr0
numa: 1
onboot: 1
ostype: other
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,size=16G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=[...]
sockets: 1
startup: order=2,up=90
vmgenid: [...]
Bash:
proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 10.1-3~bpo11+1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

PS.: These
Bash:
Feb 11 07:28:58 pve kernel: ata1.00: disable device
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Stopping disk
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
are always there. It is a drive whose HBA also gets passed through (PCIe) to the TrueNAS VM.
 

Feb 11 07:28:58 pve kernel: ata1.00: disable device
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Stopping disk
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
are always there. It is a drive whose HBA also gets passed through (PCIe) to the TrueNAS VM.
This and the stack trace make me think that the host driver does not respond well to the unbinding of the device and/or drives. Early binding the device to vfio-pci (which will also bind all identical devices) and making sure vfio-pci loads before the actual driver using a softdep might prevent this. Then the host kernel/driver won't panic about devices/drives disappearing because they are never connected to the host and only to the VM.
 
This and the stack trace make me think that the host driver does not respond well to the unbinding of the device and/or drives. Early binding the device to vfio-pci (which will also bind all identical devices) and making sure vfio-pci loads before the actual driver using a softdep might prevent this. Then the host kernel/driver won't panic about devices/drives disappearing because they are never connected to the host and only to the VM.

Thank you for the hint. :)
The HBA is actually already bound to vfio-pci (via /etc/modprobe.d/vfio.conf), so it seems I would additionally need the softdep?!
The question is: since this is a cheap HBA (x1, only for a scratch disk) that, afair, uses the ahci driver (and module), would setting the softdep for ahci give me problems with the onboard SATA controllers, which also use the ahci driver and are used on the host itself?
 
The HBA is actually already bound to vfio-pci (via /etc/modprobe.d/vfio.conf), so it seems I would additionally need the softdep?!
The question is: since this is a cheap HBA (x1, only for a scratch disk) that, afair, uses the ahci driver (and module), would setting the softdep for ahci give me problems with the onboard SATA controllers, which also use the ahci driver and are used on the host itself?
Blacklisting ahci would get you into trouble (since all SATA controllers need it), but the softdep should do the trick. Something like softdep ahci pre: vfio-pci (added to /etc/modprobe.d/vfio.conf, followed by running update-initramfs -u) will simply load vfio-pci (just) before ahci loads. Check with lspci -nnk after a fresh reboot, without starting the VM, to make sure the driver in use is vfio-pci.
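
For example, the relevant pieces could look roughly like this (the PCI ID and address below are placeholders; look yours up with lspci -nn first):

Bash:
# /etc/modprobe.d/vfio.conf
# Claim the HBA by its vendor:device ID (placeholder ID, replace with your HBA's):
options vfio-pci ids=1b21:0612
# Load vfio-pci just before ahci, so the HBA never binds to the host's ahci driver:
softdep ahci pre: vfio-pci

Then rebuild the initramfs and check the binding after a reboot:

Bash:
update-initramfs -u
# After rebooting, before starting the VM (address is a placeholder):
lspci -nnk -s 0000:0f:00.0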
 
