Opt-in Linux 6.1 Kernel for Proxmox VE 7.x available

Anyone else notice some major performance changes going from 5.15.x -> 6.1.x?

Upgraded one of my heavy-hitter front ends (quad-socket DL560 Gen10) and now we are seeing a major load increase. CPU load has almost doubled.

Going back to 5.15.x has corrected the issue, but with live migration broken in 5.15 it makes things tough.


Does your Gen10 have Scalable Gen 1 or Gen 2 chips? Retbleed was patched in 5.19.
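
If you want to verify, both kernels expose the mitigation state in sysfs; a quick check could look roughly like this (plain sysfs paths, nothing Proxmox-specific assumed):

Bash:
# Retbleed mitigation applied by the running kernel (file only exists on kernels
# that know about the issue):
cat /sys/devices/system/cpu/vulnerabilities/retbleed

# Compare the full mitigation picture between a 5.15 boot and a 6.1 boot:
grep . /sys/devices/system/cpu/vulnerabilities/*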
 
I'm being asked to update the 5.15 kernel even though I am running:
Linux 6.1.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.1.6-1 (2023-01-28T00:00Z)

Will it update and still boot with 6.1.6 or will I have to manually reselect it as default?

 
Will it update and still boot with 6.1.6 or will I have to manually reselect it as default?
The default booted kernel is the newest installed one by version number, so if you did not set an override (e.g., via proxmox-boot-tool kernel pin), it will use the newest 6.1.
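
If in doubt, something along these lines should show what is installed and what is pinned (the pin/unpin subcommands are available in recent pve-kernel-helper versions; the version string below is just an example):

Bash:
# List the kernels proxmox-boot-tool manages and the currently selected one:
proxmox-boot-tool kernel list
# Pin a specific version as the default boot entry (example version, adjust as needed):
proxmox-boot-tool kernel pin 6.1.6-1-pve
# Remove the pin again, falling back to "newest version wins":
proxmox-boot-tool kernel unpin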
 
Not sure if this was supposed to fix the block-device passthrough issue, but with 6.1.10-1-pve everything works again. ;)
Great to hear & thanks for the feedback!

TBH, I postponed checking that out until after uploading the 6.1.10 release in the hope that it would already fix it (there were quite a few commits addressing things that might just be related), as 6.1.10 was planned to be released quickly anyway and it would then possibly save some debugging time.
 
Hi. I have been having kernel panics with all the 5.19 and newer kernels. Runs great for about a day. Fine with the 5.15 kernels; they run until the next new kernel requires a reboot, or until I try the optional kernel. LOL. I tried the latest one in testing, with and without the other testing packages, and it still only lasted about a day. The hardware is an older Dell R715 and Proxmox is updated as of now.
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u2.1
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

It has been a long time since I have had kernel panics and I can't remember what to look at. If you would like to see any logs, just ask.

Thanks
 
I have been having kernel panics with all the 5.19 and newer kernels. Runs great for about a day. Fine with the 5.15 kernels; they run until the next new kernel requires a reboot, or until I try the optional kernel. LOL. I tried the latest one in testing, with and without the other testing packages, and it still only lasted about a day. The hardware is an older Dell R715
That's a 13-year-old CPU, and yeah, HW that is ~10+ years old tends to become rarer and gets much less testing from kernel devs and QA farms.

Anyhow, can you please open a new thread and post the full kernel oops/panic log there? Check journalctl for older boots; if the server freezes completely and isn't syncing logs to disk, you might get away with extracting the message via ssh, i.e., connect from another server/PC/Raspberry Pi and run journalctl -f (sometimes the network survives long enough to get something out).
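
For example, something like this (hostname is a placeholder):

Bash:
# On the affected node, after the next crash: kernel messages from the previous boot
# (only useful if the journal is persistent and got synced before the freeze):
journalctl -k -b -1
# Live-follow from another machine, so the trace is captured even if the host locks up:
ssh root@pve-node 'journalctl -f -k'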
 
I have a problem that just started with the updates pushed to the no-subscription repo overnight. QEMU won't start.

Code:
task started by HA resource agent
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of_buffer'
  what():  End of buffer
TASK ERROR: start failed: QEMU exited with code 1

Here is a subsequent error:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 110 -name 'macOS-caching,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -smbios 'type=1,uuid=28a447b1-bc85-4e0b-8319-2415292e33e6' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,cache=writeback,format=raw,file=rbd:CephRBD/vm-110-disk-0:conf=/etc/pve/ceph.conf:rbd_cache_policy=writeback,size=540672' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/110.vnc,password=on' -cpu 'Skylake-Server-IBRS,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,vendor=GenuineIntel' -m 4096 -object 'iothread,id=iothread-virtio0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=14235fd0-aef4-4e83-bc55-8f7dfadf2d59' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vmware-svga,id=vga,vgamem_mb=32,bus=pcie.0,addr=0x1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2a9bd575566d' -drive 'file=rbd:CephRBD/vm-110-disk-1:conf=/etc/pve/ceph.conf,if=none,id=drive-virtio0,cache=none,aio=io_uring,discard=on,format=raw,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=62:65:47:7B:BF:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024' -machine 'type=pc-q35-7.1+pve0' -device 'isa-applesmc,osk=ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc' -smbios 'type=2' -smp '4,sockets=1,cores=2,threads=2,maxcpus=4' -cpu 'Skylake-Server-IBRS,vendor=GenuineIntel,+avx2,+avx512f,+avx512dq,+avx512cd,+avx512bw,+avx512vl,+vmx,+pclmulqdq,+pdcm,+bmi1,+hle,+smep,+bmi2,+erms,+xsaveopt,+xsavec,+xsaves,+xgetbv1,+smap,+rtm,+mpx,+rdseed,+adx,+clflushopt,+clwb,+pku,+stibp,+aes'' failed: got timeout

proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Edit: The problem somehow lies with virtio-blk. The issue doesn't happen with virtio-scsi.

Edit 2: Rolled back to 6.1.6. The problem persists. Issue isn't caused exclusively by the kernel update.
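
For reference, switching an existing disk from virtio-blk to virtio-scsi should be possible roughly like this (untested sketch; VM ID and volume name taken from the error output above, adjust to your setup):

Bash:
# Detach the virtio-blk disk; the volume is kept and shows up as unusedN:
qm set 110 --delete virtio0
# Re-attach the same volume on the SCSI bus with a virtio-scsi controller:
qm set 110 --scsihw virtio-scsi-single --scsi0 CephRBD:vm-110-disk-1,discard=on,iothread=1
# Point the boot order at the new disk:
qm set 110 --boot order=scsi0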
 
First boot with 6.1.10-1-pve led to a little hiccup while trying to autostart my TrueNAS VM with PCIe passthrough and ended with: Starting VM 101 failed: unable to read tail (got 0 bytes).
I have never had anything like that before, and on the following three or four reboots of the host (still with 6.1.10-1-pve) that I did for testing, it did not happen again and all is up and running, afaict.
Suspicious... o_O

Syslog snippet is in the attached hiccup.txt, because of the post limit.
Bash:
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00.0,pcie=1
hostpci1: 0000:01:00.0,pcie=1
hostpci2: 0000:0a:00.0,pcie=1
hugepages: 1024
ide2: none,media=cdrom
machine: q35
memory: 32768
meta: creation-qemu=7.0.0,ctime=1665967836
name: TrueNAS
net0: virtio=[...],bridge=vmbr0
numa: 1
onboot: 1
ostype: other
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,size=16G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=[...]
sockets: 1
startup: order=2,up=90
vmgenid: [...]
Bash:
proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 10.1-3~bpo11+1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

PS.: These
Bash:
Feb 11 07:28:58 pve kernel: ata1.00: disable device
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Stopping disk
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
are always there. It is a drive whose HBA also gets passed through (PCIe) to the TrueNAS VM.
 

Feb 11 07:28:58 pve kernel: ata1.00: disable device
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Stopping disk
Feb 11 07:28:58 pve kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
are always there. It is a drive whose HBA also gets passed through (PCIe) to the TrueNAS VM.
This and the stack trace make me think that the host driver does not respond well to the unbinding of the device and/or drives. Early binding the device to vfio-pci (which will also bind all identical devices) and making sure vfio-pci loads before the actual driver using a softdep might prevent this. Then the host kernel/driver won't panic about devices/drives disappearing because they are never connected to the host and only to the VM.
 
This and the stack trace make me think that the host driver does not respond well to the unbinding of the device and/or drives. Early binding the device to vfio-pci (which will also bind all identical devices) and making sure vfio-pci loads before the actual driver using a softdep might prevent this. Then the host kernel/driver won't panic about devices/drives disappearing because they are never connected to the host and only to the VM.

Thank you for the hint. :)
The HBA is actually already bound to vfio-pci (via /etc/modprobe.d/vfio.conf), so it seems I would additionally need the softdep?!
The question is: since this is a cheap HBA (x1, only for a scratch disk) that, afair, uses the ahci driver (and module), would setting the softdep for ahci give me problems with the onboard SATA controllers, which also use the ahci driver and are used on the host itself?
 
The HBA is actually already bound to vfio-pci (via /etc/modprobe.d/vfio.conf), so it seems I would additionally need the softdep?!
The question is: since this is a cheap HBA (x1, only for a scratch disk) that, afair, uses the ahci driver (and module), would setting the softdep for ahci give me problems with the onboard SATA controllers, which also use the ahci driver and are used on the host itself?
Blacklisting ahci would get you into trouble (since all SATA controllers need it), but the softdep should do the trick. Something like softdep ahci pre: vfio-pci (added to /etc/modprobe.d/vfio.conf, followed by running update-initramfs -u) will simply load vfio-pci (just) before ahci loads. Check with lspci -nnk after a fresh reboot, without starting the VM, to make sure the driver in use is vfio-pci.
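
For example, the relevant pieces could look roughly like this (the PCI ID and address below are placeholders; look yours up with lspci -nn first):

Bash:
# /etc/modprobe.d/vfio.conf
# Claim the HBA by its vendor:device ID (placeholder ID, replace with your HBA's):
options vfio-pci ids=1b21:0612
# Load vfio-pci just before ahci, so the HBA never binds to the host's ahci driver:
softdep ahci pre: vfio-pci

Then rebuild the initramfs and check the binding after a reboot:

Bash:
update-initramfs -u
# After rebooting, before starting the VM (address is a placeholder):
lspci -nnk -s 0000:0f:00.0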
 
