Opt-in Linux Kernel 5.15 for Proxmox VE 7.x available

yjjoe

New Member
Working well on my system with an AMD 3900X (ondemand governor) on an X570 board with IOMMU and SR-IOV.
Using the latest i40e and iavf drivers.
Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.30-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 nmi_watchdog=0 mitigations=off

Have a good day.
Hi Androme,

If you have some free time to explain,
what are these GRUB/EFI options that you've listed:
- iommu=pt (I have a vague idea but...)
- kvm_amd.npt=1 && kvm_amd.avic=1 (any Intel-compatible equivalents?)
- nmi_watchdog=0
- mitigations=off

I could easily look them up in the docs, but sometimes day-to-day experience is more relevant.

Thanks a bunch!
 

androme13

Member
Hi,
iommu=pt puts the IOMMU in passthrough mode, for better performance when passing devices (like a NIC or GPU) through to VMs.
kvm_amd.npt=1 && kvm_amd.avic=1 enable AMD-specific KVM optimizations (nested page tables and AVIC); I have no proof of any gain.
nmi_watchdog=0 disables the Linux NMI watchdog, since my system has a hardware watchdog.
mitigations=off disables CPU vulnerability mitigations (Spectre and others); since I trust my system, this can free up some performance (notably great gains for I/O).
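For reference, a minimal sketch of how options like these are typically applied on a GRUB-booted Debian/Proxmox system (the exact option list is whatever you choose for your hardware; the paths are the distribution defaults):

```shell
# Append the desired options to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1"
# then regenerate the GRUB config and reboot:
update-grub

# After rebooting, verify the options actually took effect:
cat /proc/cmdline
```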
 

imadevel

Member
Hello,

pve-kernel-5.15.35-1-pve/stable,now 5.15.35-3 leads to problems with USB3-Card (2109:3431 VIA Labs, Inc. Hub):

May 19 10:15:57 pve1 kernel: [ 155.980914] DMAR: DRHD: handling fault status reg 2
May 19 10:15:57 pve1 kernel: [ 155.980950] xhci_hcd 0000:06:00.0: WARNING: Host System Error
May 19 10:15:57 pve1 kernel: [ 155.981701] DMAR: [DMA Read NO_PASID] Request device [06:00.0] fault addr 0xfffb2000 [fault reason 0x06] PTE Read access is not set
May 19 10:15:57 pve1 kernel: [ 156.014415] xhci_hcd 0000:06:00.0: Host halt failed, -110
May 19 10:15:57 pve1 kernel: [ 192.655171] xhci_hcd 0000:06:00.0: xHCI host not responding to stop endpoint command.
May 19 10:15:57 pve1 kernel: [ 192.655980] xhci_hcd 0000:06:00.0: USBSTS: 0x0000000c HSE EINT
May 19 10:15:57 pve1 kernel: [ 192.688788] xhci_hcd 0000:06:00.0: Host halt failed, -110
May 19 10:15:57 pve1 kernel: [ 192.689574] xhci_hcd 0000:06:00.0: xHCI host controller not responding, assume dead
May 19 10:15:57 pve1 kernel: [ 192.690379] xhci_hcd 0000:06:00.0: HC died; cleaning up
May 19 10:15:57 pve1 kernel: [ 192.691195] usb 4-1: USB disconnect, device number 2
May 19 10:15:57 pve1 kernel: [ 192.691568] usb 6-1: USB disconnect, device number 2

Kernel 5.13.19-6-pve works fine.
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 19 10:15:57 pve1 kernel: [ 155.980914] DMAR: DRHD: handling fault status reg 2
May 19 10:15:57 pve1 kernel: [ 155.980950] xhci_hcd 0000:06:00.0: WARNING: Host System Error
See the known-issues section of the PVE release notes for 7.2:
https://pve.proxmox.com/wiki/Roadmap#7.2-known-issues

Please try adding iommu=pt to your kernel cmdline.
If this does not help, instead add intel_iommu=off as described there.
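A sketch of applying that suggestion, assuming the Debian/Proxmox default locations (first block for GRUB-booted systems, second for systemd-boot installs such as ZFS root):

```shell
# GRUB: add iommu=pt (or, failing that, intel_iommu=off) to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub

# systemd-boot (e.g. ZFS root): append the option to the single line
# in /etc/kernel/cmdline, then sync the boot entries:
proxmox-boot-tool refresh

# Reboot and confirm the option is active:
cat /proc/cmdline
```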
 

0x906

New Member
A Dell PowerEdge R820 with a PERC H710P RAID controller and the latest BIOS/firmware, previously working without issues on kernel 5.13.x. After upgrading to pve-kernel-5.15.35-1-pve, the logs are filled with errors:

May 22 14:04:19 hephaistos kernel: UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:151:32
May 22 14:04:19 hephaistos kernel: index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
May 22 14:04:19 hephaistos kernel: CPU: 0 PID: 7 Comm: kworker/0:0H Not tainted 5.15.35-1-pve #1
May 22 14:04:19 hephaistos kernel: Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 2.7.0 01/07/2020
May 22 14:04:19 hephaistos kernel: Workqueue: kblockd blk_mq_run_work_fn
May 22 14:04:19 hephaistos kernel: Call Trace:
May 22 14:04:19 hephaistos kernel: <TASK>
May 22 14:04:19 hephaistos kernel: dump_stack_lvl+0x4a/0x5f
May 22 14:04:19 hephaistos kernel: dump_stack+0x10/0x12
May 22 14:04:19 hephaistos kernel: ubsan_epilogue+0x9/0x45
May 22 14:04:19 hephaistos kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49
May 22 14:04:19 hephaistos kernel: ? _printk+0x58/0x6f
May 22 14:04:19 hephaistos kernel: MR_GetPhyParams+0x484/0x700 [megaraid_sas]
May 22 14:04:19 hephaistos kernel: MR_BuildRaidContext+0x662/0xb70 [megaraid_sas]
May 22 14:04:19 hephaistos kernel: megasas_build_and_issue_cmd_fusion+0x106d/0x17e0 [megaraid_sas]
May 22 14:04:19 hephaistos kernel: megasas_queue_command+0x1c2/0x200 [megaraid_sas]
May 22 14:04:19 hephaistos kernel: scsi_queue_rq+0x3dd/0xbe0
May 22 14:04:19 hephaistos kernel: blk_mq_dispatch_rq_list+0x13c/0x800
May 22 14:04:19 hephaistos kernel: ? sbitmap_get+0xb4/0x1e0
May 22 14:04:19 hephaistos kernel: ? sbitmap_get+0x1c1/0x1e0
May 22 14:04:19 hephaistos kernel: blk_mq_do_dispatch_sched+0x2fa/0x340
May 22 14:04:19 hephaistos kernel: ? ttwu_do_wakeup+0x1c/0x170
May 22 14:04:19 hephaistos kernel: __blk_mq_sched_dispatch_requests+0x101/0x150
May 22 14:04:19 hephaistos kernel: blk_mq_sched_dispatch_requests+0x35/0x60
May 22 14:04:19 hephaistos kernel: __blk_mq_run_hw_queue+0x34/0xb0
May 22 14:04:19 hephaistos kernel: blk_mq_run_work_fn+0x1b/0x20
May 22 14:04:19 hephaistos kernel: process_one_work+0x22b/0x3d0
May 22 14:04:19 hephaistos kernel: worker_thread+0x53/0x410
May 22 14:04:19 hephaistos kernel: ? process_one_work+0x3d0/0x3d0
May 22 14:04:19 hephaistos kernel: kthread+0x12a/0x150
May 22 14:04:19 hephaistos kernel: ? set_kthread_struct+0x50/0x50
May 22 14:04:19 hephaistos kernel: ret_from_fork+0x22/0x30
May 22 14:04:19 hephaistos kernel: </TASK>

The server seems operational at this point. I thought this was an issue from the beginning of this year. Is there any workaround or solution aside from downgrading and removing the Proxmox repo for the time being?

Thanks!
 

elter

New Member
You are lucky it's working and booting :) Apparently, Ubuntu decided to keep UBSAN enabled in 22.04 LTS, which is either an "interesting" choice or it just slipped through somehow. You can google "ubuntu 22.04 ubsan issue", and this may be just the beginning, as that kernel is not yet in the wild at any measurable scale.

Unless the Proxmox team sees much benefit in having it enabled, the simplest workaround is to disable it in the kernel config and release an update.
 

wellbein

Member
Great, thanks.
This kernel version resolves the problem where servers with AMD CPUs wouldn't boot with the 5.13 kernels.
Now they boot perfectly.
 

nullify

New Member
Like some others, I also have an issue with GPU passthrough breaking in kernel 5.15.35 with an AMD RX580.

/var/log/syslog gets filled with errors like this:
kvm: vfio_region_write( [...] ) failed: Device or resource busy

Interestingly, the VM (windows) seems to start normally but display output is showing nothing but gibberish. It shows the Proxmox boot splash screen and random error messages layered on top.

I assume it has something to do with the AMD vendor reset, but I haven't had time to look at it more closely. For now, rolling back to 5.15.12 fixes the issue.
I'm just glad I don't have to go back to 5.13, because that one had TERRIBLE VM performance issues.
 

hawk

New Member
I don't know if this is a good place to continue this discussion, but since most of the previous discussion of the issues relating to the efifb change is in this thread, it might make sense to continue here, even though the affected kernels are no longer opt-in.


Just ran into the 5.15 "no display" issue after upgrading a 7.1 system to 7.2:
pve-kernel-5.15.30-2-pve 5.15.30-3
pve-kernel-helper 7.2-2

This is on a machine with an Asrock Rack motherboard with ASPEED graphics.

It appears that "simplefb" is indeed loaded early on, as intended, but that does not result in any display output during early boot (initramfs etc.):

[ 11.456186] simple-framebuffer simple-framebuffer.0: framebuffer at 0xfb000000, 0x1d5000 bytes
[ 11.456191] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=800x600x32, linelength=3200
[ 11.456256] Console: switching to colour frame buffer device 100x37
[ 11.485771] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!


However later in the boot process "ast" is loaded and display is restored (I assume that is what does it, anyway).

[ 30.872558] ast 0000:29:00.0: [drm] Using P2A bridge for configuration
[ 30.890569] ast 0000:29:00.0: [drm] AST 2500 detected
[ 30.908147] ast 0000:29:00.0: [drm] Analog VGA only
[ 30.925632] ast 0000:29:00.0: [drm] dram MCLK=800 Mhz type=7 bus_width=16
[ 30.943543] [drm] Initialized ast 0.1.0 20120228 for 0000:29:00.0 on minor 0

This is as it appears over IPMI/ipkvm; I have no idea what it looks like on the local console (which is rather pointless anyway).
I do know, however, that this problem did not occur with the 5.13 / efifb setup.

I hope there is some way to get this sorted.
I suppose one could get ast (and I guess its firmware?) into the initramfs, but that seems a bit messy, and I assume the intention is that simplefb should be able to work anywhere?
For what it's worth, the problem remains with:
pve-kernel-5.15.35-1-pve 5.15.35-3
pve-kernel-helper 7.2-3


Is there a way to somehow revert to an efifb-based setup? That seemed to Just Work™, whereas I have no idea whether simplefb can even be made to work here.
Otherwise, what is a possible way forward?

Problem remains with:
pve-kernel-5.15.35-2-pve 5.15.35-4
pve-kernel-helper 7.2-4

Any ideas?
I do not particularly care what exactly the console setup is, I just want it to be in a proper usable state also during boot, which has not been the case since the move to 5.15.x and simplefb.
(Old school 80x25 text mode would be perfectly fine by me, over this framebuffer nonsense, if that was an option.)
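The "get ast into the initramfs" idea floated above can at least be tried via the standard Debian initramfs-tools mechanism. A sketch, untested as a workaround for this particular issue (whether the ast driver also needs firmware files in the initramfs is an open question):

```shell
# Force the ast DRM module into the initramfs so the display can come up
# during early boot, then rebuild the initramfs for all installed kernels:
echo ast >> /etc/initramfs-tools/modules
update-initramfs -u -k all
```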
 
