Opt-in Linux Kernel 5.15 for Proxmox VE 7.x available

I'm not able to get past "Loading initial ramdisk" on 5.15.17 or .19 (I have tried them both on a clean install of Proxmox, with no output other than "Loading initial ramdisk" even after removing quiet from the GRUB command line; using ext4). I saw some people mention having better success with 5.15.12; is there any way for me to download it even though it's not the latest release?

I can still boot into 5.13.4 and .2, but I would like to use 5.15 for Alder Lake iGPU compatibility.
 
I'm not able to get past "Loading initial ramdisk" on 5.15.17 or .19 (I have tried them both on a clean install of Proxmox, with no output other than "Loading initial ramdisk" even after removing quiet from the GRUB command line; using ext4). I saw some people mention having better success with 5.15.12; is there any way for me to download it even though it's not the latest release?
Are you booting UEFI? What HW is in use in general (CPU, mainboard, GPU, ...)?

Also, if the system can normally boot unattended (i.e., no full-disk encryption that needs a password entered or the like), does the PVE host finish booting? I think that some people having an issue with the newer 5.15.17+ kernels are hitting the SYS_FB kernel build config change, so it's not failing the whole system but causing the console to not show any messages/output at all while still booting fine. Not saying this has to be the case for all, but at least one user had those symptoms.
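
To check both quickly (the hostname below is a placeholder): a host booted via UEFI exposes /sys/firmware/efi, and the PVE web GUI answers on port 8006 once boot finished.
Bash:
# On the host (e.g., via SSH): prints UEFI on an EFI boot, BIOS on a legacy boot
[ -d /sys/firmware/efi ] && echo UEFI || echo BIOS
# From another machine: checks whether the web GUI came up despite a blank console
curl -sk https://pve-host:8006/ >/dev/null && echo "GUI is up"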
 
Are you booting UEFI? What HW is in use in general (CPU, mainboard, GPU, ...)?

Also, if the system can normally boot unattended (i.e., no full-disk encryption that needs a password entered or the like), does the PVE host finish booting? I think that some people having an issue with the newer 5.15.17+ kernels are hitting the SYS_FB kernel build config change, so it's not failing the whole system but causing the console to not show any messages/output at all while still booting fine. Not saying this has to be the case for all, but at least one user had those symptoms.
CPU: Intel i5 12500 (UHD 770 iGPU)
Mobo: ASRock Z690M-ITX/ax (the problem existed before the BIOS update and remains after it; VT-d and virtualization enabled)
RAM: 2x8 GB DDR4 3600 MHz (have also tried turning off XMP)
Sorry, what does booting UEFI mean?

I did scour the whole 5.15 thread and did not feel like my problem matched anyone else's. I have not been able to access the GUI even when leaving it stuck on "Loading initial ramdisk" for 20-30 minutes or so, so I assume I have a different problem.

edit: I did try `apt install pve-kernel-5.15.12-1-pve`, which seems to work for me.
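
For anyone else wanting a specific older build, the versioned kernel packages can be listed and installed directly (exact versions depend on what the repo still carries):
Bash:
# List the 5.15 kernel builds available from the configured repositories
apt update && apt search pve-kernel-5.15
# Install a specific build; it then shows up under "Advanced options" in the boot menu
apt install pve-kernel-5.15.12-1-pve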
 
PVE kernel 5.15.17-1 doesn't let me unbind my GPU from the kernel EFI framebuffer, and thus trying to pass the boot GPU through to a VM results in an Error 43 (Nvidia GeForce 1050 Ti).
My dmesg is filled with this:
Code:
[ 2016.171074] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2016.171114] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2016.171196] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2016.171205] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2016.171214] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2016.171223] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460547] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460594] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460686] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460697] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460707] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
[ 2513.460717] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
Worked fine on kernel 5.13.
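
In case it helps anyone debugging the same errors: /proc/iomem shows what still owns the region (the address range below is taken from the dmesg output above; BOOTFB is the firmware framebuffer reservation made by sysfb):
Bash:
# Check which driver still claims the BAR region that vfio-pci can't reserve
grep -i -e bootfb -e efifb -e c0000000 /proc/iomem
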
early boot with kernel 5.13: https://paste.c-net.org/ThrownWhatcha
cmdlines with kernel 5.13: https://paste.c-net.org/SleepingSedate
early boot with kernel 5.15.17: https://paste.c-net.org/OrchidsSorority
cmdlines with kernel 5.15.17: https://paste.c-net.org/KidneysFreight
5.15.17 kernel config: https://paste.c-net.org/GrateWalker
5.13 kernel config: https://paste.c-net.org/MaclarenTidings
I got pointed towards CONFIG_SYSFB_SIMPLEFB=y being the problem.
On kernel 5.15.7-1 this is not set, and passing through my boot GPU works as expected: https://paste.c-net.org/PrizedMornin
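
The relevant build options can be compared directly between the installed kernels, e.g.:
Bash:
# Compare the framebuffer handover config between two installed kernel builds
# (matches both "CONFIG_SYSFB=y" lines and "# ... is not set" lines)
grep CONFIG_SYSFB /boot/config-5.15.7-1-pve /boot/config-5.15.17-1-pve
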
On kernel 5.15.17 I can boot Windows 11 with the hypervisor features enabled (necessary for Windows Subsystem for Android). On both kernel 5.13 and 5.15.7 my VM either blue-screens with HYPERVISOR_ERROR or just freezes at boot.
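
For context, what I mean by hypervisor features boils down to nested virtualization, which WSA needs inside the guest; on an AMD host the setup is roughly this (VMID 100 is a placeholder):
Bash:
# Nested virtualization in KVM (often already on by default; kvm_intel on Intel hosts)
echo "options kvm_amd nested=1" > /etc/modprobe.d/kvm-nested.conf
# The guest needs the host CPU type so the virtualization extensions are exposed
qm set 100 --cpu host
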
System specs:
Mobo: ASRock B550 Pro4
CPU: AMD Ryzen 5 3500
RAM: G.SKILL AEGIS 2x8GB 3200MHz DDR4
GPU1: NVidia GeForce 1050 Ti [ASUS Cerberus GTX 1050 Ti OC 4GB]
GPU2 (used for a different vm): ATi Radeon HD5450 1GB
 
so it's not failing the whole system but causing the console to not show any messages/output at all while still booting fine

The console freezes, but there are no problems via SSH, and PVE/cluster/virtual machines and everything else keep functioning.
I don't see any other errors. 3 different servers. Btrfs on the hosts.

My EFI stub boot log (no passthrough in use):

Bash:
[    0.000000] Linux version 5.15.19-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.19-1 (Fri, 04 Feb 2022 06:09:14 +0100) ()
[    0.000000] Command line: initrd=\EFI\proxmox\5.15.19-1-pve\initrd.img-5.15.19-1-pve root=UUID=792e741e-4ac4-416a-aac8-123dd61b9e01
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai
.....

Bash:
[    3.300473] ================================================================================
[    3.300475] ================================================================================
[    3.300476] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:151:32
[    3.300478] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
[    3.300479] CPU: 0 PID: 564 Comm: kworker/0:1H Not tainted 5.15.19-1-pve #1
[    3.300481] Hardware name: FUJITSU PRIMERGY RX2540 M5/D3384-B1, BIOS V5.0.0.14 R1.30.0 for D3384-B1x                    12/17/2021
[    3.300483] Workqueue: kblockd blk_mq_run_work_fn
[    3.300485] Call Trace:
[    3.300486]  <TASK>
[    3.300486]  dump_stack_lvl+0x4a/0x5f
[    3.300489]  dump_stack+0x10/0x12
[    3.300491]  ubsan_epilogue+0x9/0x45
[    3.300493]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[    3.300495]  ? _printk+0x58/0x6f
[    3.300498]  MR_GetPhyParams+0x484/0x700 [megaraid_sas]
[    3.300503]  MR_BuildRaidContext+0x3bb/0xb70 [megaraid_sas]
[    3.300508]  megasas_build_and_issue_cmd_fusion+0x106d/0x17e0 [megaraid_sas]
[    3.300513]  megasas_queue_command+0x1bf/0x200 [megaraid_sas]
[    3.300517]  scsi_queue_rq+0x3da/0xbe0
[    3.300519]  blk_mq_dispatch_rq_list+0x139/0x800
[    3.300522]  ? sbitmap_get+0xb4/0x1e0
[    3.300524]  ? sbitmap_get+0x131/0x1e0
[    3.300526]  __blk_mq_do_dispatch_sched+0xba/0x2d0
[    3.300528]  ? finish_task_switch.isra.0+0xa6/0x2a0
[    3.300531]  __blk_mq_sched_dispatch_requests+0x104/0x150
[    3.300534]  blk_mq_sched_dispatch_requests+0x35/0x60
[    3.300536]  __blk_mq_run_hw_queue+0x34/0x70
[    3.300538]  blk_mq_run_work_fn+0x1b/0x20
[    3.300540]  process_one_work+0x228/0x3d0
[    3.300542]  worker_thread+0x53/0x410
[    3.300544]  ? process_one_work+0x3d0/0x3d0
[    3.300546]  kthread+0x127/0x150
[    3.300548]  ? set_kthread_struct+0x50/0x50
[    3.300551]  ret_from_fork+0x1f/0x30
[    3.300554]  </TASK>
[    3.300555] ================================================================================
 
PVE kernel 5.15.17-1 doesn't let me unbind my GPU from the kernel EFI framebuffer, and thus trying to pass the boot GPU through to a VM results in an Error 43 (Nvidia GeForce 1050 Ti).
This is almost what happened to me, except that the system did continue to boot without displaying anything: the GUI was reachable and all but one VM did start.

We investigated simplefb and the NVIDIA drivers a bit more closely, could now reproduce such symptoms on one of our test systems, and found a possible candidate for fixing this, mainly the first patch of the following series:
https://patchwork.kernel.org/project/dri-devel/patch/20220125091222.21457-2-tzimmermann@suse.de/
But the second one (2/5) seems promising as well; I'm currently building a 5.15 kernel with those backported here, and if it works out I'll upload it to test.
 
This suggestion for a very similar problem appears to have helped at least one person who no longer had a console to enter the password for an encrypted ZFS.

EDIT 1: It worked for boot messages/host console for me, with 5.15.19-1-pve. However, I cannot get simplefb to unbind the framebuffer like I could with efifb, which breaks passthrough of the same AMD GPU because of BAR 0: can't reserve [mem .....

EDIT 2: Not using any framebuffer (video=simplefb:off video=efifb:off video=vesafb:off) and letting amdgpu load normally gives me most boot messages and a host console. Unbinding the consoles and unloading amdgpu before passthrough fixes all passthrough problems for me: echo 0 | tee /sys/class/vtconsole/vtcon*/bind; sleep 3; rmmod amdgpu.
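
Spelled out as a snippet (amdgpu and the 3-second pause are specific to my setup; reversing the steps restores the host console after the VM stops):
Bash:
# Release the virtual consoles so nothing holds the framebuffer, then unload the GPU driver
echo 0 | tee /sys/class/vtconsole/vtcon*/bind
sleep 3
rmmod amdgpu
# After the VM shuts down: reload the driver and rebind a console
modprobe amdgpu
echo 1 > /sys/class/vtconsole/vtcon0/bind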

EDIT 3: amdgpu and vfio-pci work together nicely again, and this fixes the passthrough problems I have had since 5.11.22-5-pve. Thanks, Proxmox developers!
 
This suggestion for a very similar problem appears to have helped at least one person who no longer had a console to enter the password for an encrypted ZFS.
Yeah, it's the result of our testing today with lots of slow reboots that @dcsapak had to wait on.
The fix for allowing the memory regions to be reused again is required on some additional systems.

For now, the workaround for most systems is just adding simplefb so it gets included in the initrd:
Bash:
echo 'simplefb' >>/etc/initramfs-tools/modules
update-initramfs -k all -u
proxmox-boot-tool refresh
systemctl reboot

We'll look into what way we'll go to integrate this out of the box before switching to 5.15 as the default.
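
To verify the module actually landed in the initrd after the rebuild (lsinitramfs ships with Debian's initramfs-tools):
Bash:
# Should list simplefb.ko for the running kernel's initrd
lsinitramfs /boot/initrd.img-$(uname -r) | grep simplefb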
 
Yeah, it's the result of our testing today with lots of slow reboots that @dcsapak had to wait on.
The fix for allowing the memory regions to be reused again is required on some additional systems.

For now, the workaround for most systems is just adding simplefb so it gets included in the initrd:
Bash:
echo 'simplefb' >>/etc/initramfs-tools/modules
update-initramfs -k all -u
proxmox-boot-tool refresh
systemctl reboot

We'll look into what way we'll go to integrate this out of the box before switching to 5.15 as the default.
Yes, everything is great. Now you do not need to boot into the old kernel to fix something from the console.

There was another error in dmesg that I didn't see right away:
Code:
[drm] *ERROR* can't reserve VRAM
 
I was not attentive. PVE starts fine, but the console freezes on these lines:


Bash:
[   12.862026] fb0: switching to mgag200 from simple
[   12.867860] mgag200 0000:02:00.0: vgaarb: deactivate vga console
[   12.868344] mgag200 0000:02:00.0: [drm] *ERROR* can't reserve VRAM

ThinkSystem SR630
&
PRIMERGY RX2540 M5
 

[ 12.868344] mgag200 0000:02:00.0: [drm] *ERROR* can't reserve VRAM
FWICT, that system could be affected by the simplefb-related memory-region release bug:
https://patchwork.kernel.org/project/dri-devel/patch/20220125091222.21457-2-tzimmermann@suse.de/
But the second one (2/5) seems promising as well; I'm currently building a 5.15 kernel with those backported here, and if it works out I'll upload it to test.
FYI: We'll upload a kernel with those fixes, plus the simplefb module included in the initramfs (at least by default), probably today or tomorrow.
 
Hey there! Testing 5.15 on my Lenovo System x3850 X6 (4x E7-8860 v3, 512 GB RAM).

Fresh install of 7.1 to a pair of SSDs in ZFS RAID 1, switched to the free repos, then installed `pve-kernel-5.15` (which results in pve-kernel-5.15.19-1-pve), then rebooted. It hangs immediately at: "EFI stub: loaded initrd from command line option"

Rebooting and falling back to 5.13 works fine. Any thoughts? Happy to try anything, this server isn't prod at the moment.
 
The latest 5.15.19-1-pve breaks NVIDIA PCI passthrough. It all runs fine with 5.15.7-1.
With 5.15.19-1-pve, the screen shows only noise when the VM starts, and the VM then freezes.

Hardware: ASRock X470 Taichi, Ryzen 3900X, NVIDIA GTX 1080, 128 GB memory.

There is not much I can do to debug, as the VM is my main desktop. Reverting to 5.15.7-1 makes everything good again.
 
Any thoughts? Happy to try anything, this server isn't prod at the moment.
Yes, see:

For now, the workaround for most systems is just adding simplefb so it gets included in the initrd:
Bash:
echo 'simplefb' >>/etc/initramfs-tools/modules
update-initramfs -k all -u
proxmox-boot-tool refresh
systemctl reboot
We'll look into what way we'll go to integrate this out of the box before switching to 5.15 as the default.
 
I have a Minisforum HM80 mini-PC (AMD Ryzen 4800U). With a new PVE 7.1 install, networking wouldn't work automatically and the host would hang on reboot. Upgrading the kernel to 5.15.19-1 resolved both of those issues; however, I ran into another one.

The HM80 has two SATA ports; prior to upgrading to 5.15.19-1 (and in other testing with Win10/ESXi/XCP-ng), the disks attached to those ports worked fine and write performance was around 480 MB/s. After upgrading the kernel to 5.15.19-1, I noticed write performance on one of the drives dropped to 220 MB/s. At the same time I observed "ATA Bus Error" messages on the host console. At first I thought it was a hardware issue; I tried replacing the SATA cables and the drives, but everything checked out OK. The stranger thing is that if I only have one drive connected, everything works fine in either SATA port. When I connect a second drive and go to use it, the errors start.

Not sure what's going on, but my troubleshooting efforts point to the issue being correlated with the upgrade to 5.15.19-1 and using both SATA ports at the same time.
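
If anyone wants to compare notes, the negotiated link speed and the drives' CRC error counters are quick things to check (smartctl is in the smartmontools package; device names are examples):
Bash:
# A downshift from 6.0 to 3.0/1.5 Gbps often accompanies ATA bus errors
dmesg | grep -i 'SATA link up'
# UDMA CRC errors increment on cable/signal problems
smartctl -a /dev/sda | grep -i -e 'SATA Version' -e 'CRC'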
 

FYI: We'll upload a kernel with those fixes, plus the simplefb module included in the initramfs (at least by default), probably today or tomorrow.

Is there any guess as to when we can roughly expect it on the no-subscription repo? :)
 
The console freezes, but there are no problems via SSH, and PVE/cluster/virtual machines and everything else keep functioning.
I don't see any other errors. 3 different servers. Btrfs on the hosts.
OS: Proxmox VE 7.1-10 x86_64 Kernel: 5.15.19-2-pve

Everything is fine after the upgrade. Thnx.
Code:
ThinkSystem SR630
PRIMERGY RX2540 M5
Lenovo System x3550 M5
 
Bash:
echo 'simplefb' >>/etc/initramfs-tools/modules
update-initramfs -k all -u
proxmox-boot-tool refresh
systemctl reboot

Thanks, this has restored my console output on both 5.15.17-1-pve and 5.15.19-1-pve.

I do still have issues with the host node hanging with i915 passthrough since upgrading to PVE 7.1 (it was stable on 6.x and 7.0), though I wouldn't expect the hangs to be related to simplefb. The host node hang occurs when the guest VM "resets the chip" following a GPU hang. The following is from dmesg -Tw on the VM at the time of the hang; there is nothing on the host node, as it's an immediate hang with no logs:

Code:
[Sat Feb  5 03:42:34 2022] i915 0000:00:10.0: GPU HANG: ecode 9:1:0x8ed9fff2, in Plex Transcoder [17895], hang on rcs0
[Sat Feb  5 03:42:34 2022] i915 0000:00:10.0: Resetting rcs0 for hang on rcs0
[Sat Feb  5 03:42:34 2022] i915 0000:00:10.0: Resetting chip for hang on rcs0
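
Since the host hangs without writing anything to disk, one way to still catch its last kernel messages would be netconsole (addresses, MAC, and interface below are placeholders):
Bash:
# On the PVE host: stream kernel messages over UDP to another machine
modprobe netconsole netconsole=6665@192.168.1.10/eno1,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
# On the receiving machine (192.168.1.20); traditional netcat needs -p 6666
nc -u -l 6666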

The guest VM, although it works, doesn't seem 100% happy with the GPU. The following is from the guest VM:

Code:
# dmesg -T  | egrep 'i915|drm|01:00'
[Wed Feb 16 19:19:07 2022] pci 0000:01:00.0: [8086:9bc5] type 00 class 0x038000
[Wed Feb 16 19:19:07 2022] pci 0000:01:00.0: reg 0x10: [mem 0xc1000000-0xc1ffffff 64bit]
[Wed Feb 16 19:19:07 2022] pci 0000:01:00.0: reg 0x18: [mem 0x800000000-0x80fffffff 64bit pref]
[Wed Feb 16 19:19:07 2022] pci 0000:01:00.0: reg 0x20: [io  0xa000-0xa03f]
[Wed Feb 16 19:19:08 2022] bochs-drm 0000:00:01.0: remove_conflicting_pci_framebuffers: bar 0: 0xc0000000 -> 0xc0ffffff
[Wed Feb 16 19:19:08 2022] bochs-drm 0000:00:01.0: remove_conflicting_pci_framebuffers: bar 2: 0xc294b000 -> 0xc294bfff
[Wed Feb 16 19:19:08 2022] fb0: switching to bochsdrmfb from EFI VGA
[Wed Feb 16 19:19:08 2022] bochs-drm 0000:00:01.0: vgaarb: deactivate vga console
[Wed Feb 16 19:19:08 2022] [drm] Found bochs VGA, ID 0xb0c5.
[Wed Feb 16 19:19:08 2022] [drm] Framebuffer size 16384 kB @ 0xc0000000, mmio @ 0xc294b000.
[Wed Feb 16 19:19:08 2022] [drm] Found EDID data blob.
[Wed Feb 16 19:19:08 2022] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:01.0 on minor 0
[Wed Feb 16 19:19:08 2022] fbcon: bochs-drmdrmfb (fb0) is primary device
[Wed Feb 16 19:19:08 2022] i915 0000:01:00.0: VT-d active for gfx access
[Wed Feb 16 19:19:08 2022] bochs-drm 0000:00:01.0: fb0: bochs-drmdrmfb frame buffer device
[Wed Feb 16 19:19:08 2022] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[Wed Feb 16 19:19:08 2022] [drm] Driver supports precise vblank timestamp query.
[Wed Feb 16 19:19:08 2022] i915 0000:01:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[Wed Feb 16 19:19:08 2022] [drm] Failed to find VBIOS tables (VBT)
[Wed Feb 16 19:19:08 2022] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[Wed Feb 16 19:19:09 2022] [drm] failed to retrieve link info, disabling eDP
[Wed Feb 16 19:19:09 2022] i915 0000:01:00.0: Failed to program MOCS registers; expect performance issues.
[Wed Feb 16 19:19:09 2022] [drm] Initialized i915 1.6.0 20190822 for 0000:01:00.0 on minor 1
[Wed Feb 16 19:19:09 2022] [drm] Cannot find any crtc or sizes
[Wed Feb 16 19:19:09 2022] [drm] Cannot find any crtc or sizes
[Wed Feb 16 19:19:10 2022] [drm] Cannot find any crtc or sizes
[Wed Feb 16 19:19:11 2022] systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.

I've got some NVIDIA GPUs waiting to be installed this weekend to see if I fare better there than with the iGPU.
 
