PVE 8.0 and 8.1 hang on boot

mir

Hi all,
Serious problem with proxmox-kernel-6.2 (proxmox-kernel-6.2.16-19-pve) and proxmox-kernel-6.5 (proxmox-kernel-6.5.11-4-pve-signed). The previous kernel proxmox-kernel-6.1 and the one I currently boot, pve-kernel-5.15.126-1-pve, work as expected. The symptom is that when the boot sequence reaches 'loading initrd' the system hangs forever, and only a power reset can bring it back to life.

CPU: 8 x AMD Opteron(tm) Processor 3365 (1 Socket)
MB:
System Information
Manufacturer: Supermicro
Product Name: H8SML
Version: 1234567890
Disk layout:
Device     Boot    Start       End   Sectors  Size Id Type
/dev/sda1  *        2048  20973567  20971520   10G 83 Linux
/dev/sda2       20973568 175836527 154862960 73.8G  5 Extended
/dev/sda5       20975616 175836527 154860912 73.8G 8e Linux LVM

pvdisplay
--- Physical volume ---
PV Name /dev/sdb
VG Name qnap
PV Size 100.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 25599
Free PE 24319
Allocated PE 1280
PV UUID arBRsL-OIXx-TKAG-XCeV-D1rS-EvBt-oDvnAh

--- Physical volume ---
PV Name /dev/sda5
VG Name pve
PV Size 73.84 GiB / not usable <3.68 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 18903
Free PE 9315
Allocated PE 9588
PV UUID GPhzmg-P8Br-1fLL-L57X-TOOB-IS9P-wUBo4Y

Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
Disk model: iSCSI Storage

Should I perhaps exclude /dev/sdb from LVM scanning?
 
Forgot:
$ sudo pveversion -v
[sudo] password for mir:
proxmox-ve: 8.1.0 (running kernel: 5.15.126-1-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
pve-kernel-5.15: 7.4-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
pve-kernel-5.15.126-1-pve: 5.15.126-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3
 
Hi,

We got some reports that sound similar, where passing nomodeset helped to get the GPU (e.g. the one from the IPMI/iKVM out-of-band management) and the newer kernel to play along again. Could you please try that?
Possibly also remove the quiet option and add earlyprintk=vga to increase the chance of getting some useful information out that can help us debug this.

hangs forever and only a power reset can bring it back to life.
Is networking never coming up either?
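
A minimal sketch of that kernel command line change, assuming a GRUB-booted host where the options live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The helper name edit_cmdline is hypothetical; it only rewrites a string, so the result can be reviewed before it is written back.

```shell
#!/bin/sh
# Hypothetical helper: take the current value of GRUB_CMDLINE_LINUX_DEFAULT,
# drop "quiet", and make sure "nomodeset" and "earlyprintk=vga" are present.
edit_cmdline() {
    out=""
    for opt in $1; do
        case "$opt" in
            quiet|nomodeset|earlyprintk=*) ;;   # dropped here, re-added below
            *) out="$out $opt" ;;
        esac
    done
    out="$out nomodeset earlyprintk=vga"
    # Strip the leading space left by the loop.
    echo "${out# }"
}

edit_cmdline "quiet amd_iommu=on iommu=pt"
```

The printed string would then go into GRUB_CMDLINE_LINUX_DEFAULT, followed by update-grub and a reboot. Hosts that boot via proxmox-boot-tool keep their command line in /etc/kernel/cmdline instead, so the same edit would apply there.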
 
Hi,

We got some reports that sound similar, where passing nomodeset helped to get the GPU (e.g. the one from the IPMI/iKVM out-of-band management) and the newer kernel to play along again. Could you please try that?
nomodeset was already enabled.
Possibly also remove the quiet option and add earlyprintk=vga to increase the chance of getting some useful information out that can help us debug this.
This is what I see after adding earlyprintk=vga, which is exactly the same as I saw before: no output whatsoever.
2023-11-24_19-39.png
To me it looks like a kernel segfault.

Is networking never coming up either?
Networking is up (management interface):
$ ping -c1 esx2
PING esx2.datanom.net (172.16.3.9) 56(84) bytes of data.
64 bytes from esx2.datanom.net (172.16.3.9): icmp_seq=1 ttl=64 time=0.342 ms

--- esx2.datanom.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.342/0.342/0.342/0.000 ms
 
It seems that the initrd build system on 8.0 and 8.1 is producing unbootable initrd files: if I build an initrd file for pve-kernel-5.15.126-1-pve on 8.1, the system can no longer boot that kernel either and shows exactly the same symptom as with the kernels newer than 6.1. That points the finger at the initrd build system provided by 8.0 and 8.1.
 
Thanks for your feedback!
Networking is up (management interface):
$ ping -c1 esx2
PING esx2.datanom.net (172.16.3.9) 56(84) bytes of data.
64 bytes from esx2.datanom.net (172.16.3.9): icmp_seq=1 ttl=64 time=0.342 ms
So the server is fully up, but fails to initialize the console correctly, which only looks like a hang?

It seems that the initrd build system on 8.0 and 8.1 is producing unbootable initrd files: if I build an initrd file for pve-kernel-5.15.126-1-pve on 8.1, the system can no longer boot that kernel either and shows exactly the same symptom as with the kernels newer than 6.1. That points the finger at the initrd build system provided by 8.0 and 8.1.
That would then point to the removal of simplefb from initramfs:
https://forum.proxmox.com/threads/o...st-no-subscription.135635/page-10#post-608958

Actually, I might have some idea why that could be the cause. Could you try re-adding that module and see if that fixes the issue? That would help to confirm the theory (until I find a host here that shows the same issue).
 
Yep, that did the trick :)
The fix enables booting on these kernels:
- 5.15.126-1-pve
- 6.2.16-19-pve
- 6.5.11-4-pve
Thanks, Thomas!

To recap, in case others come across this thread:
Code:
echo "simplefb" >> /etc/initramfs-tools/modules
update-initramfs -u -k 6.5.11-4-pve   # or: update-initramfs -u -k all
reboot

My configuration in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset amd_iommu=on iommu=pt"
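
The recap can also be written so that running it twice does not duplicate the entry. This is only a sketch: the function name ensure_module and its optional file argument are made up for illustration, and it assumes the stock initramfs-tools layout.

```shell
#!/bin/sh
# Append a module name to an initramfs-tools modules file only if it is not
# already listed, so repeating the fix does not duplicate the entry.
# The function name and the default path argument are illustrative.
ensure_module() {
    module="$1"
    file="${2:-/etc/initramfs-tools/modules}"
    if grep -qx "$module" "$file" 2>/dev/null; then
        echo "$module already listed in $file"
    else
        echo "$module" >> "$file"
        echo "added $module to $file"
    fi
}

# On the affected host (as root) this would be followed by:
#   ensure_module simplefb
#   update-initramfs -u -k all
#   lsinitramfs /boot/initrd.img-6.5.11-4-pve | grep simplefb
```

The last commented line is a quick check that the rebuilt initrd should now contain the module; the initrd file name depends on the kernel actually installed.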
 
I have exactly the same problem. After applying proxmox-kernel-helper 8.0.4, or after carrying out the steps summarized by @mir, everything works and I get the login prompt on my IPMI.
 
I'm encountering the same issue here, a missing initrd console, on a few hardware configurations after upgrading from 6.2.16-7 to 6.5.11-4:
- PowerEdge R320: the console does get initialized after the initrd stage, but I had to enter the decryption passphrase blindly
- HP thin client t630 with AMD GX-420GI: the console does not get initialized even post-initrd
- Supermicro X11SSH-F with E3-1275 v6: as with the R320, the console gets initialized post-initrd, but I had to enter the decryption passphrase blindly

All 3 machines work fine after simplefb was added. Aside from the missing console they also worked fine before and had no issues taking keypresses. Given that one machine out of 3 failed to initialize the console even post-initrd, I suspect there are more of these in the wild and people simply do not notice. All 3 machines were already running with nomodeset before the upgrade.

---

UPDATE: It seems that the post-initrd console only gets initialized if the system is booted without nomodeset. Removing nomodeset was part of my troubleshooting on the first two machines (the third one, the t630, didn't receive it), but I did not connect the two until now.

Regardless of whether nomodeset is present, the initrd console does not get initialized, so there is no visible prompt for entering the decryption passphrase for cryptsetup.
 
Oh, it would still be great if you could attach the dmesg output of a boot. One from the now-working setup would already help, but ideally also one with the module not in the initramfs, although it's naturally a hassle to go back.

E.g., use something like:
dmesg | xz -9 > "dmesg-$(date -I).log.xz"
 
Hello, I am sharing the dmesg logs with you as an attachment.
 

Attachments

  • dmesg-2023-11-28.log.xz
    18.3 KB · Views: 5
Thanks!

So it seems that nomodeset causes the HW-specific drivers to be skipped:
Code:
[    3.685255] Driver imsttfb not loading because of nomodeset parameter
[    3.691692] Driver asiliantfb not loading because of nomodeset parameter

And then only the framebuffer from the EFI console is left, which cannot be used without simplefb, as only that (and simpledrm) can take over framebuffers from the firmware.
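
To see on another host which framebuffer drivers were skipped and which one took over, the kernel log can be filtered roughly like this. The function name fb_lines and the keyword list are assumptions chosen for illustration, not a complete set.

```shell
#!/bin/sh
# Filter a kernel log (read from stdin) for lines about framebuffer setup.
# The keyword list is a guess at what is useful here, not an exhaustive set.
fb_lines() {
    grep -iE 'nomodeset|simplefb|simpledrm|efifb|fb[0-9]+:' || true
}

# Typical use on a running host:
#   dmesg | fb_lines
```

Lines such as "Driver imsttfb not loading because of nomodeset parameter" would show up in that output.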

Do you know by chance when and why you added the nomodeset parameter? If it wasn't recently, did you also try removing it to see whether the original issue is fixed?

GFX boot is still a bit of a mess: some HW simply must have modesetting, some cannot stand it, and some seemingly flip-flops every other release...
Anyway, for now I reverted to adding the simplefb module to the initramfs by default. It seems well worth the small hassle it brings, as having no console definitely does not make any admin's life easier...
 
Attaching files from R320 as I believe this is the most typical configuration, unlike other two which are more custom-like.

There's also an update to my previous post: it seems that the post-initrd console only gets initialized if the system is booted without nomodeset. Removing nomodeset was part of my troubleshooting on the first two machines (the third one, the t630, didn't receive it), but I did not connect the two until now.

Regardless of whether nomodeset is present, the initrd console does not get initialized, so there is no visible prompt for entering the decryption passphrase for cryptsetup.

I also verified that the serial console (console=ttyS0,115200n8) does get initialized properly.
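
For anyone wanting the same fallback, a serial console setup in /etc/default/grub could look roughly like the fragment below. The port (first serial port) and the line settings are assumptions matching the console=ttyS0,115200n8 parameter above and need adjusting to the actual hardware.

```shell
# Fragment for /etc/default/grub (assumed values: first serial port, 115200 8N1).
# Kernel messages go to both the local console and the serial line; the last
# console= entry becomes /dev/console.
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8"
# Optional: also show the GRUB menu itself on the serial port.
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1"
```

As with any change to this file, it only takes effect after update-grub and a reboot.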
 

Attachments

  • dmesg-nofb-ako-2023-11-28.log.xz
    20.5 KB · Views: 5
  • dmesg-simplefb-ako-2023-11-28.log.xz
    20.8 KB · Views: 3
The nomodeset parameter is present in the installation template offered by OVH.
If I remove this setting from /etc/default/grub and run update-grub && reboot, I no longer encounter any malfunction and I have the login prompt again.

However, I now get the following errors. Is it the removal of nomodeset that causes them?
EDID has corrupt header
EDID block 0 is all zeroes
EDID has corrupt header
EDID block 0 is all zeroes
 

Attachments

  • Screenshot_20231128_180526.png
    109.3 KB · Views: 32
The nomodeset parameter is present in the installation template offered by OVH.
If I remove this setting from /etc/default/grub and run update-grub && reboot, I no longer encounter any malfunction and I have the login prompt again.
Ah ok, good to know.

However, I now get the following errors. Is it the removal of nomodeset that causes them?
EDID has corrupt header
EDID block 0 is all zeroes
EDID has corrupt header
EDID block 0 is all zeroes
EDID is display information, and FWIW, I have some HW that also reports this (IIRC because I run it headless, and the vendor/firmware just cannot believe anybody would do so and wants a display connected). It's annoying, but I just ignore it there because it's a cheap mini server that I do not expect too much from. That it happens on server HW does strike me as slightly odd, though.
 
Maybe they apply the nomodeset parameter to counter this?
 
Confirmed the problem here on a Supermicro X13SAE-F. Fresh installation, output hangs on "Loading initial ramdisk". IPMI (BMC access) did not work either.

Only adding simplefb to /etc/initramfs-tools/modules fixed it.

Not working (without simplefb):

- nomodeset
- quiet

Since I did a very customized installation, this problem might not apply to many people running stock installations. However, console access should always be possible in some way during the early boot phase.

For example in my special case, initrd could not import rpool. Having no console access / output really makes problems like this very hard to diagnose / fix. Thanks to this thread I booted into another system, chrooted into pve root and regenerated the initramfs. Worked, but was a shot in the dark.
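
Roughly, that rescue path can be sketched as follows. It assumes the installed root filesystem is already mounted at /mnt from the rescue system (for a ZFS root, for example after something like zpool import -f -R /mnt rpool); the function names and the target path are placeholders. By default the script only prints the commands.

```shell
#!/bin/sh
# Rescue sketch: regenerate the initramfs of an installed system from a live
# environment. With DRY_RUN=1 (the default here) commands are only printed.
# The target path, and the assumption that the root filesystem is already
# mounted there, are placeholders for illustration.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_initramfs() {
    target="$1"
    # Make the kernel's virtual filesystems visible inside the chroot.
    run mount --bind /dev "$target/dev"
    run mount -t proc proc "$target/proc"
    run mount -t sysfs sysfs "$target/sys"
    # Add simplefb if it is missing, then rebuild every installed initramfs.
    run chroot "$target" sh -c \
        'grep -qx simplefb /etc/initramfs-tools/modules || echo simplefb >> /etc/initramfs-tools/modules; update-initramfs -u -k all'
    run umount "$target/sys" "$target/proc" "$target/dev"
}

rescue_initramfs /mnt
```

Setting DRY_RUN=0 runs the commands for real, which needs root and should only be done after checking that the printed paths match the actual layout.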

Thanks for porting it back :)
 
I think I'm a bit lost.
In our case:
Dell R240 and R340, booting via UEFI from a BOSS RAID1 with LVM, kernel 6.5.11-4-pve.
The server stops at the initial ramdisk and does not come up.
The same machines have no problem on 6.2.16-19-pve.

/etc/default/grub:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

/etc/initramfs-tools/modules
Code:
simplefb

Proxmox and BIOS are on the latest stable versions.

UPDATE:
Same problem on 6.5.11-6.
 