PVE 8.0 and 8.1 hang on boot

I think I'm a bit lost.
In our case:
Dell R240 or R340, booting from a BOSS RAID1 card, LVM, kernel 6.5.11-4-pve, UEFI.
The server stops at "loading initial ramdisk" and never comes up.
The same machines on 6.2.16-19-pve have no problem.

/etc/default/grub:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

/etc/initramfs-tools/modules
Code:
simplefb

Proxmox & bios latest possible stable version
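(Side note: after editing /etc/default/grub or /etc/initramfs-tools/modules, the boot files have to be regenerated before the next boot; a typical sequence, depending on whether GRUB or proxmox-boot-tool manages the boot entries on this host, would be:)
Code:
# pick up changes to /etc/default/grub
update-grub
# rebuild the initramfs for all installed kernels after editing
# /etc/initramfs-tools/modules
update-initramfs -u -k all
# only on hosts where proxmox-boot-tool manages the boot partitions
proxmox-boot-tool refresh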
Hi,

Same here with an R240

Tobias
 
nomodeset was enabled.

This is what I see after adding earlyprintk=vga, which is exactly the same as what I saw before: no output whatsoever.
[Screenshot: attachment 58761]
To me it looks like a kernel segfault.
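(For anyone reproducing this: such debug parameters typically go on the kernel command line in /etc/default/grub, followed by update-grub; the exact set below is only an example:)
Code:
# example only -- adjust as needed, then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset earlyprintk=vga"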


Networking is enabled (management interface):
$ ping -c1 esx2
PING esx2.datanom.net (172.16.3.9) 56(84) bytes of data.
64 bytes from esx2.datanom.net (172.16.3.9): icmp_seq=1 ttl=64 time=0.342 ms

--- esx2.datanom.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.342/0.342/0.342/0.000 ms

Not that it matters, but I was able to install 7.4 without any problems onto this hardware (below)!

Previously:

Having this exact issue!
ASRock B650D4U-2L2T/BCM
AMD Ryzen 9 7900X
32GB ECC DDR5
 
I think I'm a bit lost.
In our case:
Dell R240 or R340, booting from a BOSS RAID1 card, LVM, kernel 6.5.11-4-pve, UEFI.
The server stops at "loading initial ramdisk" and never comes up.
The same machines on 6.2.16-19-pve have no problem.

/etc/default/grub:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

/etc/initramfs-tools/modules
Code:
simplefb

Proxmox & bios latest possible stable version

UPDATE:
On 6.5.11-6 same problem
Hi,

Yes, indeed. I can confirm: same problem as before, the system is unusable with kernel 6.5, unfortunately.

Tobias
 
*Any* update on this? It's quite annoying, to be honest...
You can keep using the 6.2 kernel if it's not affected by such specific HW issues; it still gets updates.

Please also note that this is the community forum. While we look into every issue eventually, the ones we get in through enterprise support naturally have priority, and as we sadly have no such broken HW in our test lab, it's a bit hard to bisect the issue quickly.
We'll still look into it, but it might need a bit more time.
 
EDID is display info stuff, and FWIW, I have some HW that also reports this (IIRC because I run it headless and the vendor/firmware just cannot believe anybody would do so and wants a display connected). It's annoying, but I just ignore it there because it's a cheap mini server that I don't expect too much from. That it happens on server HW does strike me as slightly odd, though.
You never know what crazy bugs you find in a system BIOS. I am not at all surprised that some hardware misbehaves if you don't connect a monitor. Fortunately, dummy monitor plugs cost only around $5. That's a pragmatic solution to this type of bug.

Plug one of those little gizmos into your computer, and it now believes a monitor is connected. No more EDID errors.

Of course, in many cases EDID errors are merely cosmetic, so fixing them wouldn't necessarily address the hangs that people have reported in this thread. But even cosmetic issues are worth addressing, as they can be confusing and hide real problems.
 
I think I'm a bit lost.
In our case:
Dell R240 or R340, booting from a BOSS RAID1 card, LVM, kernel 6.5.11-4-pve, UEFI.
The server stops at "loading initial ramdisk" and never comes up.
The same machines on 6.2.16-19-pve have no problem.

/etc/default/grub:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

/etc/initramfs-tools/modules
Code:
simplefb

Proxmox & bios latest possible stable version

UPDATE:
On 6.5.11-6 same problem

I am on an R340 as well with the same issue. 6.5.11+ has issues and I haven't been able to find out why. Using debug just shows it stopping at LVM for some reason.
 
Whenever you suspect a problem with the initramfs, it's a good idea to try booting with the "break=top" kernel command line option. If the initramfs was loaded at all, this should give you a shell prompt, and that would be a useful additional data point.

Of course, you should still pass whatever command line options are necessary to even enable console output. That can be a little tricky to sort out and depends on the hardware in your computer. If in doubt, a serial console can make all the difference.
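A minimal sketch of how that test could look, assuming GRUB as the bootloader (the console= part is only an example and depends on the hardware):
Code:
# at the GRUB menu, press 'e' on the boot entry, find the line starting
# with "linux", and append (one-off, not persistent):
break=top console=tty0
# if the initramfs was loaded and started, you land in a busybox
# "(initramfs)" shell instead of continuing the normal boot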
 
Thanks for your feedback!

So the server is fully up, but fails to initialize the console correctly, suggesting a hang?


That would then point to the removal of simplefb from initramfs:
https://forum.proxmox.com/threads/o...st-no-subscription.135635/page-10#post-608958

Actually, I might have some idea why that could be the cause. Maybe you could try re-adding that module and see if that fixes the issue? That would help confirm the theory (until I find a host here that shows the same issue).
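If someone wants to try that, re-adding the module would look roughly like this (assuming the standard initramfs-tools setup):
Code:
# add simplefb back to the initramfs module list
echo simplefb >> /etc/initramfs-tools/modules
# rebuild the initramfs for all installed kernels
update-initramfs -u -k all
# on hosts where proxmox-boot-tool manages the ESPs (e.g. ZFS/UEFI),
# copy the new images over as well
proxmox-boot-tool refresh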
I have had a lot of trouble today.
3x Dell T140 don't come up. They only work with 6.2, and it was a mess sorting out the customers at the same time.

2x Dell T30 don't come up either.
One of them suddenly worked after some reboots and now runs fine with 6.5.11-6.

The other one only works after rolling back the rpool/ROOT/pve-1 dataset. Otherwise it has no network connection (ifconfig shows lo as the only network interface).
If I roll back to the old snapshot, it boots with both kernels, both with working network, but the 6.5 kernel is (of course) missing some modules, which prevents KVM from working. So this is NOT a kernel thing.
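(For reference, such a rollback would look roughly like this; the dataset and snapshot names are examples, check zfs list -t snapshot for the real ones:)
Code:
# list snapshots of the root dataset
zfs list -t snapshot -r rpool/ROOT/pve-1
# roll back to a known-good snapshot; add -r if newer snapshots
# exist and may be destroyed
zfs rollback rpool/ROOT/pve-1@known-good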
 
Tested several installations and/or upgrades from 7.4 to 8.0/8.1.
(All on older Dell R720 with H310 mini in IT mode, boot on mirrored ZFS, and the internal NIC daughtercard with 2x 10Gbit/2x 1Gbit.)

No success at all booting an upgraded system (7.4 to 8.x); booting to the rescue shell was possible in every case, at any time.
- all upgrades from 7.4 hung on bringing up the network
- downgrading pve-firmware to 3.8.5 was not successful
- simplefb in grub.conf doesn't work for me
- using the old kernel 5.15.126 or 5.15.131 with 8.1 doesn't work for me either

Only new installations came up successfully and could also successfully be upgraded to kernel 6.5.11-[5/6/7]-pve.
No other solution than to reinstall the cluster nodes step by step ...
No problems so far.
 
Same issue here. I had to pin a kernel older than 6.5 in order to get 5 DELL servers to boot. Otherwise it hangs at "loading initial ramdisk" and never comes up without a power cycle.
 
Same issue here. I had to pin a kernel older than 6.5 in order to get 5 DELL servers to boot. Otherwise it hangs at "loading initial ramdisk" and never comes up without a power cycle.
I don't know if this is anecdotal or not, but when I attempt to install Proxmox 8.1 on my external USB/NVMe dongle it has the problem. When I accidentally installed 8.1 on the internal SSD drive (where the guests are normally stored), it booted with no issues.
 
Happy Proxmox paying customer w/three nodes, all Dell and Supermicro 2U servers.
Unhappy that I can't upgrade the Supermicro to any 6.5.x kernel.
Not sure if the bug report should go here or a new thread, but it totally blocks the upgrade path.

Proxmox 8.x with kernel 6.2 and earlier works fine. Using the Enterprise repository, upgrading to any of the 6.5.x Linux kernels causes the fault.

If during a node reboot I select the 6.2 kernel from the boot menu, all is well. So no touching hardware or disks, just selecting a different kernel at the boot menu.

(Very) temporary fix: pin the 6.2 kernel using proxmox-boot-tool, so nobody accidentally boots into 6.5.x.
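For reference, the pin looks like this (the exact version string depends on what is installed; check with the list subcommand first):
Code:
# show the kernels proxmox-boot-tool knows about
proxmox-boot-tool kernel list
# pin the known-good 6.2 kernel (version string is an example)
proxmox-boot-tool kernel pin 6.2.16-20-pve
# once a fixed 6.5 kernel works again, remove the pin
proxmox-boot-tool kernel unpin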

Here's the bug:

Servers have a BMC (or lights-out, or remote management...) that lets you use a separate ethernet port to talk to the chassis, power it on/off, get a remote terminal with screen, keyboard, etc. The BMC also shows temperatures and controls the fans. So the BMC is sort of important: if you don't know the temperatures and can't control the fans, your $12,000 server can toast itself (or at least sit in thermal throttle mode, which isn't great either).

Using Proxmox on 6.2 or earlier the BMC works fine on Supermicro AS-2015CS-TNR which is an AMD Epyc.

Booting the Proxmox 6.5.x kernel causes ALL BMC sensor data to go away. There are normally many dozen entries for temperatures, voltages and fan RPM, and EVERY one of them is just gone; the BMC web page says NA for all of them. This is with no code added to Proxmox, totally out of the box.

With 6.2.x this all works from the BMC web page, and (optionally, I tried with and without) installing ipmitool and running the ipmitool sensor command shows them all too.
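For comparing the two kernels from the OS side, the in-band readout can be checked like this (standard Debian package and ipmitool subcommands, nothing Proxmox-specific):
Code:
apt install ipmitool
# read the sensors through the in-band interface (needs the ipmi_* modules)
ipmitool sensor list
# condensed SDR view as an alternative
ipmitool sdr elist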

So when 6.5.x fails, the first thing I tried was apt purge ipmitool (in case the user-space tool or the libraries it pulls in were causing the issue). Sadly, no improvement after the ipmitool purge and a reboot. But a reboot into 6.2.x is still OK.

I then did the obvious... read up on how the kernel's IPMI support figures out what it should do. Oh my, it uses ACPI, and by default trusts the ACPI tables in the BIOS to figure things out. Nothing like x86 BIOS vs. OS battles. Only the user loses :)

What I *think* I'd like to do is understand how I can blacklist the usual suspects so that the Linux kernel simply does NOT TOUCH the BMC stuff. I can live with accessing the BMC only through its separate ethernet interface to check temps and control fans. In the near term, I do not need the node to be able to touch the BMC at all, and would hope that having it not touch anything results in it not breaking the BMC.

The Proxmox boot pin command is really nice. Is there some similar way to blacklist the IPMI module(s) and maybe bisect the problem down to one of them?
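Not an official answer, but a sketch of how such a blacklist usually looks with modprobe.d; the module names below are the usual in-band IPMI drivers, and whether keeping them unloaded stops 6.5.x from upsetting this particular BMC is exactly what would need testing:
Code:
# /etc/modprobe.d/blacklist-ipmi.conf
# keep the kernel from auto-loading the in-band IPMI/BMC drivers
blacklist ipmi_si
blacklist ipmi_ssif
blacklist ipmi_devintf
blacklist ipmi_msghandler
blacklist acpi_ipmi
# "blacklist" only stops auto-loading; to block explicit loads too:
# install ipmi_si /bin/false
Afterwards rebuild the initramfs (update-initramfs -u -k all) so the blacklist also applies early during boot.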

I'm reluctant to try building kernels, as Proxmox builds their own, and even if I started with plain Debian and spent time bisecting it via the Debian live USB stick route, there's no telling whether that would help get it fixed in Proxmox.

Please advise what to do... Oh, and our (much older) Dell R730xd servers all seem perfectly happy with all the Enterprise updates and have been on 6.5 for a couple of weeks now... with their temps & fans just fine.
 
Hi,
same for me on a Dell PowerEdge R340 when upgrading from Proxmox 7 to 8: the system hangs on "loading initrd" without any output when trying to load a 6.5.x kernel; the last try was 6.5.11-7-pve, but it doesn't boot either. Putting "simplefb" in place made no difference. Older kernels work fine, so for the moment I have pinned 6.2.16-20-pve. But of course this can't be a permanent solution.
I upgraded several Proxmox 7 instances to 8, and they are all up & running; but those are Dell R6515 or R630. The R340 is the only one making this noise (SO FAR! And it can stay that way!).
Regards,
Marianne
 
Same issues on a Dell PowerEdge T140. More information here: https://forum.proxmox.com/threads/cant-boot-efi-stub-loaded.138549/#post-618446

I've pinned the 6.2.16-20-pve kernel at boot as well for now. In my case, the network cards blink, but I'm never able to ping the server or log in, even after physically moving the ethernet cable to other NICs. It's just dead. The keyboard is locked as well, and I have to hard power off the machine (or use iDRAC to reboot).
 
