Debugging boot issues (with the installation media as well)

anderstn

Hi

I just swapped my motherboard and CPU (from a second-gen Ryzen on a B350 board with an up-to-date BIOS to a third-gen Ryzen on a B550 board). I was expecting some boot issues, but I'm struggling to see exactly what fails during boot. I tried following this guide: https://forum.proxmox.com/threads/boot-troubleshooting-verbose-output.155631/ to get more verbose output during boot, but removing "quiet" does not seem to make a difference.

As for where I'm getting stuck: I manage to reach the "loading initial ramdisk" stage, and there the process just hangs, seemingly forever. Other oddities: the ramdisk image won't boot because of some signature issue, and choosing the UEFI boot options menu entry causes the computer to restart. I have never used that entry before, so I have no clue whether that is normal. Keep in mind that I carried over the memory from the previous motherboard, so I doubt the memory sticks themselves are the problem.

For now I really just want to know if there is something I can do to get more information about what is happening when, or just before, the boot process fails.
 
One other piece of hardware was changed: the GPU went from an old AMD Radeon card to a cheap Nvidia card (GeForce 730), as the former was dead. However, I don't see how that can be related to the ramdisk, given that video output is working fine.
 
please post the whole output of the failing boot process..

if you changed hardware, it is possible that your initrd is missing some modules or firmware files (e.g., for your GPU). in that case, you might need to boot from a live CD, chroot into your PVE installation, install missing packages and regenerate the initrd.
 
As my initial post stated, there is not much to show. I'm currently at work, but if you follow this link https://drive.google.com/file/d/1o00nHnuIBs4OFly-91vuHIS1KlcMgtt1/view?usp=sharing you can see a video I took yesterday. Note that in this video I had not yet removed "quiet" as per the guide I mentioned earlier, but when I did, it made no difference. The video cuts out pretty quickly, but nothing happens beyond the command to load the initial ramdisk. The process does seem to get to a certain point: for a while (maybe 30 seconds or so) I can use Ctrl + Alt + Delete to restart the computer, and after that it hangs completely, forcing a hard restart.

Off camera I added another echo command just to check that my changes to the boot parameters actually took effect, and that worked. So my best guess is that whatever output is produced by removing "quiet" is sent somewhere other than the screen. That said, I guess the initrd stage is where I really need more output, and I have found no guides explaining how to get that.

I did consider just reinstalling Proxmox, but I would like to recover some files from one of the VMs first. If it's easier to attach the old OS drive to a new VM after a fresh install than to go through the steps of installing missing packages and regenerating the initrd, then I can do that instead. Still, it is very annoying that getting more verbose output from the boot process is this difficult :(

PS: I ran memtest through the night just in case, and as expected no errors were found.
 
did you take that video in slow motion? because loading the kernel definitely shouldn't take that long ;)

if you ping your machine, does it come up? if not, it is possibly really hanging at the initrd stage for some reason.. could still be GPU related (missing driver modules or firmware - the early boot uses a different, very limited way of accessing the GPU compared to the rest of the boot/system).

I would try booting a live CD, installing the required nvidia drivers for your GPU and regenerating the initrd. the process would be similar to what is described here:

https://pve.proxmox.com/wiki/Recover_From_Grub_Failure

just replacing the "update-grub" and "grub-install" steps with "apt install ..." and "update-initramfs -u -k all" or "proxmox-boot-tool refresh", depending on how your boot setup is managed. you can also take this opportunity to see if the failed boot attempts managed to get as far as logging something to the journal/system logs.
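
For reference, the steps above could look roughly like the sketch below on an LVM-based install. This is not a tested recipe: it assumes the default `pve` volume group (so the root volume is `/dev/pve/root`) and the `/media/RESCUE` mount point from the wiki example, and the helper names `run`, `prepare_chroot` and `regenerate_initrd` are made up for illustration. A ZFS install would need the pool imported instead of the LVM steps. By default it only prints the commands; set `DRY_RUN=0` as root in the live system to actually execute them.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# prepare_chroot ROOT_DEV [TARGET]
# Activates LVM, mounts the installed root filesystem and the
# pseudo-filesystems a chroot needs. ROOT_DEV is an assumption about
# your layout, e.g. /dev/pve/root on a default LVM install.
prepare_chroot() {
    root_dev="$1"
    target="${2:-/media/RESCUE}"
    run vgscan
    run vgchange -ay
    run mkdir -p "$target"
    run mount "$root_dev" "$target"
    run mount -t proc proc "$target/proc"
    run mount -t sysfs sys "$target/sys"
    run mount -o bind /dev "$target/dev"
    run mount -o bind /run "$target/run"
}

# regenerate_initrd [TARGET] [yes|no]
# Rebuilds the initrd for all installed kernels inside the chroot.
# Pass "yes" as the second argument only if the boot partitions are
# managed by proxmox-boot-tool. Any missing packages ("apt install ...")
# would have to be installed inside the chroot before this step.
regenerate_initrd() {
    target="${1:-/media/RESCUE}"
    sync_esp="${2:-no}"
    run chroot "$target" update-initramfs -u -k all
    if [ "$sync_esp" = yes ]; then
        run chroot "$target" proxmox-boot-tool refresh
    fi
}
```

Calling `prepare_chroot /dev/pve/root` followed by `regenerate_initrd` prints the full command sequence, so you can compare it against your actual device names before running anything for real.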
 
Video was in real time so it may very well be some shenanigans at the kernel loading stage as well. That said I have always felt that this machine has been slow to boot. Might just be an AM4 platform quirk or the fact that there are many drives and PCIe cards in play (SAS controller + network card).

I will try following the guide you listed with the changes. Might have to wait until the weekend though.
 
you might want to add "rootdelay=120" to the kernel command line - if the system is slow to enumerate/discover/bring up hardware, it might be stuck not finding the root disk, but also be unable to tell you for lack of GPU support at the initrd stage..
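
As an illustration of that change, here is a small sketch that rewrites a kernel command line string: it drops "quiet" and appends "rootdelay=120" unless a rootdelay is already set. The function name `add_boot_debug` is made up for this example, and it only transforms text; it does not touch any boot configuration.

```shell
#!/bin/sh
# add_boot_debug "CMDLINE"
# Prints CMDLINE with the word "quiet" removed and "rootdelay=120"
# appended when no rootdelay=... option is present yet.
# The argument is split on whitespace on purpose (unquoted $1).
add_boot_debug() {
    out=""
    has_delay=0
    for word in $1; do
        case "$word" in
            quiet) continue ;;
            rootdelay=*) has_delay=1 ;;
        esac
        out="${out:+$out }$word"
    done
    [ "$has_delay" = 1 ] || out="${out:+$out }rootdelay=120"
    printf '%s\n' "$out"
}
```

For a one-off test the same edit can be made by hand: press "e" on the GRUB menu entry and change the line starting with "linux". To make it permanent, the usual places are GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (followed by update-grub) or /etc/kernel/cmdline (followed by proxmox-boot-tool refresh), depending on how the boot setup is managed.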
 
So the live CD and chroot approach didn't work. I think the issue might be that something is not mounted as it should be, but the wiki page only mentions that the commands may differ depending on your system. It offers no insight into how to figure out which devices to mount, or what the devices in the listed commands would actually refer to if my layout matched them one to one.

The command equivalent to "sudo mount /dev/sda1 /media/RESCUE/boot" fails. As far as I can tell, my memory stick has become device sdi. On that device, sdi1 is a 1007K partition that my Debian Live CD can't make any sense of. The sdi2 partition is 512M and can be mounted; it looks like a proper boot partition. "update-initramfs -u -k all" produces no output at all. The "proxmox-boot-tool refresh" command complains that it can't find any UUIDs.
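
One way to work out which partition is which is to look at the filesystem type that `lsblk` reports for each one. The sketch below is only a rough guide: the helper names `partition_hint` and `list_hints` are made up, and the hints assume a layout like the usual Proxmox installer one (a tiny BIOS boot partition, an EFI system partition, then LVM or ZFS), which may not match this disk.

```shell
#!/bin/sh
# partition_hint FSTYPE
# Maps an lsblk/blkid filesystem type to a hint about what to do with it.
partition_hint() {
    case "$1" in
        vfat)        echo "EFI system partition candidate: mount under the root at /boot/efi" ;;
        LVM2_member) echo "LVM physical volume: run vgchange -ay, then mount the root LV" ;;
        zfs_member)  echo "ZFS pool member: import the pool instead of using mount" ;;
        ext4|xfs)    echo "regular filesystem: can be mounted directly" ;;
        "")          echo "no filesystem detected (e.g. BIOS boot partition): nothing to mount" ;;
        *)           echo "other type: $1" ;;
    esac
}

# list_hints
# Prints name, size and a hint for every block device lsblk knows about.
list_hints() {
    lsblk -rno NAME,SIZE,FSTYPE | while read -r name size fstype; do
        printf '%-10s %-8s %s\n' "$name" "$size" "$(partition_hint "$fstype")"
    done
}
```

Running `list_hints` in the live system should make it clear whether the root filesystem sits on LVM or ZFS. A 1007K partition with no recognizable filesystem would be consistent with a BIOS boot partition, which is not meant to be mounted.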

So I figured fuck it, I'll reinstall Proxmox. Except that regardless of how I boot or how many onboard devices I disable (to prevent any troublesome drivers from loading), the installer crashes here:
[Attached image: 20241123_225137 (Large).jpg]

I tried booting the terminal option for proxmox. It made no difference.

PS: A Debian Live 12.8 image boots just fine. I would have assumed that Debian and Proxmox are closely related. Is there any way to see which drivers the Debian Live system loads after boot? Maybe the differences can be a hint as to what is causing the issue.
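
For the driver question, `lsmod` lists the loaded kernel modules and `lspci -k` shows which driver is bound to each PCI device. A minimal sketch for comparing two module lists follows; the helper names `module_names` and `only_in_first` and the file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# module_names
# Reads `lsmod` output on stdin and prints the module names, sorted,
# skipping the header line.
module_names() {
    awk 'NR > 1 { print $1 }' | sort -u
}

# only_in_first FILE_A FILE_B
# Prints lines that appear in FILE_A but not in FILE_B.
# Both files must already be sorted (module_names output is).
only_in_first() {
    comm -23 "$1" "$2"
}
```

On the Debian Live system, `lsmod | module_names > live-modules.txt` captures what is loaded there. If a second list can be obtained from a system or initrd that fails (for example by filtering the output of `lsinitramfs` for module names), `only_in_first live-modules.txt other-modules.txt` shows the modules that only the working system has.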
 
