[SOLVED] Proxmox Host hangs at load screen

Technoob9

So I was using my server three weeks ago (to update the network settings because I moved) and everything worked great. During this time, I restarted it multiple times and shut it down and cold booted it multiple times.

Turned it on today, and it hangs at the Proxmox loading page:
1720569129933.png
I do have a GPU passed through to a VM, and this VM starts automatically at boot, so I assume this is the problem. Here is what I tried:

1. In GRUB, I removed quiet and amd_iommu=on from the kernel command line (roughly as sketched below the list). It loads further but hangs here:
1720569610758.png

2. I tried disabling SVM and IOMMU in the BIOS, and it hangs at the initial screenshot above.
3. I tried removing the GPU entirely, and I still cannot access the server via the web GUI, so I assume it still does not work.
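
For reference, the edit in step 1 was made at the GRUB menu: press e on the Proxmox entry, edit the line starting with linux, then boot with Ctrl-X. The kernel version and root device in this sketch are just placeholders, not my exact values:

Code:
# Before (placeholder values - the version and root= will differ per system):
#   linux /boot/vmlinuz-6.x.y-z-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on
# After removing "quiet" and "amd_iommu=on":
linux /boot/vmlinuz-6.x.y-z-pve root=/dev/mapper/pve-root ro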

Ethernet light is green and seems to be connected OK.

Any advice would be appreciated.

Thanks.
 
I also tried swapping the network cable, as I read about a case similar to mine where it turned out to be a network cord issue. Unfortunately, it didn't resolve anything.
 
3. I tried removing the GPU entirely, and I still cannot access the server via the web GUI, so I assume it still does not work.

Ethernet light is green and seems to be connected OK.

What is on the screen after it (presumably) boots? Or was that the only GPU? Did you try to boot a Live system and check the logs from the last boot (since you cannot access the GUI)?
 
What is on the screen after it (presumably) boots? Or was that the only GPU? Did you try to boot a Live system and check the logs from the last boot (since you cannot access the GUI)?
It goes past the POST screen (ASRock logo), then the Proxmox logo, to the blue screen where I can select Proxmox, GRUB options, etc. Then when it tries to boot into Proxmox, I get the above screenshots (when I try what I described above) and nothing happens.

If I change nothing and just cold boot my server normally, I get the very first screenshot in my initial post.
 
It goes past the POST screen (ASRock logo), then the Proxmox logo, to the blue screen where I can select Proxmox, GRUB options, etc. Then when it tries to boot into Proxmox, I get the above screenshots (when I try what I described above) and nothing happens.

If I change nothing and just cold boot my server normally, I get the very first screenshot in my initial post.

Can you select Advanced options in the "blue" menu, then recovery mode, then run journalctl -b -1 > lastboot.log and attach it here?
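
For illustration, the idea from the recovery shell would be something like this (where you save lastboot.log is up to you):

Code:
# In the recovery shell, dump the journal of the previous (failed) boot
journalctl -b -1 > lastboot.log
# then copy lastboot.log out via a USB stick or scp so you can attach it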
 
Unfortunately it does not give away, at least to me, what is happening there. Can you boot a Debian Live system and check the logs that way? You may want to run fsck while you are at it.
I am not sure what that means, to be honest. Like create a Linux boot drive and then access that OS's terminal through its GUI?
 
I am not sure what that means, to be honest. Like create a Linux boot drive and then access that OS's terminal through its GUI?

You would ideally boot the machine from a Debian live image [1], just like when you are about to install a new system; instead of installing, you then have all the Debian tools available to examine what has been going on with your PVE boots, by checking the state of the drives and what logs were stored.

Once booted, you can check available drives with e.g. fdisk -l [2], lvs [3] and fsck [4]. If you are able to get your PVE host root filesystem mounted you might then chroot [5] into the host system and perform necessary fixes, but first check the logs from the last boot there.

[1] https://www.debian.org/CD/live/
[2] https://manpages.debian.org/bookworm/fdisk/fdisk.8.en.html
[3] https://manpages.debian.org/bookworm/lvm2/lvs.8.en.html
[4] https://manpages.debian.org/bookworm/util-linux/fsck.8.en.html
[5] https://wiki.debian.org/RescueLive
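
A rough sketch of the sequence once the live system is up; the volume name /dev/mapper/pve-root below is the usual default for an LVM-based PVE install, but check what fdisk -l and lvs actually report on your machine:

Code:
# Identify drives and LVM volumes
fdisk -l
lvs
vgchange -ay                      # activate LVM volume groups if needed
# Check the PVE root filesystem BEFORE mounting it
fsck /dev/mapper/pve-root
# Mount it; for a full chroot you would also bind-mount /dev, /proc and /sys
mount /dev/mapper/pve-root /mnt
chroot /mnt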
 
You would ideally boot the machine from a Debian live image [1], just like when you are about to install a new system; instead of installing, you then have all the Debian tools available to examine what has been going on with your PVE boots, by checking the state of the drives and what logs were stored.

Once booted, you can check available drives with e.g. fdisk -l [2], lvs [3] and fsck [4]. If you are able to get your PVE host root filesystem mounted you might then chroot [5] into the host system and perform necessary fixes, but first check the logs from the last boot there.

[1] https://www.debian.org/CD/live/
[2] https://manpages.debian.org/bookworm/fdisk/fdisk.8.en.html
[3] https://manpages.debian.org/bookworm/lvm2/lvs.8.en.html
[4] https://manpages.debian.org/bookworm/util-linux/fsck.8.en.html
[5] https://wiki.debian.org/RescueLive
Here are the outputs. Nothing came up for lastboot.log though (I probably have the wrong command).
https://imgur.com/a/3UGFRcE
 
Here are the outputs. Nothing came up for lastboot.log though.
https://imgur.com/a/3UGFRcE

Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
 
Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
Hey,

thanks for the detailed reply. I will look into how to get it into a text file.

For the fsck, it says that it's mounted, but then says:
e2fsck: Cannot continue, aborting.

I will work on getting the log.
 
Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
It says journalctl-D command is not found. I guess I will have to install it? Pretty sure systemd is already installed though.
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
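
Putting that together, the whole sequence from the live system would look roughly like this (again assuming the volume is /dev/mapper/pve-root and you mount it at /mnt):

Code:
umount /mnt                                # or: umount /dev/mapper/pve-root
fsck -f /dev/mapper/pve-root               # check the filesystem while unmounted
mount /dev/mapper/pve-root /mnt            # remount it
journalctl -b -1 -D /mnt/var/log/journal   # read the host's previous boot log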
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
No, I really appreciate the help. This is becoming a frustrating hobby haha.

Here is the journal log. https://pastebin.com/ZWehSw0X

I am looking through it, but I am not sure what I am looking for to be honest.
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
Here is the fsck:
1720978095809.png
 
No, I really appreciate the help. This is becoming a frustrating hobby haha.

Here is the journal log. https://pastebin.com/ZWehSw0X

I am looking through it, but I am not sure what I am looking for to be honest.

So I might have overlooked something but I just see ...

Code:
Jul 14 11:49:15 pve systemd[1]: Reached target multi-user.target - Multi-User System.
Jul 14 11:49:15 pve systemd[1]: Reached target graphical.target - Graphical Interface.
Jul 14 11:49:16 pve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Jul 14 11:49:16 pve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Jul 14 11:49:16 pve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Jul 14 11:49:16 pve systemd[1]: Startup finished in 22.886s (firmware) + 6.926s (loader) + 4.622s (kernel) + 10.353s (userspace) = 44.789s.
Jul 14 11:49:23 pve systemd[1]: Starting e2scrub_all.service - Online ext4 Metadata Check for All Filesystems...
Jul 14 11:49:23 pve systemd[1]: e2scrub_all.service: Deactivated successfully.
Jul 14 11:49:23 pve systemd[1]: Finished e2scrub_all.service - Online ext4 Metadata Check for All Filesystems.
Jul 14 11:49:36 pve systemd[1]: systemd-fsckd.service: Deactivated successfully.
Jul 14 11:49:50 pve kernel: logitech-hidpp-device 0003:046D:4016.0005: HID++ 2.0 device connected.
Jul 14 11:49:51 pve systemd[1]: Received SIGINT.
Jul 14 11:49:51 pve systemd[1]: Activating special unit reboot.target...

I wonder why systemd got SIGINT right there, just after the filesystem checks. Silly question: how long did you wait before going on the Ctrl-Alt-Del rampage ...

Code:
Jul 14 11:49:54 pve systemd[1]: Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
 
So I might have overlooked something but I just see ...

Code:
Jul 14 11:49:15 pve systemd[1]: Reached target multi-user.target - Multi-User System.
Jul 14 11:49:15 pve systemd[1]: Reached target graphical.target - Graphical Interface.
Jul 14 11:49:16 pve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Jul 14 11:49:16 pve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Jul 14 11:49:16 pve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Jul 14 11:49:16 pve systemd[1]: Startup finished in 22.886s (firmware) + 6.926s (loader) + 4.622s (kernel) + 10.353s (userspace) = 44.789s.
Jul 14 11:49:23 pve systemd[1]: Starting e2scrub_all.service - Online ext4 Metadata Check for All Filesystems...
Jul 14 11:49:23 pve systemd[1]: e2scrub_all.service: Deactivated successfully.
Jul 14 11:49:23 pve systemd[1]: Finished e2scrub_all.service - Online ext4 Metadata Check for All Filesystems.
Jul 14 11:49:36 pve systemd[1]: systemd-fsckd.service: Deactivated successfully.
Jul 14 11:49:50 pve kernel: logitech-hidpp-device 0003:046D:4016.0005: HID++ 2.0 device connected.
Jul 14 11:49:51 pve systemd[1]: Received SIGINT.
Jul 14 11:49:51 pve systemd[1]: Activating special unit reboot.target...

I wonder why systemd got SIGINT right there, just after the filesystem checks. Silly question: how long did you wait before going on the Ctrl-Alt-Del rampage ...

Code:
Jul 14 11:49:54 pve systemd[1]: Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
LOL. I am not sure what SIGINT is.

And this was a more recent reboot. When I first started having these issues, I left it alone for like 10 minutes and it still hung at the initial screenshot I posted in my first post.

Would it potentially help to find the .conf files for the VMs and set them not to auto-boot?
 
LOL. I am not sure what SIGINT is.

Something is asking it to terminate...
https://www.fosslinux.com/121761/the-abcs-of-linux-signals-sigint-sigterm-and-sigkill-explained.htm
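
As a side note, and please double-check me against man systemd: when PID 1 is systemd, SIGINT is handled like Ctrl-Alt-Del, i.e. it starts ctrl-alt-del.target (normally an alias for reboot.target), which matches the reboot.target activation right after the SIGINT in your log. Pressing Ctrl-Alt-Del on the console makes the kernel deliver exactly that signal to PID 1:

Code:
# Equivalent of pressing Ctrl-Alt-Del once: systemd (PID 1) starts ctrl-alt-del.target
# Do NOT run this on a machine you are not prepared to reboot
kill -INT 1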

And this was a more recent reboot. When I first started having these issues, I left it alone for like 10 minutes and it still hung at the initial screenshot I posted in my first post.

Alright then, how about posting output covering a longer journal range: instead of -b -1 (which means the last boot), just use something like --since="2024-07-10" or simply --since "5 days ago".

Would it potentially help to find the .conf files for the VMs and set them not to auto-boot?

I do not like to guess unless I have to; I would reckon the first failing boot's log, i.e. the first time it did not proceed to boot properly, should say something about why. Then start from there.
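
Concretely, still pointing journalctl at the mounted host journal from the live system, that would be something like (adjust the date to cover when the problem started):

Code:
journalctl -D /mnt/var/log/journal --since="2024-07-10"
# or, relative:
journalctl -D /mnt/var/log/journal --since "5 days ago"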
 
