[SOLVED] Proxmox Host hangs at load screen

Technoob9

So I was using my server three weeks ago (to update the network settings because I moved) and everything worked great. During this time, I restarted it multiple times and shut it down and cold booted it multiple times.

Turned it on today, and it hangs at the Proxmox loading page:
1720569129933.png
I do have a GPU passed through to a VM, and this VM starts automatically at boot, so I assume this is the problem. Here is what I tried:

1. In GRUB, I removed quiet and amd_iommu=on from the kernel command line (roughly as sketched below the list). It loads further but hangs here:
1720569610758.png

2. I tried disabling SVM and IOMMU in the BIOS, and it hangs at the initial screenshot above.
3. I tried removing the GPU entirely, and I still cannot access the server via the web GUI, so I assume it still does not work.
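
For reference, the edit in step 1 was made at the GRUB menu: press e on the Proxmox entry, edit the line starting with linux, then boot with Ctrl-X. The kernel version and root device in this sketch are just placeholders, not my exact values:

Code:
# Before (placeholder values - the version and root= will differ per system):
#   linux /boot/vmlinuz-6.x.y-z-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on
# After removing "quiet" and "amd_iommu=on":
linux /boot/vmlinuz-6.x.y-z-pve root=/dev/mapper/pve-root ro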

Ethernet light is green and seems to be connected OK.

Any advice would be appreciated.

Thanks.
 
I also tried swapping the network cable, as I read about a case similar to mine where it turned out to be a network cord issue. Unfortunately, it didn't resolve anything.
 
3. I tried removing the GPU entirely, and I still cannot access the server via the web GUI, so I assume it still does not work.

Ethernet light is green and seems to be connected OK.

What is on the screen after it (presumably) boots? Or was that the only GPU? Did you try to boot a Live system and check the logs from the last boot (since you cannot access the GUI)?
 
What is on the screen after it (presumably) boots? Or was that the only GPU? Did you try to boot a Live system and check the logs from the last boot (since you cannot access the GUI)?
It goes past the POST screen (ASRock logo), then the Proxmox logo, to the blue screen where I can select Proxmox, GRUB options, etc. Then when it tries to boot into Proxmox, I get the above screenshots (when I try what I described above) and nothing happens.

If I change nothing and just cold boot my server normally, I get the very first screenshot in my initial post.
 
It goes past the POST screen (ASRock logo), then the Proxmox logo, to the blue screen where I can select Proxmox, GRUB options, etc. Then when it tries to boot into Proxmox, I get the above screenshots (when I try what I described above) and nothing happens.

If I change nothing and just cold boot my server normally, I get the very first screenshot in my initial post.

Can you select Advanced options in the "blue" menu, then recovery mode, then run journalctl -b -1 > lastboot.log and attach it here?
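
For illustration, the idea from the recovery shell would be something like this (where you save lastboot.log is up to you):

Code:
# In the recovery shell, dump the journal of the previous (failed) boot
journalctl -b -1 > lastboot.log
# then copy lastboot.log out via a USB stick or scp so you can attach it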
 
Unfortunately it does not give away, at least to me, what is happening there. Can you boot a Debian Live system and check the logs that way? You may want to run fsck while you are at it.
I am not sure what that means, to be honest. Like create a Linux boot drive and then access that OS's terminal through its GUI?
 
I am not sure what that means, to be honest. Like create a Linux boot drive and then access that OS's terminal through its GUI?

You would ideally boot the machine from a Debian live image [1], just like when you are about to install a new system; instead of installing, you then have all the Debian tools available to examine what has been going on with your PVE boots, by checking the state of the drives and what logs were stored.

Once booted, you can check available drives with e.g. fdisk -l [2], lvs [3] and fsck [4]. If you are able to get your PVE host root filesystem mounted you might then chroot [5] into the host system and perform necessary fixes, but first check the logs from the last boot there.

[1] https://www.debian.org/CD/live/
[2] https://manpages.debian.org/bookworm/fdisk/fdisk.8.en.html
[3] https://manpages.debian.org/bookworm/lvm2/lvs.8.en.html
[4] https://manpages.debian.org/bookworm/util-linux/fsck.8.en.html
[5] https://wiki.debian.org/RescueLive
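
A rough sketch of the sequence once the live system is up; the volume name /dev/mapper/pve-root below is the usual default for an LVM-based PVE install, but check what fdisk -l and lvs actually report on your machine:

Code:
# Identify drives and LVM volumes
fdisk -l
lvs
vgchange -ay                      # activate LVM volume groups if needed
# Check the PVE root filesystem BEFORE mounting it
fsck /dev/mapper/pve-root
# Mount it; for a full chroot you would also bind-mount /dev, /proc and /sys
mount /dev/mapper/pve-root /mnt
chroot /mnt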
 
You would ideally boot the machine from a Debian live image [1], just like when you are about to install a new system; instead of installing, you then have all the Debian tools available to examine what has been going on with your PVE boots, by checking the state of the drives and what logs were stored.

Once booted, you can check available drives with e.g. fdisk -l [2], lvs [3] and fsck [4]. If you are able to get your PVE host root filesystem mounted you might then chroot [5] into the host system and perform necessary fixes, but first check the logs from the last boot there.

[1] https://www.debian.org/CD/live/
[2] https://manpages.debian.org/bookworm/fdisk/fdisk.8.en.html
[3] https://manpages.debian.org/bookworm/lvm2/lvs.8.en.html
[4] https://manpages.debian.org/bookworm/util-linux/fsck.8.en.html
[5] https://wiki.debian.org/RescueLive
Here are the outputs. Nothing came up for lastboot.log though (I probably have the wrong command).
https://imgur.com/a/3UGFRcE
 
Here are the outputs. Nothing came up for lastboot.log though.
https://imgur.com/a/3UGFRcE

Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
 
Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
Hey,

thanks for the detailed reply. I will look into how to get it into a text file.

For the fsck, it says that it's mounted, but then says:
e2fsck: Cannot continue, aborting.

I will work on getting the log.
 
Hey! I have to admit I just dropped those names for you to search for more information on, and to keep it generic (one can't know the drive structure in advance). Btw you can also save your outputs in a text file onto e.g. a USB stick, or scp them out and upload them as text here.

Anyhow, from what you posted, I can tell you have ZFS on EXOS (/dev/sdc on your screenshots) and then A400 with LVM for the PVE install (/dev/sdb - be aware these designations can show up differently on different boots).

LVM recognises the PVE volumes and you can now do the actual troubleshooting.

Before you do that, you should be able to run fsck on that /dev/mapper/pve-root - so: fsck /dev/mapper/pve-root

The second thing is - I do not want to just provide blind instructions; when others find the thread they should also understand why and what was done conceptually - by default you would be checking the journal of the LIVE OS you booted into instead of the host PVE system.

You either have to follow the chroot process (in your case you would mount your host volume with mount /dev/mapper/pve-root /mnt), which is a great learning experience (on chroot in general, and useful if you later want to fix the system), or, if you only want to check the log without chrooting, you have to query the logs on that mounted host PVE volume with e.g. journalctl-D /mnt/var/log/journal (as opposed to /var/log/journal, where it looks by default).
It says journalctl-D command is not found. I guess I will have to install it? Pretty sure systemd is already installed though.
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
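
Putting that together, the whole sequence from the live system would look roughly like this (again assuming the volume is /dev/mapper/pve-root and you mount it at /mnt):

Code:
umount /mnt                                # or: umount /dev/mapper/pve-root
fsck -f /dev/mapper/pve-root               # check the filesystem while unmounted
mount /dev/mapper/pve-root /mnt            # remount it
journalctl -b -1 -D /mnt/var/log/journal   # read the host's previous boot log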
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
No, I really appreciate the help. This is becoming a frustrating hobby haha.

Here is the journal log. https://pastebin.com/ZWehSw0X

I am looking through it, but I am not sure what I am looking for to be honest.
 
Oops, my typo ... these are just regular switches (-D, -b, ...) and you can combine them.

You can have a look at man journalctl.

You can then do: journalctl -b -1 -D /mnt/var/log/journal

This assumes you have the host PVE volume mounted on /mnt, i.e. you had previously run mount /dev/mapper/pve-root /mnt

In turn, I think this is also the reason you got your fsck error - you should do this BEFORE the pve-root volume is mounted. So you can e.g. unmount it (be sure to get out of the directory if you are inside it) and issue a command like umount /dev/mapper/pve-root (or umount /mnt if it is mounted there). You could then run fsck -f /dev/mapper/pve-root

I am attempting to make no typos, but always take posts like mine with a pinch of salt - someone might even tell you to wipe out a drive, and you should not blindly follow that without at least checking the manual pages.
Here is the fsck:
1720978095809.png
 
No, I really appreciate the help. This is becoming a frustrating hobby haha.

Here is the journal log. https://pastebin.com/ZWehSw0X

I am looking through it, but I am not sure what I am looking for to be honest.

So I might have overlooked something but I just see ...

Code:
Jul 14 11:49:15 pve systemd[1]: Reached target multi-user.target - Multi-User System.
Jul 14 11:49:15 pve systemd[1]: Reached target graphical.target - Graphical Interface.
Jul 14 11:49:16 pve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Jul 14 11:49:16 pve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Jul 14 11:49:16 pve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Jul 14 11:49:16 pve systemd[1]: Startup finished in 22.886s (firmware) + 6.926s (loader) + 4.622s (kernel) + 10.353s (userspace) = 44.789s.
Jul 14 11:49:23 pve systemd[1]: Starting e2scrub_all.service - Online ext4 Metadata Check for All Filesystems...
Jul 14 11:49:23 pve systemd[1]: e2scrub_all.service: Deactivated successfully.
Jul 14 11:49:23 pve systemd[1]: Finished e2scrub_all.service - Online ext4 Metadata Check for All Filesystems.
Jul 14 11:49:36 pve systemd[1]: systemd-fsckd.service: Deactivated successfully.
Jul 14 11:49:50 pve kernel: logitech-hidpp-device 0003:046D:4016.0005: HID++ 2.0 device connected.
Jul 14 11:49:51 pve systemd[1]: Received SIGINT.
Jul 14 11:49:51 pve systemd[1]: Activating special unit reboot.target...

I wonder why systemd got SIGINT right there, just after the filesystem checks. Silly question: how long did you wait before going on the Ctrl-Alt-Del rampage ...

Code:
Jul 14 11:49:54 pve systemd[1]: Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
 
So I might have overlooked something but I just see ...

Code:
Jul 14 11:49:15 pve systemd[1]: Reached target multi-user.target - Multi-User System.
Jul 14 11:49:15 pve systemd[1]: Reached target graphical.target - Graphical Interface.
Jul 14 11:49:16 pve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Jul 14 11:49:16 pve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Jul 14 11:49:16 pve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Jul 14 11:49:16 pve systemd[1]: Startup finished in 22.886s (firmware) + 6.926s (loader) + 4.622s (kernel) + 10.353s (userspace) = 44.789s.
Jul 14 11:49:23 pve systemd[1]: Starting e2scrub_all.service - Online ext4 Metadata Check for All Filesystems...
Jul 14 11:49:23 pve systemd[1]: e2scrub_all.service: Deactivated successfully.
Jul 14 11:49:23 pve systemd[1]: Finished e2scrub_all.service - Online ext4 Metadata Check for All Filesystems.
Jul 14 11:49:36 pve systemd[1]: systemd-fsckd.service: Deactivated successfully.
Jul 14 11:49:50 pve kernel: logitech-hidpp-device 0003:046D:4016.0005: HID++ 2.0 device connected.
Jul 14 11:49:51 pve systemd[1]: Received SIGINT.
Jul 14 11:49:51 pve systemd[1]: Activating special unit reboot.target...

I wonder why systemd got SIGINT right there, just after the filesystem checks. Silly question: how long did you wait before going on the Ctrl-Alt-Del rampage ...

Code:
Jul 14 11:49:54 pve systemd[1]: Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
LOL. I am not sure what SIGINT is.

And this was a more recent reboot. When I first started having these issues, I left it alone for like 10 minutes and it still hung at the initial screenshot I posted in my first post.

Would it potentially help to find the .conf files for the VMs and set them not to auto-boot?
 
LOL. I am not sure what SIGINT is.

Something is asking it to terminate...
https://www.fosslinux.com/121761/the-abcs-of-linux-signals-sigint-sigterm-and-sigkill-explained.htm
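
As a side note, and please double-check me against man systemd: when PID 1 is systemd, SIGINT is handled like Ctrl-Alt-Del, i.e. it starts ctrl-alt-del.target (normally an alias for reboot.target), which matches the reboot.target activation right after the SIGINT in your log. Pressing Ctrl-Alt-Del on the console makes the kernel deliver exactly that signal to PID 1:

Code:
# Equivalent of pressing Ctrl-Alt-Del once: systemd (PID 1) starts ctrl-alt-del.target
# Do NOT run this on a machine you are not prepared to reboot
kill -INT 1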

And this was a more recent reboot. When I first started having these issues, I left it alone for like 10 minutes and it still hung at the initial screenshot I posted in my first post.

Alright then, how about posting output covering a longer journal range: instead of -b -1 (which means the last boot), just use something like --since="2024-07-10" or simply --since "5 days ago".

Would it potentially help to find the .conf files for the VMs and set them not to auto-boot?

I do not like to guess unless I have to; I would reckon the first failing boot's log, i.e. the first time it did not proceed to boot properly, should say something about why. Then start from there.
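
Concretely, still pointing journalctl at the mounted host journal from the live system, that would be something like (adjust the date to cover when the problem started):

Code:
journalctl -D /mnt/var/log/journal --since="2024-07-10"
# or, relative:
journalctl -D /mnt/var/log/journal --since "5 days ago"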
 
