[SOLVED] [Critical issue] Proxmox won't boot and UEFI boot sequence takes minutes

Pheggas

New Member
May 8, 2022
Hello everyone. I'd like to ask for help with my Proxmox setup. I will first describe my story and then the errors.

The story: Today I was about to install Hassio, so I found a nice script that would do it for me (my friend installed it the same way and it has been fine so far). It immediately reported that I had an unsupported version of Proxmox (7.1, and it needed 7.2 or higher). So I googled how to update Proxmox (at first I wanted to upgrade to Proxmox 8, but then I realised the latest version of Proxmox 7 would be fine as well). TO BE SAFE, I started backups of both VMs and the one LXC container I had, and I was about to download them locally so that in any case of disaster I would recover everything in a matter of hours - but I WOULD recover. The LXC container backed up perfectly after a few minutes. Then the first VM (TrueNAS) was fine as well, in a matter of seconds.

And then came the main VM, which has 200 GB allocated and runs every service I was using. I chose to save the backup to a 250 GB SSD that already had about 100 GB taken. The reason I chose the SSD was that I noticed the LXC container's backup got compressed to 1 GB instead of 5 GB, so I assumed the compression would work in this case as well and the backup would fit in the space. Well, it got stuck at 47% for like 20 minutes, so I clicked Shutdown on the PVE node in the WebUI (I wanted to reboot, but whatever, I misclicked). So I walked to the server and started it manually. After 5 minutes the WebUI didn't come up, so I walked back to the server and force shut it down by holding the power button. Turned it on again and still nothing after a couple of minutes. So I force shut it down again and connected it to a display to see what's going on.

The HP logo showed up (which was perfectly fine and means it's about to boot what's on the internal SSD). It took about 2 minutes, which stressed me out (because it should start the boot in a few seconds, not minutes). I was waiting for any kind of progress, which did come, but it was REALLY slow. It took about a minute and a half to write a few log messages from the boot sequence shown on this link. Then it displayed this.

Do you know what went wrong and maybe how I could fix it? I'm able to log in to this emergency mode, but the SSD won't mount somehow. Could it maybe be caused by the SSD being completely out of space? I read somewhere that an SSD needs some free space to operate properly, and it once happened to me that Windows wouldn't boot because the SSD was completely full.

Thank you for any comment.

PS: I wanted to attach the images in case the links don't open properly, but it says they're too large to upload. I downscaled them to a few kB and they still won't upload. Just write me a PM and I will find another way to post them.
 
TO BE SAFE, I started backups of both VMs and the one LXC container I had, and I was about to download them locally

Locally to your laptop/workstation/anything other than the PVE node?


And then came the main VM, which has 200 GB allocated ... it got stuck at 47% for like 20 minutes, so I clicked Shutdown on the PVE node in the WebUI ... started it manually ... the WebUI didn't come up, so I walked back to the server and force shut it down ... on again and still nothing ... force shut it down again and connected it to a display to see what's going on.

Lesson learned: do not cold-shutdown a machine that does not boot up. Try to SSH in, then connect a display, try whatever you can before a power loss. :)


The HP logo showed up ... took about 2 minutes ... waiting for any kind of progress ... took about a minute and a half to write a few log messages from the boot sequence shown on this link. Then it displayed this.

After login, can you show us the output of:

lsblk -o +PARTLABEL,LABEL,FSTYPE,PARTTYPE
lvscan --readonly


Do you know what went wrong and maybe how I could fix it? I'm able to log in to this emergency mode, but the SSD won't mount somehow. Could it maybe be caused by the SSD being completely out of space?

The reason it does not mount is not that it is out of space: there are reserved blocks for the root user, so in that sense it would never get completely full and there would still be enough space for emergency troubleshooting. The reason it does not automount is more likely the cold power cycle.
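
If you want to double-check those reserved blocks later, once you have a shell, something like this shows them on an ext4 filesystem (the device /dev/sdb1 is only a placeholder, use whatever lsblk reports for your SSD):

tune2fs -l /dev/sdb1 | grep -i "reserved block count"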
 
Locally to your laptop/workstation/anything other than the PVE node?
I'm not sure if I understand your question, but I have only one PVE node on my network and that is the server I want to fix in this thread.

try to SSH in, then connect a display
I tried to ping it, but as it was still in the boot sequence, it was not responding. But yeah, I should have connected a display rather than force-shutting down the machine. Lesson learned, as you said.

After login, can you show us
I will as soon as I get home today - I will post a new comment so you get notified.
there are reserved blocks for the root user, so in that sense it would never get completely full
Ah, alright. I was just scared it was the same case.

Anyway, I think the main VM that I care about was actually saved on a different drive (the boot drive). So technically, if I comment out the line where the SSD mounts, it should boot into the system and the VM should be visible and potentially fine and bootable.
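
Roughly what I have in mind, assuming the SSD mount really is just a line in /etc/fstab (the exact device path there is whatever my install put in):

mount -o remount,rw /    # the emergency shell may have / mounted read-only
nano /etc/fstab          # put a # in front of the SSD's mount line
reboot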

In case that works, is there a way to back up this VM to remote storage instead of local storage?
 
I'm not sure if I understand your question, but I have only one PVE node on my network and that is the server I want to fix in this thread.
I just wondered what your ultimate objective was when you said you planned to back it up "locally" ... to another machine (even a laptop) at the location (home), or locally on that very PVE node. Now I got it from your reaction below.

I tried to ping it, but as it was still in the boot sequence, it was not responding. But yeah, I should have connected a display rather than force-shutting down the machine. Lesson learned, as you said.

If you have default BIOS/UEFI settings, a single press of the power button also does a soft shutdown; if it does not, something is amiss and I would want to investigate before pulling the plug (which is what a long press amounts to). But you might have ended up in this same situation after a power loss anyway, which is why one best plans for disaster recovery - disasters happen.

I will as soon as I get home today - I will post a new comment so you get notified.

No worries. To be honest, I was a bit confused about which drives are in the machine, what the automount was for, where the backups were supposed to go, and what you want to save (because PVE itself can be reinstalled). By asking for those outputs I do not have to ask for explanations; they will just show it all.

Ah, alright. I was just scared it was the same case.

It's not like SSDs get truly filled up - sure, they slow down, but the SSD firmware already overprovisions around 10%, and then there are the reserved blocks for root. Still, filling up the (wrong) volume might have caused your PVE not to boot; it would not cause any filesystem corruption on its own. The cold shutdowns, we do not know. You should focus on what you want to get out: if the PVE root volume is beyond repair, it's not a tragedy I suppose, you want the VMs... unless ...
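
Once you do get a shell on it, checking how full things actually are is quick and does not write anything. A rough sketch (volume group and volume names depend on your install):

df -h
vgs
lvs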

Anyway, I think the main VM that I care about was actually saved on a different drive (the boot drive).

... we do not know. I would have just preferred to see the state of things from the rescue shell before letting it fully boot. LVM, when it touches the volumes, is already accessing metadata and doing writes.

So technically, if I comment out the line where the SSD mounts, it should boot into the system and the VM should be visible and potentially fine and bootable.

See above.

In case that works, is there a way to back up this VM to remote storage instead of local storage?

Sure, but where is this VM's volume in the first place? :) I am just confused about your drive layout without seeing the outputs. :D
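
If it comes to that later, the rough idea would be to add the remote target as a storage and point the backup there. A minimal sketch - the NFS server address and the VM ID below are made up, substitute your own:

pvesm add nfs backup-nfs --server 192.168.1.10 --export /backups --content backup
vzdump 100 --storage backup-nfs --mode snapshot --compress zstd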
 
Sure, but where is this VM's volume in the first place? :) I am just confused about your drive layout without seeing the outputs. :D
No wonder, I described it a bit chaotically. In fact, I have 2 drives attached to my server. The first is the boot drive, but it also acts as storage for one of my VMs (to be clear, it's the TrueNAS VM for my USB-attached HDDs) and for one LXC container, which is just basic Debian running one GitHub project. I really don't care about those - I can set them up again in a few minutes, as I have the TrueNAS config backed up, and the project on the container doesn't work anymore so I don't currently need it. It was turned off most of the time.

The VM I care about is my main VM, where I host all my Docker containers on Ubuntu Server. All the databases and backend data are there. And this VM SHOULD be stored on the boot drive rather than with the other two, which are stored on the SSD that cannot be mounted now.

I would have just preferred to see the state of things from the rescue shell before letting it fully boot.
Alright, as you say. I'm obviously less experienced, so I would rather follow your recommendations and instructions. :)
 
Alright, as you say. I'm obviously less experienced, so I would rather follow your recommendations and instructions. :)
To be honest, I just wanted to land somewhere in between, in terms of risk aversion, between the advice you got from @fabian (one step at a time, boot and see) and @Dunuin (dd everything into an image before you start any forensics).

But to be fair, both are good advice; we did not know what your setup was. You might have been troubleshooting a hardware RAID controller power-loss event, or you might have just had a test setup you would not even care about if every VM were wiped.

BTW, did I get it right that you were backing up your important VM via the TrueNAS VM onto the USB drives?
 
BTW, did I get it right that you were backing up your important VM via the TrueNAS VM onto the USB drives?
No. TrueNAS has its configuration export feature, and since I was upgrading (more like clean-installing the new version) recently, I have the newest configuration backed up. So I'm able to have TrueNAS up and running in a matter of minutes, even including creating the VM for it.

The mission-critical VM is the main one with all of my containers. I planned to set up a backup system for it, but unfortunately I didn't have time and I paid for it. If I recover that VM, the first thing will be to download everything important via FileZilla to my PC so the important stuff is backed up, and then I will maybe try the VM backup process once again.

It's not EVEN THAT MUCH work to spin everything up again from scratch, but it would definitely take a few days, and I would lose all the progress and even the site that I was hosting.
dd everything into an image before you start any forensics
I don't have any experience with dd, so I would prefer Clonezilla with a GUI.
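
For my own notes in case I end up needing it after all, I understand the dd suggestion to be roughly this (the source /dev/sdb and the destination path are just guesses, and the destination must be a different, large enough drive):

dd if=/dev/sdb of=/mnt/external/ssd-image.img bs=4M status=progress conv=noerror,sync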
 
@Dunuin, sorry for the unnecessary ping, but what is your view on this situation now that the details have been provided? Could I maybe ask for some recommendations from your side as well?

I will post all the outputs from the system as soon as I get home (in like 2 hours).
 
