PVE no longer boots

triks

I am hoping someone can help me diagnose an issue where my Proxmox server will no longer boot. It goes straight to the BIOS. The only device selected as a boot device is my NVMe drive.

I have used the Proxmox installation USB to get to the debug CLI, and I can see my NVMe drive, so I suspect it’s not a hardware issue.

I am unsure if the boot loader is active or working correctly. I’m also unable to determine if I’m using ZFS or GRUB to boot.

Any help would be much appreciated as I have spent weeks creating the virtual machines and configurations.
 

Attachment: IMG_6764.jpeg

I cannot reconcile your description with your lsblk output: it appears there is no EFI partition on the NVMe drive, so how can that be the boot drive?

On the bootloader question: PVE boots with GRUB, except in one EFI scenario without Secure Boot, where it uses systemd-boot. ZFS is just a filesystem, not a bootloader, so "ZFS or GRUB" is not an either/or choice.

EDIT: Apparently PVE uses systemd-boot only when the root filesystem is ZFS on an EFI system without Secure Boot:
https://pve.proxmox.com/wiki/Host_Bootloader
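
To make the rule above easier to apply, here is a minimal Python sketch of it. It only encodes what is stated in this post and the linked wiki page; the function name and its parameters are hypothetical, and older PVE releases may behave differently.

```python
def expected_bootloader(uefi: bool, root_on_zfs: bool, secure_boot: bool) -> str:
    """Return the bootloader a stock PVE install would be expected to use.

    Assumption: systemd-boot is used only for UEFI + ZFS root without
    Secure Boot; every other combination uses GRUB.
    """
    if uefi and root_on_zfs and not secure_boot:
        return "systemd-boot"
    return "grub"


if __name__ == "__main__":
    # Walk through a few combinations to see which bootloader to look for.
    cases = [
        (True, True, False),   # UEFI, ZFS root, no Secure Boot
        (True, True, True),    # UEFI, ZFS root, Secure Boot enabled
        (True, False, False),  # UEFI, non-ZFS root (e.g. LVM/ext4)
        (False, True, False),  # legacy BIOS, ZFS root
    ]
    for uefi, zfs, sb in cases:
        print(f"uefi={uefi} zfs={zfs} secure_boot={sb} -> "
              f"{expected_bootloader(uefi, zfs, sb)}")
```

So on a non-ZFS install like the one discussed here, GRUB is the one to look for.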

As for saving that work, the fastest route is always to recover from backups.
 
Thank you for your response. I agree that not having a backup is a rookie mistake.

Nevertheless, it doesn't explain how/why the boot partition has disappeared.

Is there a method of restoring/recreating the boot partition in order to save the weeks of work invested in this setup?
 
I understand what you are after, but I am not sure what you have there. If you look at your output, there are 8G, 2G and ~500G partitions on your NVMe.

None of this corresponds to any possible install from the PVE ISO. The installer always creates, e.g., a 1M BIOS boot partition (for compatibility), a 512M ESP (for EFI) that will show as VFAT, and then the rest.
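
The comparison above can be sketched in Python against the JSON that `lsblk --json -o NAME,SIZE,FSTYPE /dev/nvme0n1` prints. This is a rough sketch only: the sample data is made up to resemble the layout discussed in this thread (it is not the poster's actual output), the device name and the helper `check_layout` are hypothetical, and the check is deliberately loose (it looks for a VFAT partition and for RAID member signatures rather than exact sizes).

```python
import json

# Hypothetical lsblk output, shaped like `lsblk --json -o NAME,SIZE,FSTYPE`.
SAMPLE = """
{
  "blockdevices": [
    {"name": "nvme0n1", "size": "476.9G", "fstype": null,
     "children": [
       {"name": "nvme0n1p1", "size": "8G", "fstype": "linux_raid_member"},
       {"name": "nvme0n1p2", "size": "2G", "fstype": "linux_raid_member"},
       {"name": "nvme0n1p3", "size": "466.9G", "fstype": "linux_raid_member"}
     ]}
  ]
}
"""


def check_layout(lsblk: dict, disk: str) -> list[str]:
    """Return human-readable findings about how `disk` differs from a
    typical installer layout (ESP shown as vfat, no RAID members)."""
    for dev in lsblk.get("blockdevices", []):
        if dev.get("name") == disk:
            parts = dev.get("children", [])
            break
    else:
        return [f"{disk}: device not found in lsblk output"]

    findings = []
    # An ESP created by the installer would show up with fstype vfat.
    if not any(p.get("fstype") == "vfat" for p in parts):
        findings.append("no vfat partition found (ESP missing?)")
    # RAID member signatures hint that mdadm was used on this disk.
    raid = [p["name"] for p in parts if p.get("fstype") == "linux_raid_member"]
    if raid:
        findings.append("linux_raid_member on: " + ", ".join(raid))
    return findings


if __name__ == "__main__":
    for finding in check_layout(json.loads(SAMPLE), "nvme0n1"):
        print(finding)
```

On a real system the JSON would come from running lsblk in the rescue shell instead of the embedded sample.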

I have no idea whether there are (physically) more NVMe drives in the system and you are looking at the wrong one, or what else might have gone wrong. You have not described anything you did before the issue appeared, but the RAID member partition types alone would point to someone having played around with, e.g., mdadm.

So you would need to provide more information on what was done to the system.
 
There are only 2 storage devices connected: an external USB disk (sda) with the PVE installer, and one 476GB NVMe drive, which is sufficient for the PVE install and LXC containers. I had deleted lvm-thin as I didn't need it.
 

Attachment: Proxmox Initial Setup.png

The question is - how exactly did you install your PVE, originally? The current drive layout is not from a standard ISO install.
 
Also odd is that your listing shows the partitions on the NVMe as Linux RAID members.

That is why I do not want to suggest anything before knowing more. Under normal circumstances, I would have assumed I was looking at a different drive than the one I think I am.
 
I retraced the steps I took during the install. Perhaps following this YouTube video and running the commands it suggests is what caused the issue, although everything worked fine until the reboot?

Perhaps it would be best to start from scratch as there are some serious issues by the sound of it.
 

Attachment: IMG_6766.jpeg

Is it this video?

https://youtu.be/_u8qTN3cCnQ?t=924

... this person does not understand ...

I think so; at around minute 17 he literally wiped out the (for him empty) "pool" ... did you also see the top comment thread under the video (the one below the author's own pinned comment)?

EDIT: I am quite positive you have done something more than just this to end up with RAID members, but I cannot guess what.
 
Correct, that is the video I followed. It made sense to expand the size to 100% and delete lvm-thin, as I didn’t need the snapshot feature.

The device doesn’t usually have a screen connected to it, so all I knew was that the hypervisor was suddenly not accessible (no ping etc.). It was difficult to diagnose because one of the VMs is pfSense, which is the router/firewall for the network.

When I connected a screen, it was sitting in the BIOS. At first I presumed the NVMe had died, but using the Proxmox installer debug CLI I was able to see it. No matter what I try, though, it will not boot.
 
If you did that before you started creating all the VMs, that is alright, but something very different is going on with that storage, because it does not even follow the PVE install structure.

I think you would be better off either taking the drive out and putting it into another system to examine what is on those partitions, or alternatively booting a live Debian (to be honest, I don't know what is at your disposal in the debug shell the PVE installer provides) and playing forensics. I would start with mdadm.

I don't think you will get it to boot. To be honest, I think you will be lucky if you get your VMs out of the big partition.
 
Thank you for your help. I was also curious about what went wrong, but I will be more cautious this time around. I just couldn't afford any more downtime, so I decided to rebuild it from scratch.

I wasn't able to find a clear answer, but is there any way to back up the host drive in full?
 

Attachment: Screenshot 2024-09-08 at 3.23.52 PM.png

Nothing is supported out of the box, but it works as on any Linux system. I would point out that, ideally, you do not need to back up the whole host partition: nothing of value is there that would not also be in a fresh install.

What is valuable is the data stored in /var/lib/pve-cluster/config.db, which holds what gets mounted at runtime into /etc/pve. As with any system, backing this up while it is running has no systemic support either: even snapshotting your system partition while running could give you an inconsistent config (other config.db* files in that path are important at runtime).

So essentially I would back up that file alone, while booted off a separate system.
Alternatively, you can always just dd [1] any drive, but again, nothing on the host will be of much value to you, so in your case you would be storing up to 96G of backups instead of a few MB. And for any customisation, I would rather use something like Ansible [2].
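
For the config.db route, a minimal Python sketch of the copy step might look like the following. It assumes the PVE root filesystem is mounted read-only from a separate (live) system; the mount point /mnt/pve-root, the destination path, and the helper name `backup_pve_config` are all hypothetical. It copies config.db together with any sibling config.db* files so they stay in sync.

```python
import shutil
import tempfile
from pathlib import Path


def backup_pve_config(mounted_root: Path, dest: Path) -> list[Path]:
    """Copy config.db and its sibling config.db* files from an offline,
    mounted PVE root into `dest`. Returns the list of copied files."""
    src_dir = mounted_root / "var" / "lib" / "pve-cluster"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.glob("config.db*")):
        target = dest / src.name
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Demo on a throwaway directory tree instead of a real disk.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "pve-root"  # stands in for e.g. /mnt/pve-root
        cluster_dir = root / "var" / "lib" / "pve-cluster"
        cluster_dir.mkdir(parents=True)
        (cluster_dir / "config.db").write_bytes(b"demo")
        for path in backup_pve_config(root, Path(tmp) / "backup"):
            print("copied", path.name)
```

On a real system you would point `mounted_root` at wherever the PVE root partition is mounted and `dest` at storage outside the host.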

Note the links are quick searches online, they are not vetted by me, but will give you an idea what each tool is for, better than a plain man page.

[1] https://www.linux.com/topic/desktop/full-metal-backup-using-dd-command/
[2] https://spacelift.io/blog/ansible-tutorial
 