Proxmox stuck on "Loading initial ramdisk"

reynierpm

I have an NVMe disk attached to a Dell T7910 through a PCIe interface. I installed Proxmox (the latest version, 8.1-2) on that disk since it is fast and has plenty of space. At that time the server had the following device configuration:
- CD/DVD ROM attached
- 4 x 10TB HDD

Installation went fine, so I created some VMs, one of them being TrueNAS, which holds those 4 x 10 TB HDDs in a RAIDZ1; I have already copied a lot of data to them. Yesterday I stopped all the running VMs and shut down the server so I could remove the CD/DVD unit and use that same SATA port for a new 512 GB SSD.

But surprise: when I start the server and try to boot into PVE, it hangs for hours, only showing:

Code:
Loading Linux 6.5.11-8-pve ...
Loading initial ramdisk ...

At this point, I still cannot find the reason why this is happening. I tried adding `nomodeset` to the boot line to see if I could get past it and check whether anything was happening, but no luck.

I was able to get into the GRUB command line and gather some information; can you spot anything I cannot?
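In case it is useful, the kind of commands that can be run at the GRUB prompt to gather information look roughly like this (the drive and partition numbers are only examples and may not match my layout):

Code:
# list the drives/partitions GRUB can see
grub> ls
# show filesystem type, label and UUID of one partition
grub> ls -l (hd0,gpt3)
# print GRUB's environment (root, prefix, default cmdline)
grub> set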
 

Attachment: IMG_0896-min.jpg
Have you already tried the following step?

In GRUB, remove the value "quiet" from the "linux" line and boot again:

linux /boot/vmlinuz-6.5.11-8-pve root=/dev/mapper/pve-root ro quiet
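After removing "quiet" (and optionally adding "nomodeset"), the edited line would look something like this; press Ctrl+X or F10 to boot the edited entry so any error messages stay visible:

Code:
linux /boot/vmlinuz-6.5.11-8-pve root=/dev/mapper/pve-root ro nomodeset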
 
I just ran into the same thing today. I woke up this morning to find the mini PC running Proxmox turned off. I suspect the power flickered last night, and this machine was not yet plugged into my battery backup. Upon attempting to boot, I am in the exact same situation. The only difference is that I am one version behind your vmlinuz, with vmlinuz-6.5.11-7-pve.

So far I have edited the GRUB entry at boot to add nomodeset and remove quiet, and I am simply stuck with no further output on screen. I have also tried the advanced options and the other kernel (vmlinuz-5.4.11-4) that is still installed. I have loaded optimized defaults in the BIOS and even run memtest for several hours. I was able to boot an Ubuntu live DVD on this mini PC with no issues and confirm that parted and lsblk can still read the NVMe drive. This issue is very frustrating; it is hard to troubleshoot when no further output is provided.
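For reference, the checks from the live DVD were along these lines (assuming the drive shows up as /dev/nvme0n1; adjust the device name as needed):

Code:
sudo lsblk -o NAME,SIZE,FSTYPE,LABEL,UUID /dev/nvme0n1
sudo parted /dev/nvme0n1 print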

Please let me know if anyone has solved this or has some other troubleshooting steps I should try. It would be nice if I could restore this Proxmox box.
 
So it would seem my issue is due to corruption on my nvme0n1p3 partition, which holds the LVM. I attempted to mount the system from an Ubuntu live environment to see what is going on, and vgscan refused to read anything on the disk.

Code:
WARNING: Unrecognised flag CROP_METADATA in segment type thin-pool+CROP_METADATA.
WARNING: Unrecognised segment type thin-pool+CROP_METADATA
LV pve/vm-1000-disk-0, segment 1 invalid: is not referencing thin pool LV for thin segment.
Internal error: LV segments corrupted in vm-1000-disk-0.
Cannot process volume group pve

vm-1000-disk-0 is from one of my primary server VMs. So it would seem I can't mount any of the logical volumes on this partition because of this corruption.
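For context, the sequence I would normally expect to work from a live environment is roughly the following (the VG name pve is taken from the errors above); in my case it already fails at the scan/activate step with the messages shown:

Code:
sudo vgscan
sudo vgchange -ay pve
sudo lvs -a pve
sudo mount /dev/pve/root /mnt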

I did attempt to use thin_check and thin_repair from thin-provisioning-tools. Running thin_check against /dev/nvme0n1p3 just tells me there is a bad superblock, and thin_repair seems to work only if the volume group shows up as a /dev device or in /dev/mapper. Any attempt to use any of the lvm2 utilities just yields the same errors as above.
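For reference, the standard repair path I have found documented is roughly the following, but it assumes LVM can still activate the pool's metadata under /dev/mapper, which is exactly what fails here (the LV names below are only examples):

Code:
# let LVM rebuild the thin-pool metadata onto a spare LV (wraps thin_repair)
sudo lvconvert --repair pve/data
# or manually, if the metadata LV could be activated and exposed under /dev/mapper
sudo thin_check /dev/mapper/pve-data_tmeta
sudo thin_repair -i /dev/mapper/pve-data_tmeta -o /dev/mapper/pve-spare_meta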

Since my LVM metadata backups would be on the root logical volume on that same LVM partition, I seem to be out of luck. I will continue to research this, but at this stage I have no idea how to go about repairing it.
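One thing I have not tried yet: LVM also keeps copies of the VG metadata in the metadata area of the PV itself, so on a recent enough lvm2 something like the following might extract a copy that could then be restored. I have not verified the exact syntax, and the dumped file may need editing before vgcfgrestore accepts it:

Code:
# dump the LVM metadata stored on the physical volume itself
sudo pvck --dump metadata /dev/nvme0n1p3 -f /tmp/pve-metadata.txt
# if a usable copy comes out, try restoring it into the VG
sudo vgcfgrestore -f /tmp/pve-metadata.txt pve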

For what it is worth, I did pull a Clonezilla image of the entire NVMe drive before starting any of this work, so I am not too worried about making things worse.

Any help or suggestions would be appreciated.
 
I have done some additional research and testing. I removed the NVMe drive and put it in a USB enclosure attached to my laptop. I realized that the corruption is at the partition level: when running lsblk, nothing can be read from below the LVM partition. Here are the results, with the NVMe USB enclosure showing up as /dev/sda. I can see that sda3 is an LVM2_member, read its label, and get its UUID, but nothing else shows up under it.

Code:
$ lsblk --fs
NAME          FSTYPE      FSVER    LABEL     UUID                                   FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                               
├─sda1                                                                                             
├─sda2        vfat        FAT32              FAD4-E3C9                                             
└─sda3        LVM2_member LVM2 001           mO1cen-tk16-QcAH-45hN-EU7a-hGEF-H9f8lI

This likely explains the corruption messages when I run any of the pvscan, lvscan, vgscan, etc.

I attempted to use thin_dump to see if I could get some additional information, but it failed due to a bad superblock.

Code:
$ sudo thin_dump -f xml /dev/sda3 -o /tmp/test.dump
bad checksum in superblock, wanted 1344009915

I also attempted to use thin_dump -r to repair it, but it reported that I needed to specify a transaction id.

Code:
$ sudo thin_dump -r -f xml /dev/sda3 -o /tmp/test.dump
The following field needs to be provided on the command line due to corruption in the superblock: transaction id

At this point I have no other ideas on how to fix it. Searching the forum and Google has only led me in circles. Does anyone know how to repair the superblock? Something can evidently still be read, since my previous post showed pve/vm-1000-disk-0 being reported as corrupted, so I would think something can be done to repair the LVM2_member partition.
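If anyone can suggest sensible values, newer versions of the thin-provisioning-tools appear to accept override flags for the fields the superblock is missing, so a repair attempt would presumably look something like this (the numbers below are placeholders, not values I know to be correct; check thin_dump --help for the exact option names in your version):

Code:
sudo thin_dump -r -f xml /dev/sda3 -o /tmp/test.dump \
  --transaction-id 1 \
  --data-block-size 128 \
  --nr-data-blocks 1000000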
 
Same here...

Code:
Loading Linux 6.5.11-8-pve ...
Loading initial ramdisk ...

When I boot in rescue mode, it gets stuck at the point shown in the attached screenshot (IMG_2831.jpeg).

I was able to get it to boot using the 8.8 installer and then upgrading. The upgrade installed a 6.8 kernel. The initial boot afterwards failed, but the rescue image booted successfully, and upon exiting it loaded the full system. I have not tried a fresh reboot since, as I got sidetracked by yet another issue: the fans are running at full blast all the time.
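In case it helps anyone else in the same spot: once the system is up via the rescue image, the usual commands to regenerate the initramfs and the boot entries before attempting a clean reboot are roughly these (which of the last two applies depends on whether the host boots via GRUB or via proxmox-boot-tool):

Code:
update-initramfs -u -k all
update-grub
proxmox-boot-tool refresh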
 
I encountered the same issue here with Proxmox v8.1.3 on an HP ProLiant DL380. Is there any help or suggestion for this issue?
 
O.K., that's sad...
In my case there was an error message, but it didn't show because of the quiet boot.

Sorry, but I can't help you any further at this point... :(
 
