[SOLVED] PVE Fails to load past 'Loading initial ramdisk...'

SOLVED!!

I discovered that my L2ARC (read cache) drive had failed and was preventing the system from booting.

In order to address this, I had to shut down, disconnect all the drives in the array, boot back up, and disable the ZFS mount service:
systemctl disable zfs-mount.service

Once that was disabled, I shut down, reconnected the drives, booted back up, and then removed the failed cache drive:
zpool remove overz ata-SPCC_Solid_State_Disk_Pxxxxxxxxx

After re-enabling zfs-mount.service and rebooting, PVE finally booted normally and saw the zpool. It was still experiencing some kernel panics, but those stopped after upgrading to PVE 6.
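For anyone following along, the re-enable and verify steps were roughly as follows (a sketch; "overz" is my pool name, substitute your own):

systemctl enable zfs-mount.service   # turn automatic ZFS mounting back on
zpool status overz                   # confirm the cache device is gone and the pool is healthy
reboot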

I can't say how relieved I am right now. This system (and the 10+ TB of data it hosts) was unusable for 9 months.

An absolutely MASSIVE thank you to r.jochum for helping me with all this over Discord (shoutout to jonasled & ciken as well for their help). This has been a dark cloud hanging over me for so long, thinking I might have lost my data and not knowing what to do... so I'm extremely grateful to have this system running again. THANK YOU GUYS!!
 
Was your issue that your boot drive was corrupted? I'm currently having this issue and want to try your solution.
 
What do you mean by 'try the nomodeset kernel parameter'? Is this a setting at boot? Where do I do this?
I did this in the GRUB launch parameters when booting. Before Proxmox boots, hit E on the blue GRUB screen (after the BIOS screen) and you can edit the launch parameters.

[Screenshot attachment: boot-grub.png — the GRUB edit screen]


The line you'll need to change is in the bottom half, and will very likely have "ro quiet" at the end. You simply append the parameters to the end of this line. Sorry for the slow response. Hope that helps.
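For example, the edited line ends up looking something like this (a sketch; the kernel version and root= value are placeholders and will differ on your system):

linux /boot/vmlinuz-<version>-pve root=<your-root-device> ro quiet nomodeset

This only applies to the current boot. On a GRUB-booted install you can make it permanent by adding the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.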

To answer the boot drive question above: no, it was my ZFS read cache (L2ARC) drive that died, not the boot drive.

Unrelated to this topic, but this week I've been troubleshooting another failed ZFS pool (I think I'm done with ZFS until I have a system with ECC memory), and I was able to get the system to boot by disabling the ZFS-related systemd units (zfs-volume-wait, zfs-mount, zfs-import-cache). If your boot drive itself uses ZFS this probably won't help, but just an FYI.
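If you want to try the same thing, the commands were roughly (a sketch; run as root, ideally from a rescue shell or with the pool's disks disconnected):

systemctl disable zfs-volume-wait.service zfs-mount.service zfs-import-cache.service

Re-enable them with systemctl enable once the pool is healthy again, otherwise your datasets won't be imported and mounted automatically at boot.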
 
If I don't have a ZFS read cache drive and only have 1 drive for the Proxmox OS and 4 drives in RAID 5, where would the issue be? Sorry, I'm new to Proxmox. Nothing really changed since the last known good boot other than switching the router and modem.
 
Hard to say without any logs or more info. I will say that Proxmox has a number of dependencies that need to be met for it to boot properly, one of which is functioning network interfaces. I ran into an issue in the past where plugging in a 2nd GPU caused the host to be unreachable: the PCIe identifier shifted and with it the interface name (going from enp2s0 to enp2s1), so the static network configuration no longer matched. While I doubt changing your router would impact the host, it's worth checking out. I suggest opening a new thread with logs, screenshots, and the like to get more attention on your issue.
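A quick way to check for that kind of name mismatch (a sketch, assuming the stock Debian/Proxmox ifupdown setup):

ip -br link                     # names the kernel currently assigns to the NICs
cat /etc/network/interfaces     # names the static config expects

If they differ, update /etc/network/interfaces to match the current names and restart networking (or reboot).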
 
