[SOLVED] PVE Fails to load past 'Loading initial ramdisk...'

Pragma808

Nov 19, 2018
Hi all,

Recently my home PVE server has been unable to boot properly, hanging at 'Loading initial ramdisk...'. I've searched around and followed what information I could find, but as a rather inexperienced Linux user, I'm at an impasse. It's been about a year since I set this server up, so I apologize for any missing details, but I'm more than happy to report anything I can find out.

Some info:
- PVE version 5.x (not sure of the exact version, but it's running Linux 4.15.18-9-pve, as reported by GRUB).
- Boot device is an NVMe drive set up with LVM & ext2.
- Additional storage is a RAIDZ pool consisting of 3 HDDs, a 2nd NVMe drive with Windows installed on it, an Optane drive as a write cache, and another SSD as a read cache. As far as I can recall, all boot-critical information is on the boot drive.
- A Windows VM has been set up with GPU passthrough. The guide I followed had me blacklist all GPU drivers from loading, so I (stupidly) don't have a way to interact with the system directly; in the past I've done everything over the network. I do have a 2nd GPU installed, but I never could figure out how to whitelist it specifically.
- The system does not boot to a point where it becomes reachable over the network.
- I've been able to boot directly into Windows, where I've experienced intermittent issues with USB devices, but it's mostly functional. This could be a hardware issue, but I can't confirm that at this time.
- CPU: i9-9980XE with 128 GB of DDR4.

My troubleshooting steps thus far:
- Updated mobo BIOS. No change.
- Edited GRUB to remove the 'quiet' option. Still hangs at the 'Loading initial ramdisk' screen.
- Edited GRUB to remove the 'quiet' and 'iommu=on' options. The system now eventually loads past the ramdisk screen, but only displays a black screen for ~1 min before rebooting on its own.

Like I said before, I don't have a ton of experience with linux, so I'm basically stuck. Any help you all can provide would be greatly appreciated.

Thank you!
 
Hi,

Try the nomodeset kernel parameter.
Maybe then you'll get some error output.
 
Hi again. Apologies for the extended delay in my response. I finally had a chance to try the nomodeset kernel parameter and was able to get video output (THANK YOU).

The first thing I noticed was an error relating to a USB device, with some text warning that a cable might be bad. I shut the system down, unplugged everything but the keyboard, then tried again. That's when I got to this:

[Attachment: hung-at-zfs-job.jpg]

It looks like it's hanging during the ZFS mount job. I let the system run for ~2 hours, but it never got past this point, and I never actually reach the command line, unfortunately. Again, being rather inexperienced with Linux and ZFS, I'm stuck here. Any ideas on what I can try next?

Thank you!!
 
Try the kernel from PVE 6.0.
You can download it manually with wget and then install it:

Code:
wget http://download.proxmox.com/debian/pve/dists/buster/pve-no-subscription/binary-amd64/pve-kernel-5.3.13-2-pve_5.3.13-2_amd64.deb
dpkg -i pve-kernel-5.3.13-2-pve_5.3.13-2_amd64.deb
 
Seeing as I haven't been able to get to a command line, how can I do this? Or do I need to do a clean install with v6.0? Thanks again :)
 
Try booting from an old kernel.
At boot time, you can choose it in the GRUB window.
 
I will need to verify, but I believe there is only one option listed. I never upgraded the kernel after getting the system up and running. Noob mistake, I know!

Thanks again.
 
No, you have to install it manually:

Code:
wget http://download.proxmox.com/debian/pve/dists/stretch/pve-no-subscription/binary-amd64/pve-kernel-4.15.18-9-pve_4.15.18-30_amd64.deb

dpkg -i pve-kernel-4.15.18-9-pve_4.15.18-30_amd64.deb
 
Okay, but I'm not sure how to get to a command line, as it's hanging on the ZFS mount job during startup. Are there any other kernel parameters I can try?
 
GRUB has an 'Advanced options' menu entry.
Under that menu entry, you see all installed kernels.
If there is only one kernel, you must press 'e' to edit the kernel parameters to prevent importing the ZFS pool:
edit the line which starts with 'linux'
and add "zfs.zfs_autoimport_disable=1" at the end of the line.
Parameters are separated by spaces; don't include the quotes.
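As an illustration, after that edit the 'linux' line might look something like this (the root= path and the other options are placeholders from a typical PVE 5.x install; keep whatever your line already contains):

Code:
linux /boot/vmlinuz-4.15.18-9-pve root=/dev/mapper/pve-root ro nomodeset zfs.zfs_autoimport_disable=1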
 
Hi again,

So even with zfs.zfs_autoimport_disable=1, I found that the system is still trying to import the pool. I've attached a picture of what my GRUB looks like before loading the OS (grub1.jpg). One thing that may or may not be worth noting: I noticed an extra space (a double space) in the boot parameters after "ro". I have no idea if this makes a difference; I tried removing it and loading the OS, but it didn't seem to have any effect (grub2.jpg).

Either way, the system still tries to load the ZFS pool (load_os.jpg). Based on the launch parameters I have in the grubX.jpg screenshots, does anything appear to be incorrect?

I did a Google search for zfs.zfs_autoimport_disable=1 and found a GitHub issue reporting similar behavior (https://github.com/openzfs/zfs/issues/2474). If this is the same issue they were experiencing, I believe it might be due to my pool being cached in the initramfs. Unfortunately, I have no idea what to do about that, seeing as I haven't even been able to get to a command line yet.
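From what I can tell, the workaround discussed in that issue, once you can actually reach a shell (e.g. from a rescue environment), is roughly this, although I haven't been able to try it myself:

Code:
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # set the cached pool info aside
update-initramfs -u -k all                         # rebuild the initramfs without it
update-grub                                        # refresh the boot entries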

Any thoughts?

Thanks again. I really appreciate the support!
 

Attachments

  • grub1.jpg
  • grub2.jpg
  • load_os.jpg
I've come to suspect that there is an issue with one of the drives in my RAID, as it appears to spit out errors when ZFS loads.

Here's a screenshot of this:

[Screenshot: 1600468465703.png]


At this point, having been without my system for many months now, I'm not quite sure what the next step is. I think maybe rebuilding PVE from scratch is a good idea, but I don't know how to recover the data on the drives.

When testing boots with certain drives plugged/unplugged, I found that it's likely just a single drive causing the issue. Fortunately, I have a spare drive ready that I could plug in, but I don't know how to resilver it or get the array functioning again.
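From my reading so far, it seems replacing the bad disk would look roughly like this once the pool can be imported, though I'd appreciate confirmation from someone more experienced (the pool name 'tank' and the device paths are just placeholders; zpool status shows the real names):

Code:
zpool status                                                         # identify the pool and the faulted disk
zpool replace tank /dev/disk/by-id/ata-OLD_DISK /dev/disk/by-id/ata-NEW_DISK
zpool status                                                         # resilver progress shows here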

Any help anyone can provide would be greatly appreciated. Thanks again!
 
Hmm, you could try booting an Ubuntu Desktop live image and see if it works with that (use the latest available; it includes ZFS support).

If so, you could then mount the PVE root, chroot into it, upgrade, and see if it works afterwards.
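The mount-and-chroot step would look roughly like this, assuming the PVE root is the LVM volume /dev/pve/root (adjust for your actual layout):

Code:
# from the Ubuntu live session, as root
vgchange -ay                      # activate LVM volume groups
mount /dev/pve/root /mnt          # mount the PVE root filesystem
mount -t proc proc /mnt/proc
mount -t sysfs sys /mnt/sys
mount --bind /dev /mnt/dev
chroot /mnt /bin/bash             # now you're inside the installed system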
 
Is there a guide I can reference for this process?
 
