grub2-2.02-pve5 breaks on GPT/EFI disks (like ZFS roots)?

thanks for the clarification. it's very hard to get to the bottom of such issues without access to a system where the issue occurs..

I can understand that. The long boot is one thing (which may or may not be directly related to whatever changed going from pve4 to pve5), but shouldn't the "unknown filesystem" error be a somewhat more "trivial" thing to track down?


when you boot with 2.02-pve5, does the hang occur before or after the grub menu is displayed?

After BIOS, before GRUB menu. We just see the "remaining" trail of output from "BIOS" (PXE-boot, RAID-stuff, etc), and then when it starts to boot, it just sits there, nothing more is displayed (not even a blinking cursor).


does it eventually boot (and just take "forever"), or did you revert by booting from a live-CD or similar?

It boots eventually. At first we thought it would just hang forever, so we spent some time trying to "fix" it (via Proxmox Rescue, etc). Then we just said "fuck it, let's go eat", and when we came back, it had booted. Looking at the logs, we could see that it had spent close to 40 minutes. We did a few more reboots, and every time it took 30-60 minutes.


how was the system originally installed? if with the PVE installer, in which version?

PVE-installer. Not sure what version, but 4.x. Is there a way to "check" which version it was installed as?
 
So, we did a fresh/clean reinstall of the latest Proxmox (using the same disks). After rebooting, it's stuck in the same "wait 30-40 minutes before GRUB" scenario.

We're going to nuke the MBRs/whatnot and try again, as we suspect that's what's causing the issue (which would explain why upgrading grub "broke" things).
 
Nuking the MBRs/whatnot didn't help. After doing that, followed by a new reinstall, we still got the "wait 30-40 minutes before GRUB" scenario.

In all cases (both initially, and now during "testing"), rpool was a raidz2 vdev consisting of 6 drives. The server used to have other drives, but they were removed after we started getting the "30-40 minute wait". As a last resort, we just tried a reinstall with two drives in a mirror vdev (2 of the 6 drives previously in the raidz2 vdev) -- this seems to work just fine (as in, it boots "instantly"). It therefore seems to be either a) related to one of the 4 drives left out of the mirror, or b) related to raidz2 (alone, or in combination with the grub update and/or zfs and/or hardware).

As this happened after just doing a dist-upgrade, I'm pretty sure this isn't a hardware issue in itself (unless new grub/whatnot versions introduced some kind of incompatibility).
 

that does sound really strange.. if you have the time, testing further combinations might narrow it down to specific disks, or numbers of disks, or raid level. e.g., the following sequence of tests (you don't need to wait the full 30-40 minutes of course, just wait long enough to make sure that it does not boot normally):
  • the remaining 2 pairs of disks as mirrors each (to find possible slots/disks as culprits)
  • some 4 disk combinations as raidz (to check whether raidz is the cause)
  • some 4 disk combinations as raid1 / mirror (to check whether it is just the number of disks)
  • some 4 disk combinations as raid10 / striped mirrors
  • all 6 disks as raid1
  • all 6 disks as raid10
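
To keep track of which combinations have been tried, the sequence above could be written out as a checklist. The following is only a rough sketch: the disk labels and the `build_test_plan` helper are made up for illustration (substitute your real slot or device IDs), and it enumerates every 4-disk subset, of which you would only pick a few in practice.

```python
from itertools import combinations

# Hypothetical labels for the six drive slots; replace with real device IDs.
DISKS = ["disk1", "disk2", "disk3", "disk4", "disk5", "disk6"]


def build_test_plan(disks, working_pair):
    """Return (layout, disks) tuples in the order of the suggested tests.

    `working_pair` is the two-disk mirror already known to boot normally.
    """
    plan = []

    # Pair up the disks that were left out of the working mirror.
    remaining = [d for d in disks if d not in working_pair]
    for i in range(0, len(remaining), 2):
        plan.append(("mirror", tuple(remaining[i:i + 2])))

    # All 4-disk subsets under each layout; testing a handful is enough.
    for layout in ("raidz", "mirror", "raid10"):
        for combo in combinations(disks, 4):
            plan.append((layout, combo))

    # Finally, all disks as one mirror and as striped mirrors.
    plan.append(("mirror", tuple(disks)))
    plan.append(("raid10", tuple(disks)))
    return plan


if __name__ == "__main__":
    for layout, members in build_test_plan(DISKS, ("disk1", "disk2")):
        print(f"[ ] {layout:7s} {' '.join(members)}")
```

Ticking off each line with "boots normally" or "hangs" should make it easier to see whether the hang follows particular disks, the disk count, or the raid level.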
 
