changing hardware settings of a VM with UEFI disk on Ceph breaks UEFI boot

Feb 3, 2021
Using a couple of PCs as a test cluster, with everything updated to the latest version on both. I set up Ceph from the GUI to create the storage for this cluster using drives inside these two PCs.

Creating new VMs with UEFI and putting the UEFI disk on a Ceph pool works; I can install Windows, reboot, and it still works fine.
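For reference, the relevant part of the VM config (/etc/pve/qemu-server/<vmid>.conf) looks roughly like this; the VMID, storage name and sizes below are placeholders, not copied from my actual config:

    bios: ovmf
    efidisk0: ceph-vm:vm-101-disk-1,size=1M
    scsi0: ceph-vm:vm-101-disk-0,size=64G
    ostype: win10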

But as soon as I change a hardware setting of the VM, like the Display (from Default to SPICE), or migrate the VM from one node to another node of the same cluster, something in the UEFI breaks.
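To reproduce it from the CLI instead of the GUI, the equivalent commands would be something like this (VMID and target node name are placeholders):

    qm set 101 -vga qxl             # switch the Display from Default to SPICE
    qm migrate 101 pve2 --online    # or live-migrate the VM to the other node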

When starting the VM again, it stops at the UEFI firmware (CPU use is near zero and RAM use is around 60-70 MB); the console also shows the UEFI boot splash with the white progress bar at the bottom, but Windows never seems to take over.

The VM can remain in this state for more than 20 minutes, and I can only kill it with the "Stop" command from the GUI.

Changing the setting back (moving the Display from SPICE back to Default) does not fix the issue, and neither does moving the VM back to its original node; the VM's UEFI is permanently broken.

On the same VM, if I delete the UEFI disk and create a new one on local storage (so not on the Ceph storage), it boots fine and I can change hardware settings without breaking it.
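From the CLI, that workaround would be roughly the following (VMID and storage name are placeholders, use whatever your local storage is called):

    qm set 101 -delete efidisk0       # detach the broken EFI vars disk from the config
    qm set 101 -efidisk0 local-lvm:1  # allocate a fresh EFI vars disk on local storage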

If I install a new VM with BIOS (SeaBIOS) firmware, everything also works fine; I can migrate and change settings without breaking it.
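For completeness, the firmware type is just the bios option of the VM; the two variants look like this (VMID is a placeholder, and this is only meant for new VMs, since switching the firmware of an already installed guest will usually leave it unbootable):

    qm set 101 -bios seabios   # legacy BIOS firmware (the default)
    qm set 101 -bios ovmf      # UEFI firmware, needs an EFI vars disk (efidisk0)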

Similar issues have been reported by others in this other thread: https://forum.proxmox.com/threads/cant-start-vm-with-ovmf-and-uefi-disk-on-ceph.82367/#post-365709 That said, they had problems regardless of changing settings, so it may not be the same issue.

Pinging @Alwin since he responded to the other, similar issue.
 
I tested it in the same environment: a 3-node Ceph cluster, set up from the GUI when it was installed with PVE 6.1, currently on version 6.3.3.

Created a VM with its data and UEFI disks on Ceph storage and installed Debian 10 on it. Confirmed that it booted via UEFI.
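In case it's useful, this is how I checked it inside the guest (standard check, nothing Proxmox-specific):

    # the directory only exists when the system was booted via UEFI
    [ -d /sys/firmware/efi ] && echo "booted via UEFI" || echo "booted via legacy BIOS"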

My results were that no matter what I tried, whether changing hardware while the VM was shut down or online, or migrating it in the running or stopped state, it always worked flawlessly. I don't know if that helps, but at least I tried it and got a different result.
 
Thanks for the test.

Hm, so it seems the only difference is that my cluster has only 2 nodes; I'll see what happens when I add another node. Although it's weird, cluster size shouldn't matter.

Can you test with a Windows VM as well? I did mention I used Windows in my test above.
 
Well, if you have only 2 nodes, I hope you don't run HA.
And I hope your Ceph pool is configured with size=2 and min_size=2, otherwise you're playing a dangerous game.
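You can verify that quickly on any node; the pool name below is just an example, replace it with yours:

    ceph osd pool get vm-pool size        # number of replicas
    ceph osd pool get vm-pool min_size    # minimum replicas required for I/O
    ceph osd pool set vm-pool min_size 2  # raise it if it is still at 1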

Which Windows product did you install? 10? Server 2019?
 
Yes, I configured Ceph like that since this is a 2-node system, and yes, I know I need an odd number of nodes if I want to run HA.
I'm not using HA, and I think (hope) it's mostly irrelevant for this issue. This is just a test system to see whether I can migrate from my current setup; if all goes well with these two nodes, the third node will be my current KVM home server (which is not running Proxmox).
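For what it's worth, this is how I check the cluster and quorum state on a node (standard command, nothing special about my setup):

    pvecm status   # shows membership, vote counts and whether the cluster is quorate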

I installed Windows 10 Pro, as I've seen it's a good test for a latency-sensitive OS, with all the stuff going on in the background.
 
