Windows VMs stuck on boot after Proxmox Upgrade to 7.0

But this is not the problem in this thread. In my opinion, despite the setting "mitigations = off" and a firmware update (latest version available, from February 2022), the problem of the spinning dots remains.
 
Very bad news.

UEFI firmware updated a few days ago.

Mitigations = OFF

Proxmox 7.2-4 completely updated.

Reboot VM and ... infinite spinning dots.

Could the virtio driver version be a cause? I use 0.1.215

This problem is consuming me. Every month we have 50 VMs that freeze; we cannot continue like this.
 
I agree with dea, we can't continue like this. We have 100+ VMs that freeze every month. I too use the 0.1.215 virtio driver. I know Proxmox acknowledges the issue; have they made any progress on fixing it?
 
Yes @dea and @yottabyteman, I agree: there is still no solution even though the problem has existed for months now. Please see my previous posts for details. Today all 30 of our Windows VMs got stuck at boot again.

We are running the latest PVE 7.2 on our Dell PowerEdge R7525 servers. I have the same hardware running PVE 6.4 and there are no problems...
 
I can confirm: Proxmox 6.4 runs on several hardware platforms without problems! I run Proxmox 7.2 on a Lenovo SR650 with the very latest UEFI firmware (10 June 2022) and there are problems...
 
@Moayad does Proxmox have any update on this glitch?

@Moayad any update would be appreciated.

We had been suspecting that the hung boot was happening to each VM only once (at least in our situation). Unfortunately, this Patch Tuesday has led to a fresh round of hanging VMs.
 
.. what do you mean by one time only? The problem occurs when the VM has been running for a certain number of days, whether there are updates or not. A VM that has been running for a month, without the slightest update applied, exhibits the problem on reboot.
Does it only occur once? Well, yes: after you restart it, the problem does not appear anymore. Wait a few weeks, restart it, and the problem comes back.
 
@dea until this week, we did not have any VMs that had this happen more than once, including those running for a long time (over 60 days) without a reboot. It appeared that once it happened to a VM, it would not happen again. This was across multiple clusters and sites.

To be clear, this is not the case as of this week. We now have had multiple VMs that have hung more than once.
 
We're still trying to reproduce it here. On average how long does the VM need to run before it gets stuck on a reboot?
Can you reproduce this with some test VM where you can try snapshots (with RAM) and rollbacks? It would be interesting to know whether it still gets stuck if you roll back to a snapshot taken directly before a reboot and then reboot the VM again.
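
For anyone who wants to script that test, here is a minimal sketch in Python. It only builds and prints the `qm` commands for taking a snapshot that includes RAM and for rolling back to it; it does not run them. The VM ID `101` and the snapshot name `pre-reboot` are hypothetical placeholders, and the reboot inside the guest is left as a manual step.

```python
import shlex
from typing import List


def repro_commands(vmid: int, snapname: str = "pre-reboot") -> List[List[str]]:
    """Build the qm commands for one snapshot/rollback test cycle."""
    vm = str(vmid)
    return [
        # Snapshot the running VM including its RAM state.
        ["qm", "snapshot", vm, snapname, "--vmstate", "1"],
        # Return to that exact state before each new reboot attempt.
        ["qm", "rollback", vm, snapname],
    ]


if __name__ == "__main__":
    # 101 is a placeholder VM ID; use a test VM, not production.
    for cmd in repro_commands(101):
        print(shlex.join(cmd))
```

Take the snapshot once on a VM that has been up long enough to be affected, reboot the guest, and if it hangs, roll back and reboot again to see whether the hang repeats.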
 
@mira regarding the snapshots, the most recent comment in the bug report seems to cover that:

https://bugzilla.proxmox.com/show_bug.cgi?id=3933

@mira from our experience it is over 30 days. However, we only recently have had it happen multiple times to the same VM. Others with previous posts can likely give you a better idea.
 
I can reproduce this issue on my own cluster. I have an RMM tool that helps me locate affected servers.

When I search for all Windows servers with 30+ days of uptime and a "reboot required" flag (due to installed updates), those servers always fail to reboot correctly. If I snapshot a VM and roll back, I can reproduce the issue over and over.
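
The search described above could look roughly like the following Python sketch. The `VmRecord` fields are hypothetical stand-ins for whatever an RMM export provides, and the 30-day threshold is simply the figure reported in this thread, not a confirmed limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmRecord:
    # Hypothetical fields; map them to your own inventory export.
    name: str
    uptime_days: float
    reboot_required: bool


def at_risk(records: List[VmRecord], min_uptime_days: float = 30.0) -> List[str]:
    """Names of VMs with long uptime and a pending reboot."""
    return [
        r.name
        for r in records
        if r.uptime_days >= min_uptime_days and r.reboot_required
    ]


if __name__ == "__main__":
    inventory = [
        VmRecord("win-a", 45.0, True),
        VmRecord("win-b", 45.0, False),
        VmRecord("win-c", 3.0, True),
    ]
    print(at_risk(inventory))
```

Other posters in this thread report hangs without any pending update, so filtering on uptime alone may be the safer check.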

We run version 7.2-4 of Proxmox, and we have had this issue since we updated. Our old 6.x version had no issues.

I will try some changes in the Windows VM that is affected and see if I can find some setting that resolves this issue, but I'm afraid it can't be fixed from within the Windows VM.
 
Also an important note:

I can rollback the VM and get the issue over and over. But if I stop the VM (stop the KVM process) and start the VM (fresh start) it doesn't happen again, even if I rollback to the snapshot when I was able to reproduce the issue.

So it has something to do with the runtime of the QEMU/KVM process.
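
If the runtime of the QEMU/KVM process is what matters, it may help to measure that directly on the host instead of relying on guest uptime. Below is a rough Python sketch, assuming a Linux host and assuming the VM's pidfile is at `/var/run/qemu-server/<vmid>.pid` (check this on your own node). It derives the process age from field 22 (`starttime`) of `/proc/<pid>/stat` and the host uptime from `/proc/uptime`.

```python
import os


def process_age_seconds(stat_text: str, uptime_text: str, clock_ticks: int) -> float:
    """Return how long a process has been running, in seconds."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split on the last ')' before tokenizing.
    rest = stat_text.rsplit(")", 1)[1].split()
    # 'rest' starts at field 3, so field 22 (starttime, in clock ticks
    # since boot) sits at index 19.
    start_ticks = int(rest[19])
    host_uptime = float(uptime_text.split()[0])
    return host_uptime - start_ticks / clock_ticks


def qemu_age_days(vmid: int) -> float:
    """Age of the QEMU process behind a VM, in days (Linux host only)."""
    # Assumed pidfile location; verify it on your own node.
    with open(f"/var/run/qemu-server/{vmid}.pid") as f:
        pid = int(f.read().strip())
    with open(f"/proc/{pid}/stat") as f:
        stat_text = f.read()
    with open("/proc/uptime") as f:
        uptime_text = f.read()
    ticks = os.sysconf("SC_CLK_TCK")
    return process_age_seconds(stat_text, uptime_text, ticks) / 86400
```

Comparing this value between VMs that hang and VMs that reboot cleanly might show whether there is a process-age threshold, which is still an open question in this thread.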
 
Thank goodness that we are getting closer to narrowing down the issue.
 
I tried a few things within the Windows VM, but it doesn't fix anything.

Even when I do a hard reset in Proxmox, skipping the update on shutdown and startup, the VM won't boot.
 
Ok I'm 100% sure we can't fix this within Windows. I even tried to replace the OS disk with another (working) VM OS disk and boot. It wouldn't boot...

Even with the Windows Server 2012R2, 2016, 2019 and 2022 installation ISOs it won't boot. It will only boot other ISOs, like HirensBootCD with MiniXP, Linux ISOs, or older Windows Server versions like 2008R2.

It just gets stuck on the Windows 2012R2-2022 logo and keeps circling with 0% CPU usage.
 
Nice work, every part of your analysis coincides with mine.
I am left with one open question. Is there a link with the hardware? Are there systems that are affected by this problem and others that are not? I use Lenovo servers (quite a few) and they all have this problem. Other Lenovo systems that I run with version 6.4 DO NOT have this problem. But is the cause strictly QEMU (as I think), or is it a combination of hardware (and consequently firmware) and QEMU?
 