Windows VMs stuck on boot after Proxmox Upgrade to 7.0

FWIW, the "stuck" VM I left running did NOT magically recover when I did rolling upgrades the other night and migrated it to a node with pve-qemu-kvm 6.2; it just kept spinning after the migration. I will keep track of reboots now that we are all on pve-qemu-kvm 6.2.
You have to Stop & Start the VM after updating QEMU (in this particular case).
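For reference, a minimal sketch of doing that from the node's shell with the qm CLI (100 is a placeholder VMID):

Code:
# A reboot re-uses the old QEMU process; a full stop + start launches
# a fresh process running the updated pve-qemu-kvm
qm stop 100
qm start 100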
 
I'm using 6.2 now and the issue persists... and after recent updates on Ubuntu, I now have the same issue on Linux too.
Was that VM freshly booted, or was it live-migrated to another node after pve-qemu-kvm 6.2 had been installed?

Maybe I wasn't clear in my last post... When I go to reboot the machines (now Linux or Windows), the machine goes down to a black screen... but it never shuts off and never goes into a reboot. I have to force-stop the machine and then start it; it will then boot as normal. So if I let Windows reboot after updates... it's hung and I have to force the machine down.
That is how I see the problem. If the reboot is issued from within the VM, it will hang on either a black screen, the "Guest has not initialized the display (yet)" message, or, for Windows, the endless spinning circle. Stopping it and doing a clean start then works fine.


FWIW, the "stuck" VM I left running did NOT magically recover when I did rolling upgrades the other night and migrated it to a node with pve-qemu-kvm 6.2; it just kept spinning after the migration. I will keep track of reboots now that we are all on pve-qemu-kvm 6.2.
That would have been a surprise :). The hope is that the VMs will not get into that state with pve-qemu-kvm 6.2.
 
We have a PVE 7.1-12 system running here in our office which has an uptime of 150 days now.
It has all enterprise-repository updates installed, except it is still running kernel "Linux 5.13.19-1-pve #1 SMP PVE 5.13.19-3 (Tue, 23 Nov 2021 13:31:19 +0100)", because we never rebooted; we only installed every update that was offered.

But this system only has planned reboots every 180 days. Guess what... every single Windows 10/Windows Server VM, and also a BSD12 guest, was rebooted from inside the guest and came up without ANY issues. So maybe there is some correlation between kernel AND QEMU version? Just a guess...
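(If you want to compare the kernel a node is actually running against what is installed, a quick sketch:)

Code:
# Kernel currently running (unchanged by package upgrades until reboot)
uname -r
# Installed PVE package versions, including any newer pve-kernel packages
pveversion -v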
 
... and now, with the release of Proxmox 7.2, will the problem that has lasted more than 5 months be solved? I really hope so...
That still has to be tested... the release is new...
 
Well, I still have the same issue (fully upgraded to 7.2)... but I've found that a shutdown seems to work OK... then I can just start it from the console... but if I do a reboot from within the VM, it still hangs as before.
 

@tkffaul have you had this happen more than once to each VM? We have not recorded the same VM having the problem more than once. However, we are not 100% confident that is the case.
 
I haven't tried it a second time since the 7.2 upgrade. But prior to 7.2, yes, I could replicate this issue over and over; mind you, only since moving from 6 to 7. I'll do some more testing offline and let you know what I find.
 
@tkffaul you could replicate it?! Interesting. We have not been able to replicate it other than by upgrading a 6.x cluster to 7.x.

After the upgrade, all (most?) Windows VMs will hang on the first reboot. After they have had the failed reboot once, it does not appear to happen again to the same VM.

If you have had the same VM hang on reboot more than once, I would really be interested to know how you triggered that, or what your setup looks like.
 
Clearly this is a big issue affecting lots of users. Is there any official word from Proxmox on an upcoming fix? I was hopeful that 7.2 would address this. I think we all desperately want to get this resolved. We had issues last night with 2 clients that resulted, of course, in early-AM support tickets.
 
Yes, it is a big problem. Some time ago we opened official support tickets with Proxmox, but it all faded into a series of "we can't replicate the problem", "try using the 5.15 kernel", "try using QEMU 6.2". No deterministic answer.
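(For anyone who wants to try that suggestion, a hedged sketch of opting into the 5.15 kernel on PVE 7; it assumes the pve-kernel-5.15 meta-package is available in your configured repositories:)

Code:
# Install the opt-in 5.15 kernel series, then reboot the node into it
apt update
apt install pve-kernel-5.15
reboot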
 
I am not sure if it helps, but maybe all of you should also follow the reported bug here:
https://bugzilla.proxmox.com/show_bug.cgi?id=3933

Maybe it will get some more "attention" from the Proxmox developers. Currently it is not even assigned to anyone.
Maybe someone from Proxmox support can also remote into a system to see what is happening when the reboot hangs.

We have around 5 to 10 VMs where we can reproduce this without any "interruption" for our customers... they just need to ask... ;-)
 
@itNGO how do you reproduce it? We have some VMs too but have not found a way to reproduce the hang.
 
I will upload a video where I show the behavior and make some predictions about when this happens...
I can say with about 90% certainty which of our VMs will have this next. To me it looks like it always happens with VMs migrated from Hyper-V, once they have at least 4 to 12 days of uptime...

Video: https://owncloud.it-ngo.com/index.php/s/atbRmsY3dd3K6Km
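(To spot which VMs match that uptime pattern, a hedged sketch; it assumes qm status --verbose reports an uptime field in seconds:)

Code:
# Print the uptime of every running VM in days
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    up=$(qm status "$vmid" --verbose | awk '/^uptime/ {print $2}')
    echo "VM $vmid: $((up / 86400)) days"
done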
 
Hey @itNGO thanks for the video.

Most of our VMs were built in PVE and have only experienced the hang once each (that we know of).

I just rebooted one that had 49 days of uptime, and it rebooted fine. It had been one of the ones that had a problem earlier. Earlier today, I rebooted another one with more than three weeks of uptime that had also had an issue.

What is interesting is that the VMs we see hanging are always stuck at the spinning-dot screen. Yours was on a blank black screen (other people have reported that too).
 
Same here for our Windows VMs. Some reboot, some don't.

Windows Server 2019 pc-i440fx-5.2 (SeaBIOS), VirtIO SCSI
Windows Server 2022 pc-i440fx-6.0 (OVMF UEFI), VirtIO SCSI
Hey @tstrand have you had this issue recently? If so, was it the same VMs or different ones?
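(For comparing setups like the two above, the relevant settings can be read straight from the VM config; a sketch, with 100 as a placeholder VMID:)

Code:
# Machine type, firmware and SCSI controller for a given VM
# (bios only shows up if it was set explicitly, e.g. to OVMF)
qm config 100 | grep -E 'machine|bios|scsihw'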
 
@itNGO are all the VMs that repeatedly freeze using version 6.0 or later of i440fx?

Code:
# qm showcmd <vmid> --pretty | grep machine
  -machine 'type=pc-i440fx-6.1+pve0' \

I noticed that in your video. Also, in the PVE roadmap, I noticed under 6.4:
  • Support pinning a VM to a specific QEMU machine version.
  • Automatically pin VMs with Windows as OS type to the current QEMU machine on VM creation. This improves stability and guarantees that the hardware layout can stay the same even with newer QEMU versions.
I believe all of our machines that hung with the spinning wheel were pc-i440fx-5.1 but yours appears to be on version 6.0 or later.
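(If the machine version does turn out to matter, a VM can be pinned to a fixed machine type; a hedged sketch, with 100 as a placeholder VMID:)

Code:
# Pin the VM to a specific QEMU machine version so future QEMU
# upgrades keep the same virtual hardware layout
qm set 100 --machine pc-i440fx-5.1
qm config 100 | grep machine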
 
The problem occurs with any 6.x QEMU machine version; I do not use 5.x versions, and the problem occurs anyway, on different clusters and with different storage systems, in the same way. Strangely, I have never, ever had any problems with Windows 2008 R2.
 
