Windows VMs stuck on boot after Proxmox Upgrade to 7.0

Hello,

Can anyone with a stuck VM provide us with the strace and GDB output by running the following commands:

Bash:
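# VM in the path stands for the VMID. Command substitution is used here because
# piping into `read` leaves PID empty by the time strace runs.
strace -p "$(cat /var/run/qemu-server/VM.pid)"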
gdb attach -q $(cat /var/run/qemu-server/VM.pid) -ex='handle SIGUSR1 nostop noprint pass' -ex='handle SIGPIPE nostop print pass' -ex='set logging on' -ex='set pagination off' -ex='cont'
 
Sure. Before the reboot, from inside the guest? Or once the VM is already "stuck"?
 
Once the VM is stuck, and from the PVE host side, please.

In addition, if possible, please take a screendump of the stuck VM. Since the VM shows a black screen, we don't know whether the system is waiting for user input, such as a confirmation to stop a process or service.

How to take a screenshot:

Bash:
~# qm monitor <VMID>
qm> screendump /tmp/screenVMID.ppm
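
Since reports differ on whether a stuck VM shows a black screen or the spinning balls, a quick check of the dump can help tell the two apart. The following is a minimal sketch, assuming the dump is a binary PPM (`P6`) with a plain three-line header and no comment lines. The function name `ppm_is_black` is made up for illustration and is not part of any Proxmox tool.

```shell
# Hypothetical helper (not a Proxmox tool): succeed if every pixel byte in a
# binary PPM (P6) file is zero, i.e. the screendump is completely black.
# Assumes a three-line header (magic, "width height", maxval) without comments.
ppm_is_black() {
    local file=$1 magic dims maxval header_len nonzero
    # Read the three text header lines; the raster data follows immediately.
    { read -r magic && read -r dims && read -r maxval; } < "$file" || return 2
    [ "$magic" = "P6" ] || return 2
    # Each header line is terminated by exactly one newline byte.
    header_len=$(( ${#magic} + ${#dims} + ${#maxval} + 3 ))
    # Skip the header, drop all NUL bytes, and count what is left.
    nonzero=$(tail -c +"$(( header_len + 1 ))" "$file" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}
```

Possible usage: `ppm_is_black /tmp/screenVMID.ppm && echo "dump is entirely black"`. A return code of 2 means the file could not be read or is not in the assumed format.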
 
Typically, when the VM gets stuck it does not show a black screen but the Windows loading screen with the spinning balls, which goes on indefinitely. The balls keep moving, so the VM is NOT frozen; it just stays in that state indefinitely.
 
At that stage no prompt is shown and no input is required. At least none is visible.
 
Not for everyone... we just get a black screen / "not initialized"...
 
Okay, personally that has never happened to me. It is always the same problem: a reboot, then balls that spin endlessly. It happens only on Windows VMs, never on Debian or FreeBSD VMs, and always after the VM has been running for a certain period (several days). I have seen it on different storage systems, with different QEMU settings, and on at least 7 different clusters. Always the same problem.
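
Because the hang reportedly shows up only after the VM has been running for several days, it can help to see how long each QEMU process has been up before rebooting a guest. A rough sketch, assuming the pidfile location used in the commands earlier in this thread and a procps-style `ps` that supports the `etimes` column. The function names and the default threshold are illustrative only.

```shell
# Illustrative helpers; the names are made up for this sketch.

# Convert a number of seconds to whole days.
secs_to_days() {
    echo $(( $1 / 86400 ))
}

# Print "VMID days" for every QEMU process running at least $1 days (default 3).
# Assumes pidfiles named /var/run/qemu-server/<VMID>.pid.
list_long_running_vms() {
    local min_days=${1:-3} pidfile pid secs days vmid
    for pidfile in /var/run/qemu-server/*.pid; do
        [ -e "$pidfile" ] || continue
        pid=$(cat "$pidfile") || continue
        # etimes = elapsed time since the process started, in seconds
        secs=$(ps -o etimes= -p "$pid") || continue
        days=$(secs_to_days "$secs")
        vmid=$(basename "$pidfile" .pid)
        [ "$days" -ge "$min_days" ] && echo "$vmid $days"
    done
    return 0
}
```

Calling `list_long_running_vms 5` on the PVE host would list the VMs whose QEMU process has been up for five days or more.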
 
@Moayad
Here are the gdb output, the strace output (I had to stop it manually, otherwise it would just keep growing), and a screendump.

Looking forward to getting this issue fixed. Let me know if you need anything else.
 

Attachments

  • Archive.zip
    14.2 KB · Views: 17
The last two Windows updates with reboots succeeded. Even the UEFI ones. I can't explain it. The only change was the regular Proxmox enterprise updates.
The behavior has seemingly stopped for me too. I personally suspect the Microsoft kernel had some atrocious behavior going on, but I would still love to see a patch on Proxmox. No matter how much Microsoft poisons their own kernels they can't make me go back to Hyper-V.
 
Still a problem here... even at patch level 05/22 for Server 2016, 2019, and some Ubuntu VMs...
 
Hi @Moayad, please see the attached files.

This VM is currently hung. We can leave it for an hour or two before we need to get it back into production.

Please let us know ASAP if there is anything further we can do.

A few notes:
  • This is the first time we have seen a screen like this. Normally, it is the Windows screen with the spinning balls. I suspect this one is different because this VM runs Windows Server 2022 with UEFI and uses the TPM feature. The behaviour is otherwise the same.
  • The strace command you provided errored. So I ran this instead after getting the PID:
    Bash:
    strace -p 1351768 1>strace_output_120.txt 2>&1
  • I was unclear how to capture the gdb output (it seemed to be interactive). So I ran it and copied the terminal to a file.
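
The piped form `cat … | read PID | strace -p "$PID"` fails because each stage of a pipeline runs in its own subshell, so `PID` is empty by the time `strace` starts. For anyone else unsure how to capture both outputs to files, here is a minimal sketch. It assumes the pidfile path from the original request; the function names, output file names, and 30-second window are invented for illustration. Note that it swaps the interactive `cont` session for a one-shot backtrace of all threads, which is a different and simpler technique than the one requested and may not capture the same information.

```shell
# Sketch only: helper names, output paths and the 30 s window are assumptions.

# Path of the pidfile qemu-server writes for a given VMID.
vm_pidfile() {
    printf '/var/run/qemu-server/%s.pid\n' "$1"
}

# Write strace and gdb output for VMID $1 into directory $2 (default /tmp).
collect_vm_debug() {
    local vmid=$1 outdir=${2:-/tmp} pid
    pid=$(cat "$(vm_pidfile "$vmid")") || return 1
    # Trace syscalls of all threads for 30 seconds, with timestamps, into a file.
    timeout 30 strace -f -tt -p "$pid" -o "$outdir/strace_output_$vmid.txt"
    # Attach once, dump a backtrace of every thread, then detach and exit.
    gdb -p "$pid" -batch -ex 'set pagination off' -ex 'thread apply all bt' \
        > "$outdir/gdb_output_$vmid.txt" 2>&1
}
```

Running `collect_vm_debug 120` as root on the PVE host would then leave both files in `/tmp`, ready to attach. Keep in mind that the guest is paused briefly while gdb is attached.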
 

Attachments

  • 120.png
    10.6 KB · Views: 20
  • gdb_output_120.txt
    3.3 KB · Views: 16
  • strace_output.tar.gz
    65 KB · Views: 10
Has anyone already tested whether 7.2-4 helps with these issues?
 
Sorry, but after several months I still don't understand one thing. Has the Proxmox support team (including the official one via ticket, since we also tried that route) NEVER reproduced the problem? Every time we asked via ticket, the response was "we know someone is complaining but we have never replicated the problem".

Is that possible? I reproduce it every month on multiple clusters...
 
A really annoying problem with Windows.

Microsoft have this problem on their own systems too. So either they run Proxmox VE on Azure (I assume not, but who knows ...) or the issue is probably not related to the Proxmox stack.

Read more on https://docs.microsoft.com/en-us/tr...achines/troubleshoot-vm-boot-configure-update
In my opinion we should investigate further, for several reasons. First, I do not think the hung VM displays the words "Getting Windows ready"; it only shows the spinning balls, which might mean something different. Second, the problem does not occur with QEMU 5.x, nor does it occur on VMware. I'm not saying the culprit is Proxmox; I suspect QEMU 6.x, and after all Proxmox is not the only project that uses QEMU.
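
To check which side of the suspected QEMU 5.x/6.x boundary a host is on, the installed package version can be parsed. This is only a sketch: it assumes the `pve-qemu-kvm: <version>` line format printed by `pveversion -v`, the helper names are made up, and the 6.x threshold reflects the suspicion voiced in this thread, not a confirmed root cause.

```shell
# Illustrative helpers, not part of Proxmox.

# Extract the major version from "6.2.0-11" or "pve-qemu-kvm: 6.2.0-11".
qemu_major() {
    local v=${1##*: }      # drop an optional "name: " prefix
    printf '%s\n' "${v%%.*}"
}

# Succeed if the version belongs to the 6.x-or-later series suspected here.
in_suspected_series() {
    [ "$(qemu_major "$1")" -ge 6 ]
}
```

For example: `in_suspected_series "$(pveversion -v | grep '^pve-qemu-kvm:')" && echo "QEMU 6.x or later installed"`. A VM that is already running keeps using the QEMU binary it was started with, so the installed package version and the version of a long-running VM process can differ.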
 
No, I strongly believe that the problem is different from the one highlighted in Azure.

First of all, the hung screen is different, and in my opinion the point where the system stops is different too. This is a Windows Server 2016 post-upgrade screen... the balls are spinning...
Again, I just updated several Windows systems (a few minutes ago) on a Proxmox 6.4 installation and not the slightest problem occurred. No, I am convinced that the problem lies in QEMU 6.x, or in any case in an interaction between Windows and QEMU 6.x (since BSD and Linux work perfectly).

1653027894679.png
 

Attachments

  • 1653027876056.png
    5.4 KB · Views: 0
@tom This is a completely different issue... You are invited to look at it on one of our DC clusters at any time via remote access for further debugging.
 
I am not so sure about that, as the cause has not been found yet.

For remote debugging:

Our enterprise team can assist here (just contact them via the well-known channels).
 
