On a PVE system with the pve-qemu-kvm 11.0.0-1 package installed, virtual machines with the Hyper-V role added get stuck in a boot loop.

uzumo

I am currently testing pve-qemu-kvm 11.0.0-1 from the test repository in order to use MBEC/GMET hardware acceleration for HVCI.


Virtual machines with the Hyper-V role added get stuck in a boot loop with the following combinations:

Code:
pve-qemu-kvm 11.0.0-1 + Linux 7.0.0-3-pve
pve-qemu-kvm 11.0.0-1 + Linux 7.0.2-1-pve

The system boots without issues in the following cases:

Code:
pve-qemu-kvm 11.0.0-1 + Linux 7.0.0-2-pve
pve-qemu-kvm 10.2.1-2 + Linux 7.0.0-3-pve
pve-qemu-kvm 10.2.1-2 + Linux 7.0.2-1-pve

A temporary workaround is to downgrade QEMU and hold the package:

Code:
apt install pve-qemu-kvm=10.2.1-2
apt-mark hold pve-qemu-kvm
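
To confirm that the hold took effect, and to release it later, something like the following sketch may help. The helper name `is_held` is made up for illustration; `apt-mark showhold` and `apt-mark unhold` are the standard apt-mark subcommands.

```shell
#!/bin/sh
# Sketch: check whether a package appears in the `apt-mark showhold` list.
# is_held is a hypothetical helper; it reads the list on stdin.
is_held() {
    # -x: match the whole line, -F: treat the name as a fixed string
    grep -qxF -- "$1"
}

# On the PVE host:
#   apt-mark showhold | is_held pve-qemu-kvm && echo "pve-qemu-kvm is held"
# Once a fixed build is available, release the hold again:
#   apt-mark unhold pve-qemu-kvm
```

While the hold is in place, regular upgrades will skip pve-qemu-kvm, so it is worth removing it once the issue is resolved.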

It appears that there is an issue with the combination of a kernel that supports MBEC/GMET and QEMU 11.

*7.0.0-3-pve is the first kernel to which MBEC/GMET v3 was backported.
7.0.2-1-pve is a kernel with MBEC/GMET v5 backported.
7.0.0-2-pve does not include the MBEC/GMET backport, and the issue does not occur with that kernel; it also does not occur with QEMU 10. I therefore conclude that the issue occurs specifically when a kernel that includes MBEC/GMET is combined with QEMU 11.
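
As a rough way to check whether a host matches one of the combinations listed above, a sketch like this could be used. The function name `is_reported_combo` is hypothetical, and the kernel list only covers the two releases tested in this post, so it is not an authoritative list of affected versions.

```shell
#!/bin/sh
# Sketch: flag the QEMU/kernel pairs reported as boot-looping in this thread.
# The version lists come from the report above, not from an official source.
is_reported_combo() {
    qemu_version=$1     # e.g. 11.0.0-1
    kernel_release=$2   # e.g. 7.0.2-1-pve, as printed by `uname -r`

    # Only QEMU 11 was reported to fail (tested with 11.0.0-1).
    case "$qemu_version" in
        11.*) ;;
        *) echo "not reported"; return 0 ;;
    esac

    # Kernels reported to carry the MBEC/GMET backport.
    case "$kernel_release" in
        7.0.0-3-pve|7.0.2-1-pve) echo "reported boot loop" ;;
        *) echo "not reported" ;;
    esac
}

# On a PVE host (assumes dpkg-query is available, as on any Debian system):
#   is_reported_combo "$(dpkg-query -W -f='${Version}' pve-qemu-kvm)" "$(uname -r)"
```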

Is there anything I can provide to help you investigate this?
 
Hi,
thank you for the report! My colleague @driley also ran into the issue and is looking into it.
 
Hi,
As @fiona pointed out, I ran into the exact same issue and narrowed it down to a couple of new additions in QEMU. For me, these two CPU flags seem to have caused the boot hang:
Code:
cet-ibt,cet-ss

On kernel 7.0.2-1-pve, I got my Windows Server guest (with VBS active) to boot using the following setup:
Code:
args: -cpu host,level=30,-cet-ibt,-cet-ss

CPU: Intel(R) Xeon(R) Gold 6426Y
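
If several VMs need the same override, a small loop around `qm set` may save some typing. This is only a sketch: the function name `apply_cet_workaround` and the VMIDs are made up, the `args` value is the one from my config above, and setting `args` replaces any value already stored there and normally requires root.

```shell
#!/bin/sh
# Sketch: print (or run) the qm commands that set the CPU override on several VMs.
CPU_ARGS='-cpu host,level=30,-cet-ibt,-cet-ss'   # value from the VM config above

# apply_cet_workaround is a hypothetical helper.
#   $1: "dry-run" prints the commands, "apply" executes them
#   remaining arguments: VMIDs
apply_cet_workaround() {
    mode=$1
    shift
    for vmid in "$@"; do
        if [ "$mode" = "apply" ]; then
            qm set "$vmid" --args "$CPU_ARGS"
        else
            printf "qm set %s --args '%s'\n" "$vmid" "$CPU_ARGS"
        fi
    done
}

# Example with hypothetical VMIDs; review the output before switching to "apply":
#   apply_cet_workaround dry-run 101 102
```

The new arguments are only picked up when the VM is started again, so a full stop and start of the guest is needed afterwards.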

Keep me posted if this resolves the issue for you as well.
 
Thank you!!

I have confirmed that nested Hyper-V starts up normally.

I was also able to enable HVCI, and benchmark results do not appear to show any performance degradation.

*I have not observed any decline in CPU performance. However, I have seen a noticeable drop in performance with GPU passthrough.

It is a bit of a hassle to have to apply this setting to each virtual machine that has the Hyper-V role added after the update.

However, since nested Hyper-V itself is not something we would run in a production environment, I can accept having to add this setting.

*I am constantly testing, so nested virtualization is an essential feature for me, but as long as it doesn’t stop working entirely, this is fine.

edit

Simply applying the pve-qemu-kvm 11.0.0-1 update can leave guests unable to boot. That might be tolerable if it only affected nested Hyper-V, but guests with RDS or HVCI enabled also fail to boot.

edit2

Even with pve-qemu-kvm 11.0.0-1, you likely won’t encounter any issues when adding the RDS or Hyper-V role on Windows Server 2022 or earlier.
However, on Windows Server 2025, adding the RDS or Hyper-V role will prevent the system from booting.
 
@fiona

I’m curious to know what the current status is regarding this issue.

Has reducing the number of flags become the solution?

If that’s the case, I think it will become a problem unless it is properly announced, since I believe quite a few people use the RDS role on Windows Server 2025.

I am very disappointed that the update has once again been released to the no-subscription repository without the issues being resolved.

*This is just me thinking out loud, so I’m not looking to start a debate: if you are aware of the issue but still release it to the no-subscription repository, what is the difference between that and the test repository? If you are going to publish it even with known issues, wouldn’t it be better to skip the no-subscription stage and publish it directly to the enterprise repository?

 
Hi @uzumo,
the next Proxmox VE point release is coming up and we need to give QEMU 11 more widespread adoption to find regressions with different hardware/configurations/use cases in time before the release. I forgot to mention this issue in the QEMU 11 announcement thread and that was a mistake. Thank you for pointing that out! We will work on fixing or working around the issue here before the next release.

We do try to avoid issues in the pve-no-subscription repository too, but sometimes time constraints make that impossible. In any case, there are several workarounds for the issue.

It seems like you have the wrong expectations about what the no-subscription repository is. From the docs:
It can be used for testing and non-production use. It’s not recommended to use this on production servers, as these packages are not always as heavily tested and validated.
 
Yes, you’re right. From your perspective, users of both the test repository and the no-subscription repository are likely testers who don’t generate revenue for you.
However, I had expected a bit more. Since users without a subscription cannot view the contents of the Enterprise Repository, they should encounter a critical bug at least once when trying to hold or pin something.

In fact, when I installed Qemu11 on a virtual machine added to a role like RDS (which, surprisingly, many people use) and rebooted it, the system crashed. Since we have an SLA in place, this will result in financial losses.

Well, since I’m using it for personal testing, I haven’t been affected, but others will likely run into trouble.

It’s just that my personal expectations were let down, so it’s not your fault—it’s just my own unrealistic expectations, and I’m the only one who’s disappointed. Please don’t worry about it.
 
I have PVE 9.1.11 and encountered the problem with one of my Windows 11 25H2 test VMs after I installed the recent PVE updates.
What helped to fix the problem was reinstalling the previous pve-qemu-kvm version.
Bash:
apt reinstall pve-qemu-kvm=10.2.1-2
apt-mark hold pve-qemu-kvm
After that and a reboot of PVE, the VM started just fine.
 
This workaround worked for me. Thank you.
 
My assumption is that QEMU 11 is not in pve-enterprise right now, correct?

You wrote: "In fact, when I installed Qemu11 on a virtual machine added to a role like RDS (which, surprisingly, many people use) and rebooted it, the system crashed. Since we have an SLA in place, this will result in financial losses."

Maybe I misunderstand you, but this host/system should currently not have seen QEMU 11 at all, because it, for sure, uses the pve-enterprise repository, no?
 
Sorry. Since I don't have the Enterprise edition, there was no way I could have known, so I deleted my previous post.

Anyone who has the Enterprise repository configured can run `apt policy pve-qemu-kvm` to check whether it is available there.

*I don’t think it has arrived there yet, but I think it’s best not to say whether it has or not, as that could raise contractual issues. After all, that’s a benefit of the subscription.
(Since this is only in the repository I use, I can only assume that it hasn’t arrived yet if you, who use the Enterprise Repository, aren’t affected.)

Code:
root@pve1:~# apt policy pve-qemu-kvm
pve-qemu-kvm:
  Installed: 11.0.0-2
  Candidate: 11.0.0-2
  Version table:
 *** 11.0.0-2 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
        100 /var/lib/dpkg/status
     11.0.0-1 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.2.1-2 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.2.1-1 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.1.2-7 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.1.2-6 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.1.2-5 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.1.2-4 500
        500 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages
        500 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
     10.1.2-3 500
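
To pull just the installed and candidate versions out of output like the above, a sketch along these lines could be used. The helper name `policy_field` is made up; it only relies on the `Installed:` and `Candidate:` lines shown in the output.

```shell
#!/bin/sh
# Sketch: extract one field from `apt policy <package>` output read on stdin.
# policy_field is a hypothetical helper.
#   $1: "Installed" or "Candidate"
policy_field() {
    # Print the second column of the first line whose first column is "<field>:".
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# On a PVE host (apt-cache is used here because apt warns about its
# unstable CLI when its output is piped):
#   apt-cache policy pve-qemu-kvm | policy_field Installed
#   apt-cache policy pve-qemu-kvm | policy_field Candidate
```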

If you have a subscription (and are using the Enterprise Repository), you can check the no-subscription Repository directly to see the differences.

http://download.proxmox.com/debian/pve/dists/trixie/
 
That is not the main point of my post.
You said:
In fact, when I installed Qemu11 on a virtual machine added to a role like RDS (which, surprisingly, many people use) and rebooted it, the system crashed. Since we have an SLA in place, this will result in financial losses.
So my main point is:
Why does this PVE-host apparently not use the pve-enterprise repository, if a problem with it results in financial losses for you / your company?
Or even simpler:
In this case: Exclusive pve-enterprise repository usage = No QEMU 11 = No problem = No financial losses
 
As I wrote earlier: "Well, since I’m using it for personal testing, I haven’t been affected, but others will likely run into trouble."

As stated, I am not troubled, so there is no reason for me to use enterprise. I use the test repository to try out new features I am interested in, so there is no particular reason to use the enterprise repository.

As long as I get the same results, I don’t have a strong preference for which virtualization platform to use. Tools are just tools, so as long as they work, it doesn’t matter which one I use.

I used to have an ESXi license, but since I can no longer use it, I gave up on it.
I switched to PVE because I found it very appealing that testers can use it without a subscription.

My job is to support the operating systems running on physical and virtual machines, not the virtualization platform itself. Therefore, as long as it’s possible, temporary downtime isn’t a particular issue for me.

However, if an update requires work that goes beyond the scope of my normal duties, I’ll simply migrate to a different virtualization platform.

*You might say, “Why not just buy a subscription?” but if that means sacrificing simple management tasks like updating via the web UI, I have absolutely no problem using the CLI instead.

If it’s unreasonable to expect anything other than the test repository to be unstable, then I’m the one who’s wrong. Please go ahead and add me to your ignore list right away. That will put my mind at ease.