@RoCE-geek, if I understand correctly, your investigation focused on the number and cause of the VM exits and the influence of the Hyper-V enlightenments on these VM exits -- and thank you for that, I see that Fiona already opened a feature request for reevaluating the Hyper-V enlightenments. However, I'm not sure whether your hosts are also affected by the increased CPU usage issue summarized above (you initially mentioned it's hard to determine whether you're affected or not [4]) -- could you please clarify? I'm asking because it looks like you performed most tests on AMD machines, and it would be interesting if they also see the same CPU usage issues, or if only Intel CPUs are affected so far, see above.
Hi @fweber, thanks a lot for your interest here!
I completely understand that the whole situation around such (more or less similar) symptoms is confusing, even for me, and for many others.
Disclaimer: all our systems are tuned to maximum BIOS/HOST/GUEST performance, so no dynamic performance/power control is active.
So what are my points here:
- regardless of HW architecture, since Win11-24H2 and WS2025 there is (at least) doubled virtualization overhead (in terms of far more frequent VM-EXITS)
- this is highly critical, because virtualization is all about efficiency, maximized density and optimum performance (and the corresponding ROI, TCO, etc.)
- I'm also limiting myself to the "increased idle load" only; using generic CPU models (like x86-64-v3, EPYC-Milan-v2, etc.) is a proven way to prevent the VBS/HVCI problems
- this increased VM-EXITS problem is not just about "decreased performance" or "decreased VM density"; it also (almost) completely blocks C3 states, which in effect blocks Turbo Boost activation. To be clear, as of now I'm not focused on C3/TB, just on the basic increase of host idle load.
- my hit-or-miss research was focused on the "root cause", which AFAIK is the specific MSR/MSR_WRITE events, especially the Hyper-V enlightenment (HVE) synthetic timers (see the sketch just after this list)
- up to Win11-23H2/WS2022 there was no such abuse; others have demonstrated an acceptable idle VM-EXITS rate (acceptable for Windows VMs)
- I proved that this phenomenon of increased idle load is the same for Intel and AMD, so it's a general, platform-independent Windows behavior
- the key difference is probably the final influence on each platform, i.e. how these increased VM-EXITS are transposed into the "bad behavior"
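Before guessing about hv-stimer and friends, it can help to double-check which Hyper-V enlightenments Proxmox actually passes to a given VM's QEMU process. A minimal sketch (VMID 100 is just a placeholder):
Code:
# list the Hyper-V enlightenment flags in the generated QEMU command line for VM 100
qm showcmd 100 --pretty | grep -o 'hv[-_][a-z0-9_]*'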
Let me start with a simple presentation. According to my previous report, this is how an 8 vCPU WS2025 VM behaves on an 8C/16T Xeon:
Code:
VM-EXIT              Samples  Samples%    Time%    Min Time      Max Time      Avg time
MSR_WRITE               1673    66.63%   27.71%      0.76us    79588.30us      402.91us ( +- 23.29% )
HLT                      778    30.98%   70.69%     35.60us    73612.25us     2210.59us ( +- 6.97% )
EXTERNAL_INTERRUPT        42     1.67%    0.02%      2.17us       24.89us       10.26us ( +- 8.83% )
EPT_MISCONFIG             13     0.52%    1.58%      4.44us    29025.62us     2957.39us ( +- 77.36% )
PAUSE_INSTRUCTION          5     0.20%    0.00%      0.90us        5.79us        2.22us ( +- 41.42% )

Total Samples:2511, Total events handled time:2432802.46us.
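For reference, numbers like the table above can be gathered with perf's kvm subcommand; a minimal sketch (the QEMU PID and the 60-second window are placeholders, exact options may differ between perf versions):
Code:
# record VM-exit events for the VM's QEMU process for ~60 seconds
perf kvm stat record -p <qemu-pid> -- sleep 60
# summarize the recorded exits by reason (this is what the tables in this post show)
perf kvm stat report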
Now please look at almost the same setup, but with this VM increased to 16 vCPUs, so there is a 1:1 vCPU to "HT CPU" binding:
Code:
VM-EXIT              Samples  Samples%    Time%    Min Time      Max Time      Avg time
MSR_WRITE               5287    66.46%   10.79%      0.72us    57574.01us      136.29us ( +- 18.70% )
HLT                     2499    31.41%   88.92%     80.25us    60730.51us     2376.21us ( +- 3.20% )
EXTERNAL_INTERRUPT       124     1.56%    0.02%      1.18us      123.37us       11.78us ( +- 8.53% )
EPT_VIOLATION             43     0.54%    0.15%      0.63us     9819.67us      230.65us ( +- 98.98% )
IO_INSTRUCTION             2     0.03%    0.12%     26.28us     8268.61us     4147.44us ( +- 99.37% )

Total Samples:7955, Total events handled time:6678366.11us.
As you can see, a large increase happened; the overhead/exit rate is really extreme, all from just one idle Windows VM (with 1:1 logical CPU binding).
And the result?

So we have almost 4% host idle load, caused by just one dumb/dummy idle Windows VM, without any "over-provisioning", even though an idle VM should cause almost no increase at all.
And this is what happens when we clone this VM, so we have 2x 16 vCPUs on the 8C/16T Xeon:
Code:
VM-EXIT              Samples  Samples%    Time%    Min Time      Max Time      Avg time
MSR_WRITE              12629    68.53%   15.99%      0.65us    35876.31us      310.33us ( +- 5.85% )
HLT                     5377    29.18%   83.84%      0.77us    36201.30us     3821.93us ( +- 1.61% )
EXTERNAL_INTERRUPT       266     1.44%    0.03%      1.03us     3789.18us       29.92us ( +- 53.97% )
EPT_VIOLATION            122     0.66%    0.04%      0.73us     7968.85us       85.81us ( +- 78.95% )
EPT_MISCONFIG             17     0.09%    0.01%     14.64us     1778.29us      127.11us ( +- 81.22% )
IO_INSTRUCTION            13     0.07%    0.06%     10.11us     9798.40us     1073.66us ( +- 73.39% )
PAUSE_INSTRUCTION          3     0.02%    0.03%      0.91us     7021.83us     2342.01us ( +- 99.91% )
VMCALL                     1     0.01%    0.00%      9.45us        9.45us        9.45us ( +- 0.00% )

Total Samples:18428, Total events handled time:24511276.02us.
Host CPU load values: from ~1.9% (1x 8 vCPU), to ~3.6% (1x 16 vCPU), to ~5.1% (2x 16 vCPU). All for "idle" Windows VMs.
But note that this is definitely not minor: the load does not scale linearly up to 100%, because only the first ~50% corresponds to the physical cores (the rest is just HT/SMT sharing), so the real impact on physical core load (and latency, etc.) is higher than the raw percentage suggests.
So even a few idle Win11-24H2/WS2025 VMs can create significant load, limiting total VM density and the free CPU power available for real application workloads.
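If someone wants to cross-check the synthetic-timer hypothesis on their own host, the kvm:kvm_msr tracepoint shows which MSR indices dominate the MSR_WRITE exits. A rough sketch (the QEMU PID is a placeholder, and the exact perf script text format may vary by kernel; the Hyper-V synthetic-timer MSRs sit in the 0x400000B0+ range per Microsoft's TLFS):
Code:
# count MSR exits for the VM's QEMU process over 10 seconds
perf stat -e kvm:kvm_msr -p <qemu-pid> -- sleep 10

# record them and list the most frequently written MSR indices
perf record -e kvm:kvm_msr -p <qemu-pid> -- sleep 10
perf script | grep -o 'msr_write [0-9a-f]*' | sort | uniq -c | sort -rn | head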
What probably matters (in terms of the final increased idle load and other effects); see the quick host-side check after this list:
- Platform power management (BIOS)
- Host power management (OS)
- CPU base clock and max clock
- vCPU to CPU core ratio
- CPU generation
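For anyone comparing results, this is roughly the host-side half of the factors above that I'd capture alongside the measurements; a sketch (sysfs paths depend on the active cpufreq driver):
Code:
# host power management: driver + governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# CPU model and current clocks as seen by the kernel
grep -m1 'model name' /proc/cpuinfo
grep 'cpu MHz' /proc/cpuinfo | head -3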
And back to your main question: at the beginning I wasn't sure whether these increased Windows VM-EXITS are common to both AMD and Intel, i.e. whether there is something specifically bound to the platform itself. But the increased synthetic-timer buzz is omnipresent, as it's the new Windows kernel itself that is crippled this way. I still cannot imagine the original bright idea behind such a virtualization-crippling "optimization".
I've also been limited to the "visible" idle CPU load increase only, and as I have no completely free AMD hosts, my experiments there were still not conclusive, i.e. not easily quantifiable. Another reason is their much higher total raw performance (compared to some non-production Intel hosts), which makes them a rather limited factor for testing (in terms of reproducibility).
As a preliminary (and very limited) extrapolation, I see a potential VM-density decrease of something like 20-50% if this bad behavior is not eliminated.
And as always, tests are just tests. Until we have more production WS2025 VMs, the final impact will not be known. But this is a "loop" problem: after my digging here, I now have to "pause" all new WS2025 deployments, as even a VM-density decrease as low as 20% would have a huge production impact.
Host CPU: Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz (1 Socket)
PVE: 8.4.0 (running kernel: 6.8.12-17-pve)
VM config:
Code:
agent: 1
bios: ovmf
cores: 16
cpu: x86-64-v3
machine: pc-q35-8.2
memory: 16384
meta: creation-qemu=9.2.0
numa: 0
ostype: win11
scsi0: XXX:vm-XXX-disk-1,iothread=1,size=60G,ssd=1
scsihw: virtio-scsi-single
sockets: 1
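In case anyone wants to reproduce the 8 vs. 16 vCPU comparison with this config, the core count can simply be switched with qm (the VMID is a placeholder; the VM needs a full stop/start afterwards):
Code:
# e.g. back to the 8-vCPU variant used in the first table
qm set <vmid> --cores 8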