Spikes in latency on Windows VM

elagil

Member
Jan 4, 2020
Hello!

I am using Proxmox 6.1-5 with a Windows 10 VM with a passed-through GPU. Here is the VM configuration:

Code:
agent: 1
balloon: 0
bios: ovmf
boot: c
bootdisk: scsi0
cores: 14
cpu: host
efidisk0: local-lvm:vm-101-disk-1,size=128K
hostpci0: 0a:00,x-vga=1,pcie=1,romfile=GTX1070_spliced.bin
hostpci1: 0b:00.3,pcie=1
hotplug: disk,network,usb,memory,cpu
machine: q35
memory: 20480
name: Windows10a
net0: virtio=F6:FB:DE:3A:E5:8B,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-lvm:vm-101-disk-0,backup=0,cache=writeback,iothread=1,size=128G,ssd=1
scsi2: /dev/disk/by-id/ata-CT2000MX500SSD1_1825E144C0C8,size=1953514584K
scsihw: virtio-scsi-single
smbios1: uuid=272870d3-f14e-4ae2-b9b9-a47e1b6ecd4b
sockets: 1
vga: none
vmgenid: b5558763-7896-4b51-bff3-ad1cc7627879

It is using 14 of the 16 threads that a Ryzen 7 2700 provides, and 20 of the 32 GB of total RAM.

Anyway, when the VM is used for gaming, I get noticeable spikes every 30 or so seconds. Also, when using LatencyMon, I see considerable ISR and DPC execution times (on the order of 3-10 ms at maximum).

[Attachment: latency.JPG — LatencyMon screenshot]

Proxmox, as well as my VM, is on the same SSD (Crucial MX300 525GB, 3% wear). I have tried switching all my drivers to MSI mode (the graphics card especially), but with no positive effect.

Is there a way of mitigating the latency spikes?

Thanks in advance!
 
Please post the output of cat /etc/modprobe.d/vfio.conf and lspci -v.

I have tried switching all my drivers to MSI mode (the graphics card especially), but with no positive effect.
Did you use the Windows defaults before?
 
Hi, same problem here.
I am trying to use this VM as an audio workstation, and the latency causes audio dropouts.

If I change from q35 to i440fx, the latency goes away, but then there is no GPU passthrough.

I am also passing through all the USB ports as a PCI device.

Code:
cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f08,10de:10f9,10de:1ada,10de:1adb disable_vga=1

Code:
lspci -v
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 0d)
Subsystem: ASUSTeK Computer Inc. 8th Gen Core Processor Host Bridge/DRAM Registers
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>
Kernel driver in use: skl_uncore
Kernel modules: ie31200_edac

00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 0d) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: 64000000-650fffff
Prefetchable memory behind bridge: 0000000050000000-00000000620fffff
Capabilities: [88] Subsystem: ASUSTeK Computer Inc. Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [a0] Express Root Port (Slot+), MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [140] Root Complex Link
Capabilities: [d94] #19
Kernel driver in use: pcieport

00:02.0 VGA compatible controller: Intel Corporation Device 3e98 (rev 02) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 8694
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at 63000000 (64-bit, non-prefetchable) [size=16M]
Memory at 40000000 (64-bit, prefetchable) [size=256M]
I/O ports at 4000
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel modules: i915

00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10) (prog-if 30 [XHCI])
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH USB 3.1 xHCI Host Controller
Flags: bus master, medium devsel, latency 0, IRQ 128
Memory at 4000100000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 2
Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
Capabilities: [90] Vendor Specific Information: Len=14 <?>
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH Shared SRAM
Flags: fast devsel
Memory at 4000114000 (64-bit, non-prefetchable) [disabled] [size=8K]
Memory at 4000118000 (64-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: [80] Power Management version 3

00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH HECI Controller
Flags: bus master, fast devsel, latency 0, IRQ 131
Memory at 4000117000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [50] Power Management version 3
Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [a4] Vendor Specific Information: Len=14 <?>
Kernel driver in use: mei_me
Kernel modules: mei_me

00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10) (prog-if 01 [AHCI 1.0])
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH SATA AHCI Controller
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 127
Memory at 65120000 (32-bit, non-prefetchable) [size=8K]
Memory at 65123000 (32-bit, non-prefetchable)
I/O ports at 4090
I/O ports at 4080
I/O ports at 4060
Memory at 65122000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Kernel driver in use: ahci
Kernel modules: ahci

00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 123
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 00005000-00005fff
Memory behind bridge: 62100000-622fffff
Prefetchable memory behind bridge: 0000004000200000-00000040003fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH PCI Express Root Port
Capabilities: [a0] Power Management version 3
Kernel driver in use: pcieport

00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 124
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
Capabilities: [40] Express Root Port (Slot-), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH PCI Express Root Port
Capabilities: [a0] Power Management version 3
Kernel driver in use: pcieport

00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 125
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 00006000-00006fff
Memory behind bridge: 62300000-624fffff
Prefetchable memory behind bridge: 0000004000400000-00000040005fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH PCI Express Root Port
Capabilities: [a0] Power Management version 3
Kernel driver in use: pcieport

00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
Subsystem: ASUSTeK Computer Inc. Z390 Chipset LPC/eSPI Controller
Flags: bus master, medium devsel, latency 0

00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH cAVS
Flags: bus master, fast devsel, latency 32, IRQ 132
Memory at 4000110000 (64-bit, non-prefetchable) [size=16K]
Memory at 4000000000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [50] Power Management version 3
Capabilities: [80] Vendor Specific Information: Len=14 <?>
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_sof_pci

00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH SMBus Controller
Flags: medium devsel, IRQ 16
Memory at 4000116000 (64-bit, non-prefetchable)
I/O ports at efa0
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801

00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
Subsystem: ASUSTeK Computer Inc. Cannon Lake PCH SPI Controller
Flags: fast devsel
Memory at fe010000 (32-bit, non-prefetchable) [size=4K]

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
Subsystem: ASUSTeK Computer Inc. Ethernet Connection (7) I219-V
Flags: bus master, fast devsel, latency 0, IRQ 129
Memory at 65100000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: e1000e
Kernel modules: e1000e

01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. TU106 [GeForce RTX 2060 Rev. A]
Flags: fast devsel, IRQ 11
Memory at 64000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at 50000000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at 60000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at 3000 [disabled]
Expansion ROM at 65000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Capabilities: [bb0] #15
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
Subsystem: ASUSTeK Computer Inc. TU106 High Definition Audio Controller
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at 65080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

01:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
Subsystem: ASUSTeK Computer Inc. TU106 USB 3.1 Host Controller
Flags: fast devsel, IRQ 130
Memory at 62000000 (64-bit, prefetchable) [size=256K]
Memory at 62040000 (64-bit, prefetchable) [size=64K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller (rev a1)
Subsystem: ASUSTeK Computer Inc. TU106 USB Type-C Port Policy Controller
Flags: bus master, fast devsel, latency 0, IRQ 126
Memory at 65084000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
 
Have you already disabled C-states and dynamic clocking of your processor?
I don't recall the actual names of those features on AMD; on Intel they would be Turbo Boost and SpeedStep.
These features can have exactly such side effects.
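As a hedged sketch (these are standard Linux kernel parameters and tools, but verify against your own kernel and BIOS options), C-states and frequency scaling can be tamed on the Proxmox host like this:

Code:
# /etc/default/grub -- limit deep C-states (then run update-grub and reboot)
GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1"

# force the performance cpufreq governor at runtime (Debian package: linux-cpupower)
cpupower frequency-set -g performance

On many boards, the equivalent BIOS switches (e.g. Global C-state Control and Core Performance Boost on AMD) are the more reliable place to turn these off.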

Also try reducing the cores in the VM to 6 or 8.
The Ryzen 7 2700 is specced as 8 cores.
It doesn't make sense to give the VM more cores than your system can provide.
Hyper-Threading (SMT, in AMD terms) does not provide real cores and can also create side effects.
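For reference, a minimal sketch of shrinking the VM from the host shell (VM ID 101 taken from the config posted above):

Code:
# cap the VM at the number of physical cores (Ryzen 7 2700 = 8)
qm set 101 --cores 8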
 
When CPUs provide 2 threads per core, performance does not scale linearly. SMT does improve performance on multi-threaded workloads, but the factor tends to be around 1.2-1.3 instead of 2x.
It does make the system more responsive (less latency when switching work, because there are more threads). Please note that virtual devices like VirtIO network and disks, and other subsystems like the GUI and ZFS, also want threads. If you allocate most threads to the VM, it needs to wait until all those threads are available (and they might be busy with the things mentioned before) before it can run, thus increasing latency.
Virtualization works wonders when many VMs, each smaller than the host, are not busy all the time, so any unused cycles can be used by other VMs; it makes sure that idle time is filled with useful work. However, because of the additional work and threads (for VirtIO I/O and emulation), it does not work as well when allocating most of the host's resources to a single VM. Try using 8 cores or fewer of the Ryzen 2700 for your main VM and let Proxmox use the remaining 20% to 30% for "background tasks" like Proxmox itself, I/O, and any virtual servers you created as containers (less overhead). Latency will then improve because idle threads are available.
 
I know this is a necro thread revival, but I stumbled on it while doing some troubleshooting of my own. Ultimately I used information from here: https://forum.level1techs.com/t/win...rformance-optimization-in-msfs-2020/187683/24 to help, but there was more to the story. Here is what worked for me:

1) Enable Huge Pages
How:
a) Enable hugepages on the host
* Add default_hugepagesz=1G hugepagesz=1G hugepages=<num> to your /etc/default/grub on the line GRUB_CMDLINE_LINUX_DEFAULT
* reboot
b) Add hugepages: 1024 to your VM config
c) Enable pdpe1gb in your VM CPU flags

Result: Reduced duration of the stutters. Stutters were all under 100ms after this as opposed to several hundred ms before.
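Putting step 1 together as a sketch (the count of 20 is illustrative and would suit a 20 GB VM; substitute your own <num>, and note that the hugepages value in the VM config is the page size in MiB):

Code:
# /etc/default/grub -- reserve 1 GiB hugepages at boot, then update-grub && reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=20"

# verify the reservation after reboot
grep Huge /proc/meminfo

# /etc/pve/qemu-server/<vmid>.conf additions
hugepages: 1024
cpu: host,flags=+pdpe1gb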

2) Enable CPU Affinity
How:
a) Get the NUMA to core mapping via lscpu
For me I have:
Code:
NUMA:                   
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-25,52-77
  NUMA node1 CPU(s):     26-51,78-103
b) THIS IS THE IMPORTANT PART: add CPU affinity. You can do this through the GUI (Hardware -> CPU (with Advanced checked) -> CPU Affinity) and assign the cores of your CPU that you want your VM to use. These need to be on the same NUMA node as your GPU. If you have one GPU and one CPU, this is easy, but if you have more than one of either, you will need to check the PCIe lanes and see which CPU manages which. Pinning the VM to one CPU while the GPU is attached to the other will make things MUCH WORSE.

Result: Almost native levels of performance and a SIGNIFICANT reduction in stutters
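A minimal shell-side sketch of the same thing (the VM ID and core list are illustrative, and the PCI address is the GPU from the lspci output earlier in the thread; adjust for your system):

Code:
# which NUMA node the GPU hangs off (-1 on single-node systems)
cat /sys/bus/pci/devices/0000:01:00.0/numa_node

# pin the VM to cores on that node, e.g. cores 0-7
qm set <vmid> --affinity 0-7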


3) Enable CPU Flags
How: Add the hv-tlbflush and aes flags to your CPU. I use a 2nd Gen Xeon Scalable and these work for me, but they may not be available (or work) for you.

Result: Maybe fewer stutters?
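In VM-config terms this is a single line (a sketch; both flags are in Proxmox's optional CPU-flag list, though hv-tlbflush is a Hyper-V enlightenment and only benefits Windows guests):

Code:
cpu: host,flags=+hv-tlbflush;+aes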

In the end, the only small stutters I get now are network-related, which is a different story and has very little to do with the VM/PVE host itself.

Let me know if this helps!
 
Thanks a lot! Your post helped me a great deal.
But I have a question:
currently I'm using this GRUB string:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off default_hugepagesz=1G hugepagesz=1G hugepages=0"

In hugepages=<num>, how should I calculate the num parameter? Or is zeroing it fine?
 
Sorry for the late reply! The num here is the total number of hugepages available, so it should be AT MOST the total GB that you intend to use at any given time, preferably a power of 2. Hugepages are great for performance, but they monopolize a lot of memory in the process.
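A worked example under the 1 GiB page size used above, where one hugepage backs exactly 1 GB of guest RAM:

Code:
# e.g. a VM with 16 GB of RAM -> reserve 16 pages
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16"
# hugepages=0 reserves nothing at boot; a VM configured to require
# hugepages may then fail to start, since 1 GiB pages are rarely
# allocatable at runtime due to memory fragmentation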
 

So, if I use 32 GB of RAM for a single VM, should I use 64 hugepages?
Or is it preferable to set [total RAM GB] minus 1-2 GB for Proxmox itself?

UPD:
"power of 2"
Silly me. It means, like, 2-4-8-16-etc., right?
 

I would recommend you set your hugepage size to 1024 (MiB, i.e. 1 GiB pages) and use 16 of them.
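Concretely, that recommendation would look like this (a sketch; the VM's memory must fit inside the reserved pages):

Code:
# host: reserve 16 x 1 GiB pages at boot
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16"

# VM config: 16 GB of RAM backed by 1 GiB pages
memory: 16384
hugepages: 1024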
 
