Windows KVM IO freezing

Nov 22, 2023
Hello,
We are running Proxmox 7 with Windows Server 2019 virtual machines. From time to time we hit a strange issue where one of the machines freezes for 10 to 30 seconds. Nothing responds, and after that time everything goes back to normal. The only thing we see is that the disk queue in Windows Resource Monitor drops to 0 for those 10 to 30 seconds; we cannot find any error or anything else. The affected machine is not heavily loaded, just average, and the other machines running on this host have no similar issue. Any ideas?
 
Hi galileon,

Welcome to the forums!

one of the machines freezing for some time like 10 to 30 seconds
Is it always the same machine, or if not, always the same set of machines? If so, is there a (shared) piece of software/service that has different characteristics from the other machines?
 
Hi, it's not. I mean, we migrated it to another host and it happened again. All the machines we build are almost the same, but it happens to this particular one. Maybe it uses more resources than the others, but it can work for a month without any issue and then suddenly freeze. Today, for example, it happened after one month; we restarted Proxmox and this Windows VM and it happened again, then we restarted once more and it was gone. That's totally weird and we have no idea how to diagnose it. The only clue is this queue dropping to zero. Cheers
 
i mean we migrated to other and it happened again. All machines we do are almost the same - it happen to this particular one.
By machine, do you mean only a single VM (which is what I understood) or a single PVE installation out of many?

windows resource monitor disk queue is falling to the 0
What do you mean by that? Something like, "According to Windows, there is nothing to write to disk"?
 
By machine, do you mean only a single VM (which is what I understood) or a single PVE installation out of many?

Hi, it's a single VM.
What do you mean by that? Something like, "According to Windows, there is nothing to write to disk"?
When the VM freezes, the disk queue stays at 0 for the whole duration of the freeze. The mouse pointer and, for example, the Resource Monitor charts keep animating, but you cannot browse a folder or save or load any file from disk. It looks like disk I/O is frozen, while RDP keeps working.
Thanks
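
One way to narrow down a stall like the one described above is to check whether the host sees slow writes at the same moment. The following is only a rough sketch, not a method anyone in this thread has used: the function names `probe_write_ms` and `watch_io` are made up for illustration, it assumes GNU `date` and `dd` on the PVE host, and it assumes the probed directory sits on storage that shares the controller with the VM disk.

```shell
#!/bin/sh
# Hypothetical helper: time one small synchronous write, in milliseconds.
# $1 = directory to write the probe file into (assumed to be on the storage under test)
probe_write_ms() {
    dir=$1
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    # Write one 4 KiB block and force it to disk before dd returns.
    dd if=/dev/zero of="$dir/.io_probe" bs=4k count=1 conv=fsync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$dir/.io_probe"
    echo $(( (end - start) / 1000000 ))
}

# Hypothetical helper: probe once per second and print a timestamped line
# whenever a write takes longer than the threshold.
# $1 = directory, $2 = threshold in ms, $3 = number of probes
watch_io() {
    dir=$1; threshold=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        ms=$(probe_write_ms "$dir")
        if [ "$ms" -gt "$threshold" ]; then
            echo "$(date '+%F %T') slow write: ${ms} ms"
        fi
        i=$((i + 1))
        sleep 1
    done
}
```

For example, `watch_io /var/tmp 1000 3600` would probe for about an hour and report writes slower than one second. If the guest freezes while the host log stays empty, the stall is more likely inside the VM or its virtual disk path than on the physical storage.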
 
I have no idea :-(

If it were my environment, I'd like to know the cause, but without any progress and seeing the other VMs don't have this problem, I'd rebuild a new instance of the machine.

Maybe another visitor has a suggestion!
 
Hi, sounds like it could be the issue discussed in [1], I'm still looking into that issue.

Are you running kernel 6.2 and does your machine have multiple NUMA nodes? Could you post the output of pveversion -v and lscpu?

If yes, could you try whether disabling NUMA balancing on the host makes a difference?
Code:
echo 0 > /proc/sys/kernel/numa_balancing

[1] https://forum.proxmox.com/threads/p...ows-server-2019-vms.130727/page-7#post-601617
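
Before and after changing the setting above, it can help to confirm its current state. A minimal sketch, assuming a Linux host; the function name `numa_balancing_state` is hypothetical, and the optional path argument exists only so the function can be tried against a test file.

```shell
#!/bin/sh
# Report whether automatic NUMA balancing is on.
# $1 = path of the control file (defaults to the real kernel knob)
numa_balancing_state() {
    file=${1:-/proc/sys/kernel/numa_balancing}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # 0 means disabled; any other value is treated as enabled here.
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}
```

Running `numa_balancing_state` with no argument reads the live setting. Note that the `echo 0 > ...` change (or the equivalent `sysctl -w kernel.numa_balancing=0`) only lasts until the next reboot; to keep it, a line `kernel.numa_balancing = 0` would have to go into a file under /etc/sysctl.d/.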
 
Hi @fweber ,

I'm writing on behalf of @galileon.

We are not on kernel 6.x.
We are using Proxmox 7, and it is up to date.

Yesterday we had this I/O issue again: from inside the VM it looked like hard drive reads stopped for 10-15 seconds, and after that everything worked fine. We last had this issue one month ago.
For that month everything worked fine on this server, with no changes and no updates; the issue simply occurred at some point yesterday.

Yesterday we restarted the PVE server, which made no difference. After that we shut down the VM and changed aio to native and cache to none, and today it is OK.
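
For readers who want to make the same change from the command line, here is a rough sketch of what it could look like. The helper `build_qm_disk_cmd` is hypothetical and only prints a `qm set` command instead of running it; the VM ID and volume name in the example come from the VM config posted further down in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a qm command that sets aio and cache on a disk.
# $1 = VM ID, $2 = bus/slot (e.g. virtio0), $3 = volume, $4 = aio mode, $5 = cache mode
build_qm_disk_cmd() {
    echo "qm set $1 --$2 $3,aio=$4,cache=$5"
}
```

`build_qm_disk_cmd 114 virtio0 local-lvm:vm-114-disk-0 native none` prints `qm set 114 --virtio0 local-lvm:vm-114-disk-0,aio=native,cache=none`. As far as I know, `qm set` replaces the whole option string for that disk, so any other options in use (such as discard=on or iothread=1) would need to be repeated in the same command, and the VM has to be fully stopped and started for the change to take effect.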

Mitigations are off in the kernel, and there are some sysctl changes but no NUMA changes.

The server is an HP DL380 G9 with a P810 RAID controller with 4 GB cache, 512 GB RAM, 2 CPUs, and 8 SSDs in RAID 10.
The VM runs Windows 2019 but, as I mentioned, it had been working for one month without any issues.

Code:
root@pve9:~# cat /etc/sysctl.conf | grep -v ^# | grep -v ^$
vm.swappiness=1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 500



root@pve9:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.131-1-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-8
pve-kernel-5.15.131-1-pve: 5.15.131-2
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.4-1
proxmox-backup-file-restore: 2.4.4-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
root@pve9:~# lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             88
On-line CPU(s) list:                0-87
Thread(s) per core:                 2
Core(s) per socket:                 22
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              79
Model name:                         Intel(R) Xeon(R) CPU E5-2696 v4 @ 2.20GHz
Stepping:                           1
CPU MHz:                            2780.111
CPU max MHz:                        3700.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           4395.15
Virtualization:                     VT-x
L1d cache:                          1.4 MiB
L1i cache:                          1.4 MiB
L2 cache:                           11 MiB
L3 cache:                           110 MiB
NUMA node0 CPU(s):                  0-21,44-65
NUMA node1 CPU(s):                  22-43,66-87
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        KVM: Vulnerable
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX vulnerable
Vulnerability Mds:                  Vulnerable; SMT vulnerable
Vulnerability Meltdown:             Vulnerable
Vulnerability Mmio stale data:      Vulnerable
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:           Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Vulnerable
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
                                    xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
                                    f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep
                                    bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
 
Okay, thanks, if you're running kernel 5.15 it is probably a different issue than in the linked thread [1] -- there, the freezes did not happen with kernel 5.15.

Can you share the VM config and check whether KSM is active, i.e. the output of the following commands:
Code:
qm config VMID --current
systemctl status ksmtuned
grep '' /sys/kernel/mm/ksm/pages_*

[1] https://forum.proxmox.com/threads/p...ows-server-2019-vms.130727/page-7#post-601617
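
To read the KSM counters at a glance, the raw page counts can be turned into an approximate amount of saved memory. A minimal sketch, assuming the standard KSM sysfs layout; the function name `ksm_saved_mib` is hypothetical, and the two optional arguments exist only so it can be tried against a test directory.

```shell
#!/bin/sh
# Estimate the memory currently saved by KSM, in MiB.
# $1 = KSM sysfs directory (defaults to the real one)
# $2 = page size in bytes (defaults to the system page size)
ksm_saved_mib() {
    dir=${1:-/sys/kernel/mm/ksm}
    page_size=${2:-$(getconf PAGESIZE)}
    # pages_sharing counts the extra mappings that point at shared pages,
    # which is roughly the number of pages saved.
    sharing=$(cat "$dir/pages_sharing")
    echo $(( sharing * page_size / 1024 / 1024 ))
}
```

A result of 0 means KSM is not merging anything at the moment, even if the ksmtuned service itself is running.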
 
The ksmtuned service is active (running), but the KSM shared-page counters are all 0:

Code:
root@pve9:~# qm config 114 --current
agent: 1
boot: order=ide2;virtio0
cores: 8
ide2: none,media=cdrom
machine: pc-i440fx-7.2
memory: 32768
name: mc-vlan910-192.168.10.2
net0: virtio=46:21:37:D8:49:95,bridge=vmbr0,firewall=1,tag=910
numa: 1
onboot: 1
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=e8e1c387-6b12-40cb-8b4b-708f0d20a8d0
sockets: 4
startup: up=10
tags: customer
virtio0: local-lvm:vm-114-disk-0,aio=native,discard=on,format=raw,iothread=1,size=500G
vmgenid: a131daa5-687f-4a48-b085-9007fa22e104
root@pve9:~# systemctl status ksmtuned
● ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon
     Loaded: loaded (/lib/systemd/system/ksmtuned.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-11-22 15:17:48 CET; 1 day 1h ago
   Main PID: 1519 (ksmtuned)
      Tasks: 2 (limit: 618910)
     Memory: 1.4M
        CPU: 41.334s
     CGroup: /system.slice/ksmtuned.service
             ├─   1519 /bin/bash /usr/sbin/ksmtuned
             └─3545940 sleep 60

Nov 22 15:17:48 pve9 systemd[1]: Starting Kernel Samepage Merging (KSM) Tuning Daemon...
Nov 22 15:17:48 pve9 systemd[1]: Started Kernel Samepage Merging (KSM) Tuning Daemon.
root@pve9:~# grep '' /sys/kernel/mm/ksm/pages_*
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
root@pve9:~#
 
