The organization I work for is currently comparing VM overhead between Proxmox 7 and Hyper-V, after reports that the Proxmox hosted terminal server was suffering slowdowns.
While I did not manage to catch the exact cause of the slowdowns in time, I did notice that the CPU usage reported in Task Manager is higher on the Proxmox VM than on a Hyper-V hosted server with a similar number of users and a similar workload (both are Windows Server 2016/2019 based).
So I used Kernrate to analyse where in the system the CPU time is being spent, and I noticed a big difference in HAL utilization between the Proxmox 7 hosted guests and the Hyper-V hosted guest, with the high HAL usage being consistent across the various Proxmox 7 hosted guests.
The log from the Proxmox 7 guest is as follows, with a very large share of the samples landing in the Hardware Abstraction Layer (HAL):
Code:
ProfileTime 335560 hits, 10000 events per hit --------
Module Hits msec %Total Events/Sec
HAL 231256 32366 68 % 71450287
NTOSKRNL 82011 32366 24 % 25338626
WIN32KFULL 10551 32366 3 % 3259902
WIN32KBASE 4223 32366 1 % 1304764
NTFS 1649 32366 0 % 509485
FLTMGR 1595 32366 0 % 492801
TCPIP 550 32366 0 % 169931
NETKVM 499 32366 0 % 154174
DXGKRNL 456 32366 0 % 140888
DXGMMS2 396 32366 0 % 122350
WDFILTER 359 32366 0 % 110918
NDIS 264 32366 0 % 81567
NETIO 262 32366 0 % 80949
NPFS 210 32366 0 % 64882
AFD 193 32366 0 % 59630
TSFAIRSHARE 179 32367 0 % 55303
VIOSCSI 112 32366 0 % 34604
LUAFV 87 32366 0 % 26880
WIN32K 82 32366 0 % 25335
WCIFS 61 32366 0 % 18846
PACER 60 32366 0 % 18537
CDD 54 32367 0 % 16683
STORPORT 49 32366 0 % 15139
RDPUDD 39 32366 0 % 12049
REGISTRY 35 32366 0 % 10813
BASICRENDER 34 32366 0 % 10504
WPPRECORDER 33 32366 0 % 10195
CNG 26 32366 0 % 8033
CLASSPNP 24 32366 0 % 7415
ATAPORT 22 32366 0 % 6797
WATCHDOG 17 32366 0 % 5252
MMCSS 16 32366 0 % 4943
WOF 16 32366 0 % 4943
NSIPROXY 16 32366 0 % 4943
TERMINPT 16 32366 0 % 4943
RDBSS 12 32366 0 % 3707
PARTMGR 12 32366 0 % 3707
MOUCLASS 11 32366 0 % 3398
TM 11 32365 0 % 3398
VOLSNAP 6 32366 0 % 1853
MRXSMB 5 32367 0 % 1544
AHCACHE 5 32366 0 % 1544
RDPDR 4 32366 0 % 1235
WDF01000 4 32366 0 % 1235
VOLMGR 4 32366 0 % 1235
MUP 4 32366 0 % 1235
VIOSER 4 32366 0 % 1235
BALLOON 3 32366 0 % 926
KSECDD 3 32366 0 % 926
MSFS 3 32366 0 % 926
RASSSTP 2 32366 0 % 617
MOUNTMGR 2 32366 0 % 617
VOLUME 2 32366 0 % 617
DISK 2 32366 0 % 617
KBDCLASS 2 32366 0 % 617
MRXSMB20 2 32366 0 % 617
CONDRV 1 32367 0 % 308
PDC 1 32366 0 % 308
BASICDISPLAY 1 32366 0 % 308
NPSVCTRIG 1 32366 0 % 308
USBUHCI 1 32366 0 % 308
By comparison, here is the same analysis of a Hyper-V hosted guest:
Code:
ProfileTime 248939 hits, 10000 events per hit --------
Module Hits msec %Total Events/Sec
NTOSKRNL 150670 25244 60 % 59685469
HAL 84475 25244 33 % 33463397
WIN32KFULL 4118 25243 1 % 1631343
WIN32KBASE 2889 25243 1 % 1144475
NTFS 1186 25244 0 % 469814
WRKRN 1010 25244 0 % 400095
FLTMGR 771 25244 0 % 305419
DXGMMS2 517 25244 0 % 204801
DXGKRNL 450 25243 0 % 178267
TSFAIRSHARE 439 25244 0 % 173902
TCPIP 426 25244 0 % 168752
WDFILTER 387 25244 0 % 153303
NPFS 229 25244 0 % 90714
NETIO 226 25243 0 % 89529
NDIS 143 25243 0 % 56649
NETVSC 126 25244 0 % 49912
VMBKMCL 110 25243 0 % 43576
AFD 88 25244 0 % 34859
PACER 67 25244 0 % 26540
CDD 65 25244 0 % 25748
STORPORT 63 25244 0 % 24956
LUAFV 60 25244 0 % 23768
VMBUS 51 25243 0 % 20203
WIN32K 48 25243 0 % 19015
BASICRENDER 47 25244 0 % 18618
RDPUDD 33 25244 0 % 13072
VOLSNAP 30 25244 0 % 11884
WATCHDOG 23 25243 0 % 9111
CNG 22 25244 0 % 8714
PARTMGR 20 25243 0 % 7922
CLASSPNP 19 25244 0 % 7526
WOF 16 25244 0 % 6338
WPPRECORDER 16 25244 0 % 6338
TERMINPT 10 25244 0 % 3961
TM 10 25244 0 % 3961
MOUCLASS 8 25243 0 % 3169
WINHV 8 25244 0 % 3169
STORVSC 7 25244 0 % 2772
WDNISDRV 5 25244 0 % 1980
CI 5 25244 0 % 1980
VOLMGR 5 25243 0 % 1980
RDBSS 5 25244 0 % 1980
NSIPROXY 5 25244 0 % 1980
MMCSS 5 25244 0 % 1980
VOLUME 4 25244 0 % 1584
MRXSMB 3 25244 0 % 1188
AHCACHE 3 25244 0 % 1188
BAM 2 25244 0 % 792
MUP 2 25244 0 % 792
DISK 2 25244 0 % 792
CTXUSBM 1 25244 0 % 396
MSRPC 1 25244 0 % 396
CLFS 1 25244 0 % 396
CLIPSP 1 25244 0 % 396
PCW 1 25243 0 % 396
PDC 1 25243 0 % 396
MOUNTMGR 1 25244 0 % 396
DFSC 1 25244 0 % 396
RDPDR 1 25244 0 % 396
MRXSMB20 1 25244 0 % 396
As you can see, HAL utilization on the Proxmox hosted terminal server is roughly double that of our Hyper-V hosted terminal server (68% versus 33% of profile samples).
In practice we observe 50-70% CPU load even though all user processes combined are not using more than about 30% CPU.
The question is: why? Is this expected behavior for KVM, or are we losing performance to the hypervisor because of a configuration issue?
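In case it helps others follow along, these are the kind of per-VM settings in /etc/pve/qemu-server/<vmid>.conf that I mean by "configuration" (a hypothetical example for illustration, not our actual config):
Code:
agent: 1
balloon: 0
cores: 8
cpu: host
memory: 32768
name: terminal-server
net0: virtio=DE:AD:BE:EF:00:01,bridge=vmbr0
numa: 1
ostype: win10
scsi0: local-lvm:vm-100-disk-0,size=200G
scsihw: virtio-scsi-pci
sockets: 1
In particular I am wondering whether the CPU type (host versus the kvm64 default) and the ostype setting, which as far as I understand controls the Hyper-V enlightenments QEMU exposes to the Windows guest, play a role here.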
Additionally, could this be related to the 5.13 / 5.15 kernel and its mitigations, and are there any tweaks we can apply to lower the overhead these logs suggest is present?
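For reference, this is how we check which mitigations are active on the Proxmox host kernel (plain Linux sysfs, nothing Proxmox specific):
Code:
# run on the Proxmox host
uname -r
grep . /sys/devices/system/cpu/vulnerabilities/*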
I would love to hear your feedback or any Windows Server VM performance suggestions.
If you wish to test this in your own environment: the tool used for this analysis is called Kernrate, and it is available for free as part of the Windows Driver Kit.
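If you want a quick start, an invocation along the lines of the following (run from an elevated command prompt inside the guest) should produce output similar to the logs above; the exact flags may differ between Kernrate builds, so treat this as a sketch and check the tool's own help output first:
Code:
rem sample kernel-mode CPU time for roughly 30 seconds, then print per-module hit counts
kernrate.exe -s 30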