Windows 2003 32-bit CPU usage peaks / freezes due to hardware interrupts

TomTomGo

Renowned Member
Mar 30, 2012
France
Hi all,

Since we migrated our host from 2.3 to 3.1, one of our Windows 2003 32-bit VMs freezes randomly without a BSOD.
We noticed CPU usage peaks caused by hardware interrupts that did not occur under 2.3.
A second W2K3 32-bit VM runs on this host without any problems.
To find out whether the VM itself is to blame, we shared /var/lib/vz from the 3.1 host over NFS, added this share to a 2.3 host, and ran the VM there with exactly the same config file.
On the 2.3 host, the VM runs without any problems.
Further investigation with kernrate shows that ntkrnlpa generates a lot of hits:

Code:
 /==============================\
<         KERNRATE LOG           >
 \==============================/
Date: 2013/08/28   Time: 18:19:31
Machine Name: SV113
Number of Processors: 4
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 4096 MB
Pagefile Total: 5976 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2046MB
OS Version: 5.2 Build 3790 Service-Pack: 2.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe 


Kernel Profile (PID = 0): Source= Time, 
Using Kernrate Default Rate of 25000 events/hit

------------Overall Summary:--------------

P0     K 0:00:23.968 (10.6%)  U 0:00:04.890 ( 2.2%)  I 0:03:17.890 (87.3%)  DPC 0:00:00.296 ( 0.1%)  Interrupt 0:00:13.218 ( 5.8%)
       Interrupts= 325251, Interrupt Rate= 1434/sec.

P1     K 0:00:22.234 ( 9.8%)  U 0:00:07.500 ( 3.3%)  I 0:03:17.015 (86.9%)  DPC 0:00:00.156 ( 0.1%)  Interrupt 0:00:15.156 ( 6.7%)
       Interrupts= 107537, Interrupt Rate= 474/sec.

P2     K 0:00:26.671 (11.8%)  U 0:00:02.937 ( 1.3%)  I 0:03:17.140 (86.9%)  DPC 0:00:00.265 ( 0.1%)  Interrupt 0:00:20.500 ( 9.0%)
       Interrupts= 107571, Interrupt Rate= 474/sec.

P3     K 0:00:19.484 ( 8.6%)  U 0:00:07.468 ( 3.3%)  I 0:03:19.796 (88.1%)  DPC 0:00:00.203 ( 0.1%)  Interrupt 0:00:08.734 ( 3.9%)
       Interrupts= 107564, Interrupt Rate= 474/sec.

TOTAL  K 0:01:32.359 (10.2%)  U 0:00:22.796 ( 2.5%)  I 0:13:11.843 (87.3%)  DPC 0:00:00.921 ( 0.1%)  Interrupt 0:00:57.609 ( 6.4%)
       Total Interrupts= 647923, Total Interrupt Rate= 2857/sec.


Total Profile Time = 226750 msec

                                       BytesStart          BytesStop         BytesDiff.
    Available Physical Memory   ,      2970775552,      2899222528,       -71553024
    Available Pagefile(s)       ,      4994494464,      4908650496,       -85843968
    Available Virtual           ,      2132312064,      2131263488,        -1048576
    Available Extended Virtual  ,               0,               0,               0

                                  Total      Avg. Rate
    Context Switches     ,       495835,         2187/sec.
    System Calls         ,      1870490,         8249/sec.
    Page Faults          ,        91366,         403/sec.
    I/O Read Operations  ,        11135,         49/sec.
    I/O Write Operations ,        12094,         53/sec.
    I/O Other Operations ,        42708,         188/sec.
    I/O Read Bytes       ,     36180810,         3249/ I/O
    I/O Write Bytes      ,      3572093,         295/ I/O
    I/O Other Bytes      ,   4722415272,         110574/ I/O

-----------------------------

Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 114
Percentage in the following table is based on the Total Hits for the Kernel

Time   359459 hits, 25000 events per hit --------
 Module                                Hits   msec  %Total  Events/Sec
intelppm                             277297     226734    77 %    30575145
ntkrnlpa                              68881     226734    19 %     7594912
hal                                   12166     226734     3 %     1341439
win32k                                  354     226734     0 %       39032
klif                                    154     226734     0 %       16980
Ntfs                                    135     226734     0 %       14885
fltmgr                                   95     226734     0 %       10474
e1000325                                 91     226734     0 %       10033
klflt                                    66     226734     0 %        7277
tcpip                                    58     226734     0 %        6395
BALLOON                                  29     226734     0 %        3197
RDPDD                                    22     226734     0 %        2425
wdf01000                                 21     226734     0 %        2315
SCSIPORT                                 16     226734     0 %        1764
viostor                                  10     226734     0 %        1102
Dfs                                       9     226734     0 %         992
kltdi                                     7     226734     0 %         771
NDIS                                      7     226734     0 %         771
kneps                                     5     226734     0 %         551
RDPWD                                     4     226734     0 %         441
USBPORT                                   4     226734     0 %         441
usbuhci                                   4     226734     0 %         441
CLASSPNP                                  4     226734     0 %         441
atapi                                     3     226734     0 %         330
ftdisk                                    3     226734     0 %         330
Npfs                                      2     226734     0 %         220
termdd                                    2     226734     0 %         220
watchdog                                  2     226734     0 %         220
srv                                       1     226734     0 %         110
afd                                       1     226734     0 %         110
ipnat                                     1     226734     0 %         110
TDI                                       1     226734     0 %         110
cdrom                                     1     226734     0 %         110
KSecDD                                    1     226734     0 %         110
PartMgr                                   1     226734     0 %         110
volsnap                                   1     226734     0 %         110

================================= END OF RUN ==================================
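
The per-CPU interrupt rates in the summary can be cross-checked against the raw counts: dividing each count by the total profile time (226750 msec) should reproduce the reported per-second figures. A minimal Python sketch, assuming the log keeps the layout shown above; the log excerpt is trimmed to the CPU labels and interrupt lines, and the helper name `interrupt_rates` is made up for this illustration.

```python
import re

PROFILE_MSEC = 226750  # "Total Profile Time" from the log above

# Trimmed excerpt: CPU labels and interrupt counts copied from the summary.
LOG = """\
P0
       Interrupts= 325251, Interrupt Rate= 1434/sec.
P1
       Interrupts= 107537, Interrupt Rate= 474/sec.
P2
       Interrupts= 107571, Interrupt Rate= 474/sec.
P3
       Interrupts= 107564, Interrupt Rate= 474/sec.
TOTAL
       Total Interrupts= 647923, Total Interrupt Rate= 2857/sec.
"""

def interrupt_rates(log, profile_msec):
    """Return {cpu: (interrupt count, recomputed interrupts per second)}."""
    rates = {}
    cpu = None
    for line in log.splitlines():
        label = re.match(r"(P\d+|TOTAL)\b", line)
        if label:
            cpu = label.group(1)
            continue
        count = re.search(r"Interrupts=\s*(\d+)", line)
        if count and cpu is not None:
            n = int(count.group(1))
            # Integer division: the log shows whole numbers per second.
            rates[cpu] = (n, n * 1000 // profile_msec)
    return rates

rates = interrupt_rates(LOG, PROFILE_MSEC)
for cpu, (n, per_sec) in rates.items():
    print(f"{cpu:5s} {n:7d} interrupts  {per_sec}/sec")
```

With the counts from this run, P0 handles roughly three times as many interrupts as each of the other CPUs.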

Zooming in on ntkrnlpa shows:

Code:
...
Time   52506 hits, 25000 events per hit --------
 Module                                Hits   msec  %Total  Events/Sec
ExAllocatePoolWithTag                 41945     166540    79 %     6296535
KeTerminateThread                      4365     166540     8 %      655247
ZwYieldExecution                       2363     166540     4 %      354719
RtlCaptureContext                       720     166540     1 %      108082
KiDispatchInterrupt                     422     166540     0 %       63348
KeFlushEntireTb                         419     166540     0 %       62897
NtBuildNumber                           242     166540     0 %       36327
NtFreeVirtualMemory                     236     166540     0 %       35426
CmRegisterCallback                      178     166540     0 %       26720
ObQueryNameString                       112     166540     0 %       16812
KeAreAllApcsDisabled                     94     166540     0 %       14110
PoShutdownBugCheck                       83     166540     0 %       12459
wctomb                                   77     166540     0 %       11558
ProbeForRead                             42     166540     0 %        6304
RtlCompressBuffer                        39     166540     0 %        5854
ExRaiseHardError                         38     166540     0 %        5704
ObFindHandleForObject                    38     166540     0 %        5704
PoQueueShutdownWorkItem                  35     166540     0 %        5253
RtlInitializeGenericTable                33     166540     0 %        4953
NtAllocateUuids                          29     166540     0 %        4353
ExFreePoolWithTag                        28     166540     0 %        4203
...
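
For readers unfamiliar with kernrate tables: the Events/Sec column appears to be derived from the hit count as hits × 25000 events per hit ÷ elapsed seconds, and %Total as the share of all hits. The Python sketch below recomputes both for the top three rows; the truncation to whole numbers is an assumption that happens to match these rows, and the function names are hypothetical.

```python
EVENTS_PER_HIT = 25000  # kernrate default rate, from the log header

def events_per_sec(hits, msec):
    """hits * events-per-hit / elapsed seconds, truncated to an integer."""
    return hits * EVENTS_PER_HIT * 1000 // msec

def percent_of_total(hits, total_hits):
    """Share of all hits, truncated to a whole percent (assumed rounding)."""
    return hits * 100 // total_hits

TOTAL_HITS = 52506  # "Time 52506 hits" line of the ntkrnlpa table

# (hits, msec) copied from the first three rows of the table above
rows = {
    "ExAllocatePoolWithTag": (41945, 166540),
    "KeTerminateThread": (4365, 166540),
    "ZwYieldExecution": (2363, 166540),
}

for name, (hits, msec) in rows.items():
    print(f"{name:24s} {percent_of_total(hits, TOTAL_HITS):3d} % "
          f"{events_per_sec(hits, msec):10d} events/sec")
```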

VM configuration file :

Code:
acpi: 1
balloon: 2048
boot: cad
bootdisk: virtio0
cores: 2
cpu: core2duo
cpuunits: 1000
freeze: 0
ide2: none,media=cdrom
kvm: 1
memory: 4096
name: SV113
net0: e1000=AA:2B:48:6B:E8:E6,bridge=vmbr0
ostype: w2k3
sockets: 2
startup: order=1
virtio0: local:113/vm-113-disk-1.raw,format=raw
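
As a quick sanity check, the configuration can be read as plain `key: value` pairs: sockets × cores gives the number of virtual CPUs, which should match the "Number of Processors: 4" line in the kernrate log. A minimal Python sketch using a few lines of the config above; `parse_vm_config` is a hypothetical helper, not part of any Proxmox tooling.

```python
# A few lines copied from the VM configuration above.
CONFIG = """\
balloon: 2048
cores: 2
cpu: core2duo
memory: 4096
net0: e1000=AA:2B:48:6B:E8:E6,bridge=vmbr0
sockets: 2
"""

def parse_vm_config(text):
    """Split each non-empty line at the first colon into key and value."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf

conf = parse_vm_config(CONFIG)
vcpus = int(conf["sockets"]) * int(conf["cores"])
print(f"vCPUs: {vcpus}, NIC: {conf['net0']}")
```

Splitting at the first colon only keeps the MAC address in `net0` intact.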

PVE version on 3.1 host :

Code:
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-4 (running version: 3.1-4/f6816604)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2

PVE version on 2.3 host :

Code:
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-96
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
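
To narrow down what changed between the two hosts, the two version listings can be compared package by package. The Python sketch below does this for a handful of lines copied from the listings above; the helper names `parse_versions` and `changed` are hypothetical, and packages whose names differ between releases (for example libqb and libqb0) would need to be matched by hand.

```python
def parse_versions(text):
    """Map package name to version, dropping any trailing '(...)' note."""
    versions = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep:
            versions[name.strip()] = rest.split("(")[0].strip()
    return versions

def changed(old, new):
    """Packages present on both hosts whose versions differ."""
    common = sorted(old.keys() & new.keys())
    return {pkg: (old[pkg], new[pkg]) for pkg in common if old[pkg] != new[pkg]}

# Excerpts copied from the two listings above.
HOST_23 = """\
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
qemu-server: 2.3-20
pve-firmware: 1.0-21
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
"""
HOST_31 = """\
pve-manager: 3.1-4 (running version: 3.1-4/f6816604)
qemu-server: 3.1-1
pve-firmware: 1.0-23
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
"""

diff = changed(parse_versions(HOST_23), parse_versions(HOST_31))
for pkg, (old, new) in diff.items():
    print(f"{pkg}: {old} -> {new}")
```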

The HAL shown in Device Manager inside the VM is "ACPI Multiprocessor computer".
I tried changing the CPU type to host, downgrading the NIC from virtio to e1000, and deleting all non-present devices in Device Manager, all without success.

Any ideas ?

Regards,

Thomas
 
I really have no idea what the issue could be.
In my experience, virtio on 2003 was flaky, so I just stick with IDE and e1000 when using 2003 (just a couple of VMs).

Thanks for your post.
It's really strange because nothing has changed in the VM configuration; just switching between 2.3 and 3.1 determines whether the VM works or not ...
I'm going to try a few things with a copy of this VM, such as changing the number of CPU cores / sockets, the CPU type, switching to IDE as you suggest, ...

Regards,
 
Running a copy of the VM on a spare 3.1 host doesn't reproduce the problem.
The only difference between the production VM and the test VM is that the test VM's network card sits in a VLAN, to isolate the copy from the production LAN / VM.
So maybe it is a network issue, like broadcast traffic or something similar?
I'm going to investigate on the VM side with Ethereal to see what happens ...
Why would some network activity hang the VM on a 3.1 host and not on a 2.3 host?
Maybe something has changed in the network layer between Proxmox 2.3 and 3.x?

Regards,

Thomas
 
Yes, a SQL Server Std 2008 R2 with a 2gb limit assigned + a SQL Express for WSUS ...

Can you try the tweak described at
http://pve.proxmox.com/wiki/Performance_Tweaks

(This changes the timer precision in SQL Server, which sends a lot of interrupts by default.)

Trace Flag T8038

Setting the trace flag -T8038 will drastically reduce the number of context switches when running SQL 2005 or 2008.
To change the trace flag:

  1. Open the SQL server Configuration Manager
  2. Open the properties for the SQL service typically named MSSQLSERVER
  3. Go to the advanced tab
  4. Append ;-T8038 to the end of the startup parameters option
 
Interrupts from the network driver
Try disabling the network adapter and turning it back on. The interrupt count suddenly drops and stays low until the next reboot.

I also get a BSOD in netkvm.sys when I try to disable the virtio network adapter while Sysinternals Process Explorer 15.13 is running.

I have the same issue with a Windows 2003 32-bit guest.
MS SQL is not installed.
Code:
pveversion --verbose
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-18-pve: 2.6.32-88
pve-kernel-2.6.32-19-pve: 2.6.32-95
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
and driver version
Code:
Red Hat VirtIO Ethernet Adapter
Driver Date: 17.04.2013
Driver Version: 51.64.104.5900
 
@spirit and @naves

Thank you guys for your suggestions.
I'll run some tests tonight when the server is idle; I hope we're on the right track ... ;)
 
Some fresh news :

I removed the network card, deleted it from the hidden devices in the Windows Device Manager, and re-added it with the drivers from the latest virtio-0.1-65.
The problem still occurs, grrrrr
I've noticed that some STP packets coming from the switch (Cisco Catalyst 2950) occur at the same time as the hardware interrupt peaks; this could be the origin of the problem.

Screenshot : wireshark.png
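
For anyone who wants to pick these frames out of a capture programmatically, STP BPDUs can be recognised by their destination MAC address. The Python sketch below checks the first six bytes of a raw Ethernet frame against the IEEE 802.1D bridge group address; including the Cisco PVST+ address is an assumption about what this particular switch sends, and the two sample frames are synthetic, not taken from the capture.

```python
STP_DEST_MACS = {
    bytes.fromhex("0180c2000000"),  # IEEE 802.1D bridge group address
    bytes.fromhex("01000ccccccd"),  # Cisco PVST+ (assumed relevant for this switch)
}

def is_stp_frame(frame: bytes) -> bool:
    """True if the Ethernet destination address is a known STP multicast address."""
    # An Ethernet header is 14 bytes: destination (6), source (6), type/length (2).
    return len(frame) >= 14 and frame[:6] in STP_DEST_MACS

# Synthetic example frames: header only, padded with zeros.
bpdu = bytes.fromhex("0180c2000000" "001122334455" "0026") + bytes(38)
broadcast = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(46)

print(is_stp_frame(bpdu), is_stp_frame(broadcast))
```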

How can we tell the host bnx2 driver or the guest virtio net driver to ignore these packets?
And what changed between 2.3 and 3.1, maybe the bnx2 driver / config?

Still investigating ...

Regards,
Thomas
 
It seems to be OK since I rebooted the VM a second time after deleting / re-adding the network card and upgrading the virtio driver.
So the process should be:

1 - Shut down the VM and remove the network card from the VM configuration (or change the card type to e1000)
2 - Start the VM and remove the virtio NIC from Device Manager (in a cmd window, run "set devmgr_show_nonpresent_devices=1 & devmgmt.msc" and tick "Show hidden devices" in the View menu)
3 - Shut down the VM and re-add the virtio NIC (or change the card type from e1000 back to virtio)
4 - Start the VM and reinstall the virtio NIC with the latest drivers (0.1-65 at this time)
5 - Reboot the VM TWICE
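
Steps 1 and 3 above amount to swapping the NIC model in the `net0` line of the VM configuration while keeping the MAC address and bridge. The Python sketch below shows that string transformation only; `set_nic_model` is a hypothetical helper, and on a real host the change would normally be made through the Proxmox GUI or its own tools rather than by hand.

```python
def set_nic_model(net_value, model):
    """Swap the NIC model in a netN value, keeping the MAC and other options."""
    first, sep, rest = net_value.partition(",")
    _old_model, eq, mac = first.partition("=")
    if not eq:
        raise ValueError(f"unexpected net value: {net_value!r}")
    return f"{model}={mac}{sep}{rest}"

# Value of net0 from the VM configuration posted earlier in the thread.
net0 = "e1000=AA:2B:48:6B:E8:E6,bridge=vmbr0"

as_virtio = set_nic_model(net0, "virtio")
print(as_virtio)
print(set_nic_model(as_virtio, "e1000"))
```

Keeping the MAC address unchanged is a design choice of this sketch; whether Windows then treats the re-added card as a new device is not something the thread establishes.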

@naves: can you confirm this is what you do on your side?

Regards,

Thomas
 
Yes, something like this.
But the interrupt count can suddenly grow again.
 
