Proxmox VE 1.5 + Win2k3 VMs = Strange behaviour -> massive slowdowns in machine

holgerb

Hi all,

We are currently experiencing very strange behaviour with the combination of Win2k3 (Standard Edition) and Proxmox VE 1.5.

Our virtualisation cluster currently consists mainly of three IBM machines (8 cores, 32 GB RAM and 2x500GB local SAS drives each) connected to a Thecus 8800 Pro NAS.

Scenario: We need to run a Windows 2k3 VM with Oracle 11g Release 1.
Problem: Massive "performance drops" and sluggish behaviour, although both the VM's Task Manager and Proxmox show close to zero CPU load.

After moving from XenServer to Proxmox we converted two Win2k3 servers (one running Oracle 10g, one running Oracle 11gR1) to KVM machines. They ran without any issues for months. After a while we started seeing strange behaviour with the Win2k3 VM running Oracle 11gR1: RDP sessions to the machine broke down, and exporting a DB to a local drive or simply copying a file from one local drive to another took extremely long. All this happened without high CPU load on the host or any visible problems in the system event log.

The slowdowns were so massive that I decided to set up a new, fresh VM with Win2k3. Even right after installation, while applying the latest Service Pack plus patches, the "fresh" VM was very slow: installing the SP took several hours. After tinkering around I found that DMA mode was disabled for both virtual HDDs. I tried to enable DMA but was not able to. Then I decided to give the new virtio drivers for network and storage a chance. I installed the virtio drivers, changed the hard disk type to virtio, rebooted and voilà: virtio for networking and storage.

The bad thing: same performance issues as before. Slow RDP access, copying local files took several minutes even for small files, launching local programs took very long, etc.

Since I was not sure whether the Win2k3 ISO had a problem (tailored configuration, corrupted image), I tried another fresh install of Win2k3 server from different install media. This time DMA mode worked right out of the box. At first performance looked OK, but as soon as my colleague tried to install Oracle on the machine we saw identical behaviour:
- RDP access nearly freezing
- Copying a 1.8 GB zip from "drive C" to "drive E" took something like 30 min although both drives are located on the Thecus (4x2 TB HDD in RAID5) connected via a dedicated 1 GB storage network
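
To put the copy time above into numbers, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and that the storage network is a 1 Gbit/s link; neither is confirmed in this thread, and the function names are made up for illustration.

```python
# Rough throughput estimate for the 1.8 GB copy that took about 30 minutes.
# Assumptions (not confirmed in the thread): decimal units, 1 Gbit/s link.

def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s for copying size_gb within the given minutes."""
    return size_gb * 1000 / (minutes * 60)


def link_ceiling_mb_per_s(gbit_per_s: float) -> float:
    """Theoretical ceiling of a network link in MB/s, ignoring protocol overhead."""
    return gbit_per_s * 1000 / 8


observed = throughput_mb_per_s(1.8, 30)
ceiling = link_ceiling_mb_per_s(1.0)

print(f"observed copy rate: {observed:.1f} MB/s")
print(f"link ceiling:       {ceiling:.0f} MB/s")
print(f"share of ceiling:   {observed / ceiling:.1%}")
```

Under these assumptions the copy ran at roughly 1 MB/s, which is under one percent of what the link could carry in theory, so the network itself is unlikely to be the limiting factor.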

The VMs themselves were configured almost identically:
2 cores
4 GB RAM
1x HDD 20 GB for OS
1x HDD 80 GB for Oracle

Is SMP causing problems with Win2k3?

The strangest thing in this combination:
The other migrated Win2k3 VM (with Oracle 10g) is still running fine without any issues.

Our cluster also hosts a big variety of KVM machines (SLES, Debian, OpenSuse, WinXP) and none of them shows such issues.

Has any of you experienced such problems?

TIA,
Holger
 
Thanks for replying!

OK, let me be more specific:
1) The "other" Xen-converted VM runs fine with KVM/Proxmox with two cores.
2) After I changed the number of cores to 1 for the new machine and rebooted, the behaviour unfortunately didn't change. It was just a quick shot though, because I do not know whether a Win2k3 VM that is installed with just one core from the start behaves differently.

A quick forum search didn't reveal a real solution or a posting about a similar problem.
 
Yes, they run on the same kernel on the same host.
The new VM I set up yesterday is running on another host, though, because I wanted to make sure that the host is not the problem.

Kernel host #1:
Linux proxmox-epr004 2.6.32-1-pve #1 SMP Fri Jan 15 11:37:39 CET 2010 x86_64 GNU/Linux

Kernel host #2:
Linux proxmox-epr005 2.6.24-7-pve #1 SMP PREEMPT Tue Jun 2 08:00:29 CEST 2009 x86_64 GNU/Linux

The behaviour looks identical on both machines. I rather think there might be an issue with Win2k3 and KVM?
 
We *just* removed a Win2k3 VM from PVE and noticed that all of our other VMs (7 CentOS 5.5 VMs) are performing MUCH better overall. The difference is night and day.
 
I've been having problems with my Windows 2003 VM, I think since I upgraded to 1.5. I upgraded my kernel to the latest 2.6.24 and still had no luck. However, when I re-installed Windows to a disk using RAW instead of qcow, my problems seemed to go away. I'm guessing there is a performance problem with the newer qemu/kvm and qcow.
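
For anyone who wants to check which of their disk images are qcow and which are raw before comparing results, here is a minimal sketch. It relies only on the fact that qcow/qcow2 files start with the magic bytes `QFI\xfb` followed by a big-endian version number, while raw images have no header at all. The function name and the temporary demo files are hypothetical; point the function at your own image paths.

```python
import os
import struct
import tempfile

QCOW_MAGIC = b"QFI\xfb"  # first four bytes of a qcow/qcow2 image


def detect_image_format(path: str) -> str:
    """Guess the disk image format from the first 8 bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) == 8 and header[:4] == QCOW_MAGIC:
        (version,) = struct.unpack(">I", header[4:8])
        return "qcow" if version == 1 else "qcow2"
    # Raw images carry no header, so everything else ends up here.
    return "raw or unknown"


# Demo with synthetic files (not real disk images, just the header bytes).
with tempfile.TemporaryDirectory() as tmp:
    fake_qcow2 = os.path.join(tmp, "fake.qcow2")
    with open(fake_qcow2, "wb") as f:
        f.write(QCOW_MAGIC + struct.pack(">I", 2))

    fake_raw = os.path.join(tmp, "fake.raw")
    with open(fake_raw, "wb") as f:
        f.write(b"\x00" * 512)

    qcow_result = detect_image_format(fake_qcow2)
    raw_result = detect_image_format(fake_raw)

print(qcow_result)
print(raw_result)
```

This only tells you the format of the file; it says nothing about whether qcow is actually the cause of the slowdown.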
 
More or less the same here,

win2k3 with 1 socket, 2 cores
"raw" (on file, not lvm) primary ide disk.

Oracle on board.

The symptoms are similar: cpu almost free, but unbearable slowness.
The problem seems related to disk I/O: testing it with HDSpeed[1]
shows a maximum of 2-5 MB/s, sometimes much slower.
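
If you want a number to compare against inside another guest or on the host, a crude sequential write test can be sketched as follows. This is not what HDSpeed does internally, just a stand-in under simple assumptions: it writes one temporary file in fixed-size blocks, forces it to disk with fsync, and reports MiB/s. The function name and default sizes are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory: str, total_mib: int = 64,
                               block_kib: int = 1024) -> float:
    """Write total_mib of random data into directory and return MiB/s."""
    block = os.urandom(block_kib * 1024)
    blocks = total_mib * 1024 // block_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # wait until the data has left the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the test file
    return total_mib / elapsed


# Small demo run against the system temp directory; use the disk you
# actually want to test and a larger size for a meaningful figure.
rate = sequential_write_mib_per_s(tempfile.gettempdir(), total_mib=8)
print(f"sequential write: {rate:.1f} MiB/s")
```

A single small run like this is noisy, so repeat it a few times and use a file larger than the guest's cache before reading much into the result.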

The same VM was running OK until a few days ago. I'm not aware of any change on PVE,
which is running as follows:

# pveversion -v
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.24-10-pve: 2.6.24-21
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-10
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-3

rob


[1] http://www.steelbytes.com/?mid=20