VM high vCPU usage issues

avladulescu

Renowned Member
Mar 3, 2015
Bucharest/Romania
Hi guys,


This is my first post so go easy on me please.
Great work around here, but grab yourself a cup of coffee because this is not going to be a short post...

I've been heavily using Proxmox (prod & dev) for quite a few years now, after dumping the complicated and bulky oVirt project, and I've been scouring this forum all along, reading, learning and comparing results with my own trial-and-error testing, but I currently have two issues I can neither work around nor reasonably explain.

This post is therefore quite big, because I didn't want to provide only chunked info about the environments/setups and the symptoms I'm getting, and I tried to give as much info as possible in a professional way.


If the subject makes you think this post is similar to the "KVM 100% cpu usage after online migration" thread, you're wrong (check the posts' dates), because we're talking about current versions of Proxmox/KVM/kernel.



So no more fussing around, here are the premises:

- I run Proxmox on 5 different (large) deployments (in different places and with different scopes) - each with 2-3+ hypervisors and dedicated network storage on 1+ NICs, separated from the upstream VM data traffic, plus some setups with local disks
- one of the clusters is an HA 2-node setup with a quorum disk
- the servers involved in these setups range from i7-4930K (32/64 GB) to E5-2620 (~220 GB) and E5-2630v3 with 256+ GB of RAM
- the environments are medium to highly VM-dense (counting both the number of VMs and the VM resources)
- storage access in my standard setup goes over a dedicated network, from a minimum of 4 gigabit NICs up to 10 GbE adapters on some projects
- the storage servers all use HW RAID with BBUs only, no mdadm or other slow-IO arrays
- the storage drives range from SSD arrays to NL-SAS arrays
- storage access is over iSCSI/NFS with jumbo frames (a sample interface config follows this group)
- some of the setups run on an OVS deployment rather than standard Linux bridging
- the majority of the NICs are Intel-based and some are Broadcom (Dell servers)
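For reference, this is roughly what the storage-facing NIC looks like in /etc/network/interfaces on one of the hypervisors - a minimal sketch only; the interface name and addresses are placeholders, not my actual values:

# storage NIC towards the iSCSI/NFS network (illustrative - name/IP are placeholders)
auto eth2
iface eth2 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        mtu 9000        # jumbo frames on the dedicated storage network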

- using Proxmox 3.4, fully updated and upgraded
- for all my installations I go for the 3.10.0-x-pve kernels and I tend to keep all systems up to date frequently
-- * I switched to this kernel branch about a year and a half ago, when I observed some strange issues (only on live migration) with VMs running 3.2.x and newer kernels on top of the old stable 2.6
- currently using 3.10.0-13-pve
- using the pve-no-subscription repo as the source on all projects
- the hypervisors have acpid installed and irqbalance running at all times
- the hypervisors have the following sysctl values set ( vm.swappiness = 20 / vm.dirty_background_ratio = 5 / vm.dirty_ratio = 10 ) - see the sketch right after this group
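For completeness, this is how those values are applied and verified on the hypervisors - just a sketch, the values are exactly the ones listed above:

# /etc/sysctl.conf excerpt on the hypervisors
vm.swappiness = 20
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

# apply without a reboot and verify
sysctl -p
sysctl vm.swappiness vm.dirty_background_ratio vm.dirty_ratio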

- I run only KVM guests, no OpenVZ CTs for me!
- always using only virtio drivers in all VMs (NIC/storage)
- CPU emulation is set to SandyBridge (a sample VM config follows this list)
- the majority of the virtualized OS flavours range from Debian 7.8/7.9 - 8.x, CentOS 6.6+/7.x, Win 7 and Windows Server 2012 R2 (all 64-bit)
- the workloads include medium/large database clusters (Galera), Elasticsearch, ADs, virtual routers, load balancers, webservers etc., with no IO or stability issues whatsoever on any of the 5 different setups
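To illustrate the guest settings above, here is roughly what a typical VM config looks like on these setups - a sketch only; the vmid, MAC and storage names are placeholders, not an actual guest of mine:

# qm config 100 (illustrative - vmid/MAC/storage names are placeholders)
bootdisk: virtio0
cores: 4
cpu: SandyBridge
memory: 8192
name: example-vm
net0: virtio=DE:AD:BE:EF:00:01,bridge=vmbr0
ostype: l26
sockets: 1
virtio0: storage-iscsi:vm-100-disk-1,size=50G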


here's a quick overview (pveversion -v) of how it looks on 3 different environments (mostly the same):

proxmox-ve-2.6.32: 3.4-165 (running kernel: 3.10.0-13-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-3.10.0-13-pve: 3.10.0-38
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-3.10.0-9-pve: 3.10.0-33
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-42-pve: 2.6.32-165
pve-kernel-2.6.32-38-pve: 2.6.32-155
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-41-pve: 2.6.32-164
pve-kernel-2.6.32-42-pve: 2.6.32-165
pve-kernel-3.10.0-12-pve: 3.10.0-37
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-3.10.0-13-pve: 3.10.0-38
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-41-pve: 2.6.32-164
pve-kernel-3.10.0-11-pve: 3.10.0-36
pve-kernel-3.10.0-12-pve: 3.10.0-37
pve-kernel-2.6.32-43-pve: 2.6.32-166
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-13
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1





Okay, so here come the 2 issues that are the reason for this post:


#1. Any Ubuntu VM installation has issues with live migration (tested on several different environments) - tested mostly with the 12/14/15 LTS versions (3.x+ kernels)


- what happens? -> attempting to live migrate any VM installed with Ubuntu results in all vCPUs hitting 100% usage (steal/guest in htop)

the same results were encountered despite the following changes being made (see the migration/steal-time check after this list):
* changing the vCPU emulation type from the KVM default to Sandy/Haswell/Broadwell
* changing the HDD emulation type
* changing the total number of sockets/cores
* changing the OS type setting between 3.x/2.6 and Other
* upgrading the kernel version inside the VM does not change the behaviour
* there are no strange kernel or other error/warning messages related to this inside the VM or on the hypervisors involved (nothing to correlate with the vCPUs being pegged)
* the load average inside the VM does not increase to a value high enough to lock the machine down
* not even a kernel hung-task timeout in the VM
* the NMS, polling over SNMP, always reports 100% usage on all of the VM's vCPUs after online migration (I have graphs to prove it)
* the physical CPU usage of the VM's process on the hypervisor where the VM landed stays between 5-10% after this
* the VM process continues to serve its purpose as usual after all this, without interruption
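For reference, this is how the migration is triggered and how the steal time can be observed from inside the guest afterwards - a sketch; the vmid and node name are placeholders:

# on the source hypervisor: online (live) migration
qm migrate 100 node2 -online

# inside the Ubuntu guest after migration:
# the 8th field of the cpu lines in /proc/stat is steal time
grep '^cpu' /proc/stat
# or, with the sysstat package installed, watch the %steal column
mpstat -P ALL 2 5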


So what's going on here? Should I suspect a mismatch between the base pve kernel version and VMs running the latest 3.10+ kernel branches?
Has anyone encountered this pattern before?




#2. VM vCPU iowait at 100%

- what happens? -> (without a predictable pattern/recipe) the VM's load average settles around 2.00 (25-50% of the vCPUs) while the VM is on local disk

details:
* OS is Debian 7.9, latest, fully upgraded (3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u4)
* happens to VMs with and without load - meaning both idle and non-idle VMs
* only on local storage!
- all the other projects running similar setups over network storage have never encountered this pattern (in 3-4 years now!)
* the local mountpoint directory (where the VM images live) is a separate 12 x enterprise SSD array with HW RAID and a BBU - no LVM
* the mount options of the local storage are xfs rw,relatime,attr2,inode64,noquota (see the fstab/iostat sketch after this list)
* the hypervisor's IO/CPU/RAM/disk usage is not even at 30% of its max capacity - so overload on the host side is out of the question
* while this happens, the VM has already printed on its console screen - kernel hung task timeout over 120s
* occasionally the affected VM's console will not open (the VNC handshake hangs on start) and a stop/start is required to regain access while it locks up - the VM freezes and all operations except stop/start return "qmp socket - timeout after 31 retries"
* the VMs run with vm.swappiness = 20 / vm.dirty_background_ratio = 5 / vm.dirty_ratio = 10 in sysctl.conf
* the base Proxmox hypervisor doesn't log anything in kernel/syslog/dmesg about oopses/panics or anything related, and it never locks up, stops or becomes unresponsive
* this VM lock-up and CPU issue did not manifest during VM backup tasks
* if SSH to the VM still works and only the vCPU iowait usage manifests, any write activity (like installing a package) fails, but listing files/dirs or navigating the filesystem does not, and iotop doesn't report any process or kthread being stuck
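For context, the local storage mount and the way I watch the iowait while it happens look roughly like this - a sketch; device and mountpoint names are placeholders:

# /etc/fstab entry for the local SSD array on the hypervisor (device/mountpoint are placeholders)
/dev/sdb1  /var/lib/vz/local-ssd  xfs  rw,relatime,attr2,inode64,noquota  0  0

# inside the affected guest while the issue manifests (sysstat package)
iostat -x 2      # %util and await on the virtio disk (vda)
mpstat 2 5       # %iowait pegged near 100%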


Now, what changes this whole situation and brings the VM's load and usability back to normal and responsive is live-adding another disk to the VM, or suspending and resuming it (commands below). From this I gather that qemu does some sort of refresh or look-up on the VM's disk image while adding a disk, because it acts like a shock to the VM and everything returns to normal.
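Concretely, either of these brings the VM back when run on the hypervisor - a sketch; the vmid and storage name are placeholders:

# workaround A: suspend and resume the guest
qm suspend 101
qm resume 101

# workaround B: hot-add a small scratch disk, which also "unsticks" the VM
qm set 101 -virtio1 local:1      # creates a new 1 GB disk on the "local" storage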

So, might this be related to the relatime option I used on the XFS mountpoint, or what else could cause the difference in behaviour between local and remote storage that would act as described above?




Any thoughts on either of the 2 points, or even a correlation between them, would be much appreciated, so let me know if I can help by providing any other related info.


Regards,
Alex
 
Hi, the steal-time bug has been fixed in the latest Proxmox 4.1 kernel (it's a bug with live migration, caused by a bug in the host's kvm kernel module).
A backport of the patch is also available in the latest Proxmox 3 kernel.
 
