Linux guest problems on new Haswell-EP processors

spirit

Famous Member
Apr 2, 2010
3,555
163
83
www.odiso.com
Hi Guys,

I've finally hit a strange problem with my Haswells ;)

This occurs with live migration only, and only with a specific workload.
I have a Java application which normally uses 1 core out of 16 vCPUs.
After a live migration, for the same workload, it jumps to 16 cores of usage, 1600%.
(no dmesg errors, no iowait, ...)

I can't reproduce it with a simple CPU or memory benchmark.

This is with the pve-kernel 3.10 on the host and the Debian 3.16 kernel in the guest.

Using pve-kernel-4.2 fixes the problem, so it's a KVM bug.
 

tytanick

Member
Feb 25, 2013
96
3
8
@spirit did you install pve-kernel-4.2 on Proxmox 3.4 without any problems?
Can I do this at night in production and expect it to go smoothly? :)
 

tytanick

Member
Feb 25, 2013
96
3
8
Kernel 4.x doesn't solve the problem.
As far as I can see the problem is less frequent, but it still happened.
So no good solution for now?
Only disabling virtio and using IDE? :(
 

tytanick

Member
Feb 25, 2013
96
3
8
Below is the host and guest info.
Is it a kernel bug (host kernel or guest kernel)?
And do you know for a fact that this was fixed in 4.2.6-1?


Host info:
proxmox-ve-2.6.32: 3.4-150 (running kernel: 4.2.3-2-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-3.10.0-13-pve: 3.10.0-38
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-7
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-13
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


Guest:
Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u6 x86_64 GNU/Linux

I use virtio because it performs better and doesn't cause kernel errors in the guest while backing up the whole guest.
Did you have problems while backing up a live snapshot of a running Linux guest on the IDE driver?
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
Hi,

I always run virtio drivers because they are the kind of driver that delivers the maximum IOPS on the virtual system, at least if that's what you're after.

My suggestion is to go for the 4.2.6-1 kernel branch; I recently upgraded to the latest 33 release of the 6 minor version and don't have any issues whatsoever.

Another hint: configure your VM's vCPUs NOT as (2 sockets x 2 cores = 4) but as (1 socket x 4 cores = 4), and enable NUMA on the CPU. Also use the default SCSI controller type (LSI) on the VM Options tab.

Always be sure to keep at least 80% of the memory on your hypervisor free for VM backups, and if necessary test and adapt:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

so you avoid using the hypervisor's RAM for unwanted caching. This forces data to be synced more often, and so avoids rare but large syncs to the VM image disk.

One more thing: re-read this whole thread in depth, and check that you don't have any power-saving profile enabled in your vendor's BIOS.
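Assuming a Debian-based hypervisor with a sysctl.d-aware procps, a minimal sketch of applying those vm.dirty values persistently (the drop-in filename is my own choice; the values are the ones suggested above):

```shell
# Lower the dirty-page thresholds so writeback happens more often,
# in smaller batches, instead of rare large flushes to the VM disk.
# (File name "90-dirty-ratio.conf" is arbitrary/hypothetical.)
cat >> /etc/sysctl.d/90-dirty-ratio.conf <<'EOF'
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
EOF

# Apply without a reboot, then verify the running values.
sysctl --system
sysctl vm.dirty_background_ratio vm.dirty_ratio
```

Needs root on the host; on very old procps versions without `sysctl --system`, appending to /etc/sysctl.conf and running `sysctl -p` does the same job.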
 

tytanick

Member
Feb 25, 2013
96
3
8
The host has 32 cores (in 2 sockets)
and the guest is set to 1 socket and 10 cores.
"Enable NUMA" is not checked for the guest (so it's disabled),
and the default SCSI LSI controller is set.

So you think this could be the issue and I should set the guest to 2 sockets * 5 cores?
And should I also enable NUMA? Maybe that setting should be the default in Proxmox?
And the dirty options should be set only in the guests, right?
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

Are we talking about the host kernel or the guest kernel?
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
1. 1 socket and 10 cores - that is the correct way.
2. Enable NUMA.
3. I am talking about the hypervisor settings, not the VMs.
4. Be sure to have the latest qemu-kvm updates from the pve repository.
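For reference, points 1 and 2 can also be applied from the host shell with Proxmox's `qm` tool instead of the GUI (VMID 100 is a placeholder; the VM needs a full stop/start, not a reboot, for topology changes to take effect):

```shell
# Single socket, 10 cores, NUMA enabled (VMID 100 is hypothetical).
qm set 100 -sockets 1 -cores 10 -numa 1

# Confirm what landed in the VM config.
qm config 100 | grep -E 'sockets|cores|numa'
```

This is a sketch against the qemu-server CLI of that era; check `man qm` on your actual pve version, since the `numa` option only exists in reasonably recent qemu-server releases.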
 

tytanick

Member
Feb 25, 2013
96
3
8
Unfortunately it didn't help.
I caught the problem on some charts, maybe somebody can help me.
Also, maybe qcow2 + virtio = bad things?
And maybe WRITE BACK shouldn't be used? I have that in all my setups...
Anyway, on the VM there was actually no load at all!

Host proxmox kernel: Linux node1 4.2.6-1-pve #1 SMP Thu Jan 21 09:34:06 CET 2016 x86_64 GNU/Linux
VM kernel: 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u2 x86_64 GNU/Linux
Sysctl config of VM:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

(attached charts: upload_2016-1-27_16-43-53.png, upload_2016-1-27_16-43-40.png, upload_2016-1-27_16-39-58.png, upload_2016-1-27_16-39-21.png)
 
