Linux guest problems on new Haswell-EP processors

Jun 15, 2014
187
7
18
Dresden
www.robhost.de
Any other clues ?
Yes, use kernel 3.16 from Debian backports. We've been running a few Haswell nodes for 2 months without any problems with this kernel. Before that we had daily VM freezes.
This works only for KVM; OpenVZ is not included in this kernel. If you're on OpenVZ, there is no fix afaik. :(
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
@spirit - Yes, it is set to the default value, hotplug enabled (Disk, Network and USB). Are you suggesting something? :D

@robhost - I said I am not using OpenVZ, as pointed out in the separate link I provided, which fully describes the issue. On the other hand, I am not running only Debian, where backports could be used; I also mentioned running CentOS 6.7 with the 2.6.x branch. Changing that is out of the question, as these envs are production ones.

I have checked the C-states configuration in the BIOS of the Dell server. It was disabled, as I have set the Performance Profile to "Performance" on both servers.

Reference: page 21 of Dell's documentation,

"BIOS Performance and Power Tuning Guidelines for Dell PowerEdge 12th Generation Servers"

Therefore, nanonettr's post doesn't apply: a quick look through the boot-time dmesg output shows no mention of CPU C-states being set.


So, as stated before, adding/removing a drive snaps the VM out of the stalled I/O freeze without any stop/start commands on the VM, so it must have something to do with qemu disk management or refresh.

I am still digging to see if I can force a crude workaround via a cron job and a set of qm monitor commands against the disk state of the affected VMs, e.g. querying it on a 5/10 minute cycle.
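A minimal sketch of what such a cron-driven poll could look like. It assumes that `qm monitor <vmid>` accepts a monitor command on stdin (not verified here); the VMIDs and the choice of `info block` as the query are placeholders, and whether polling the disk state actually prevents the freeze is untested:

```python
#!/usr/bin/env python3
"""Poll the QEMU block state of some VMs via `qm monitor` (sketch only)."""
import subprocess

# `info block` is a standard QEMU monitor command; using it as the
# "refresh" query is an assumption, not a confirmed workaround.
HMP_COMMAND = "info block"


def qm_monitor_argv(vmid):
    """Command line that opens the QEMU monitor of one VM."""
    return ["qm", "monitor", str(vmid)]


def run_hmp(vmid, command, timeout=30):
    """Pipe one monitor command into `qm monitor` (assumes it reads stdin)."""
    proc = subprocess.run(
        qm_monitor_argv(vmid),
        input=command + "\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.stdout


def poll(vmids, runner=run_hmp):
    """Query every VM; a timeout or error is recorded instead of raised."""
    results = {}
    for vmid in vmids:
        try:
            results[vmid] = runner(vmid, HMP_COMMAND)
        except (subprocess.SubprocessError, OSError) as exc:
            results[vmid] = "ERROR: {}".format(exc)
    return results


if __name__ == "__main__":
    # Hypothetical VMIDs; replace with the affected VMs.
    for vmid, output in poll([101, 102]).items():
        print(vmid, output.strip())
```

Saved under a hypothetical path such as /usr/local/bin/poll_vm_disks.py, it could be scheduled with a crontab line like `*/5 * * * * /usr/local/bin/poll_vm_disks.py`. A timeout on a VM is itself a useful signal that its monitor is no longer responding.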

Any other suggestions are appreciated.

BR
 
Jun 15, 2014
187
7
18
Dresden
www.robhost.de
@ robhost - I said I am not using OpenVZ, as pointed out in that separate link I provided, fully describing the issue. But as on the other hand I am not running only debian to use back-ports and I have mentioned of using also the centos 6.7 with 2.6.x branch this is out of question as this envs are production ones.
You missed the point: use the backported kernel on the PVE host, NOT on your VMs.
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
Where in the world did you find a backports PVE kernel?

Besides the rest, I've got these 2 in sources.list:

deb http://download.proxmox.com/debian wheezy pve-no-subscription
deb http://http.debian.net/debian wheezy-backports main
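To double-check which suites a host actually has enabled, the two entries above can be parsed with a small helper. This is only a sketch: it handles the plain one-line `deb` format shown here and ignores bracketed options such as `[arch=amd64]`; the function names are made up for illustration:

```python
def parse_sources(text):
    """Parse one-line-style APT source entries into dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        # Expect: type, URI, suite, then zero or more components.
        if fields[0] not in ("deb", "deb-src") or len(fields) < 3:
            continue
        entries.append({
            "type": fields[0],
            "uri": fields[1],
            "suite": fields[2],
            "components": fields[3:],
        })
    return entries


def has_suite(entries, suite):
    """True if a binary ('deb') entry for the given suite is enabled."""
    return any(e["type"] == "deb" and e["suite"] == suite for e in entries)
```

For example, `has_suite(parse_sources(open("/etc/apt/sources.list").read()), "wheezy-backports")` would tell whether the backports suite is available before trying to install a kernel from it.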

Are you running the stock backports kernel from the Debian repo directly on the bare-metal server, merged into a Proxmox 3.x install? At least this is what I understand from your statement.


Furthermore, I found some crucial information on this topic, and I am starting to ask myself why the Proxmox devs/staff are keeping so quiet about it.

Go over the following threads I found related to the current issue (and mind the dates too):

http://pve.proxmox.com/pipermail/pve-user/2015-May/008736.html

It still leaves the final answer in doubt, but it brings another factor into account: the qcow2 compat version, compat: 0.10 versus compat: 1.1.
In my case, though, I installed the bare metal from the 3.4.x branch CD from the start and didn't upgrade from 3.1.


http://pve.proxmox.com/pipermail/pve-devel/2014-October/012909.html


I am wondering why the iothread setting is still hidden, since we're on exactly the matching qemu versions:

ii pve-qemu-kvm 2.2-13 amd64 Full virtualization on x86 hardware
ii qemu-server 3.4-6 amd64 Qemu Server Tools

Or is this meant to be a paid, subscription-only fix?
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com
Hi, I'm running 15x Dell R630 with Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz processors, kernel 3.10, and I don't have any problems.
All on NFS or Ceph, no local disks.

(running around 1000 VMs, Debian and Windows guests)
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com
@ spirit - Yes, it is the default value set, hotplug enabled (Disk Network and USB), are you suggesting something ? :D

BR
Can you try without hotplug? (I want to know whether it occurs when a new disk is created on the storage, or whether the act of plugging it into qemu does something in the qemu I/O thread.)
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
Hi, I'm running 15x dell r630, with Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz process, kernel 3.10, and I don't have any problem.
All on nfs or ceph, no local disk

(running around 1000vms, debian and windows guests)
Still, that doesn't mean the issue doesn't exist ;)

There is certainly a difference between my 730xd's (running on both local and remote storage) and your R630s in terms of chipset and CPUs used, but I have a hard time believing that the instruction set of my 2630 v3 CPUs is so new that it causes trouble here.

There is definitely a catch on the software side, as the Dell BIOS is very straightforward.
 
Jun 15, 2014
187
7
18
Dresden
www.robhost.de
Are you running on the bare metal server directly the stock backports kernel from debian repo merged into a Proxmox 3.x install ? At least this is what I understand from your state.
Yes, "3.16.0-0.bpo.4-amd64" from wheezy-backports directly on the PVE host (HP DL180 Gen9). 2 months without any VM freeze.
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
Thank you for sharing this very useful information !

My servers are running the H730 Mini (MegaRAID SAS-3 3108 [Invader] (rev 02)), which is based, if I'm not mistaken, on the LSI 93xx card family.

However, it doesn't apply. As I described, I am running 2 setups in different locations, one with local storage and one with remote storage. So the problem can't come only from the LSI firmware bug, because one of the setups runs on central storage and the problem still exists there (with the 2.6.x kernel, CentOS 6.7, as I described before).

So, while I have some servers on the x-density framework, not all of them use local storage as their source storage.
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com
Ok.

About the BIOS, have you applied the latest upgrade? (There are some CPU microcode updates for Intel processors in it.)
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
Yes, all up to date: BIOS, firmware, HW RAID firmware.

How can I send you a PM on this forum? I can't locate the PM button to contact you for a quick private chat/call.

I'd rather not spam this forum with chat-like messages, and will return to the thread once I have some relevant information to share with others.

I might have some leads, but I won't post them until I'm certain they are a fix.
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com
Feel free to contact me at my work email: aderumier@odiso.com
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
As promised, I'm back with more info on the topic, to shed some light with the research I've managed to do so far.

Since my last post, I have focused on how the guests' internal disk scheduler is set, changing it from the default to deadline on all VMs. This improved stability quite a bit, but was not enough to stop the bug from manifesting.
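For anyone who wants to verify which scheduler is active inside a guest, here is a small sketch. It relies on the standard sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets; the helper names are made up for illustration:

```python
import glob
import re


def active_scheduler(text):
    """Return the bracketed (active) scheduler, e.g. 'deadline', or None."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in glob.glob(pattern):
        device = path.split("/")[3]  # /sys/block/<device>/queue/scheduler
        with open(path) as handle:
            result[device] = active_scheduler(handle.read())
    return result
```

Switching is done by writing the scheduler name into the same file as root, e.g. `echo deadline > /sys/block/vda/queue/scheduler` (the device name vda is just an example); that change does not survive a reboot on its own.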

Since the last post, I have been constantly checking the status of the VMs and their resource utilisation via my NMS. What I observed is that when a VM gets stuck, even a lightly loaded one, its memory buffers and cached values start to rise steadily (even if the lock is cleared afterwards via the add/remove disk method described earlier). While the vCPUs are in a "locked state", the host's context switches, system interrupts and load average skyrocket on the graphs, and the system's I/O activity freezes completely.
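Without an NMS, the host-side context switch and interrupt rates can be sampled directly from /proc/stat, whose `ctxt` and `intr` lines carry running totals since boot. A rough sketch (the function names and the sampling interval are arbitrary choices for illustration):

```python
import time


def read_counters(text):
    """Extract total context switches and interrupts from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # 'ctxt N' is the context switch total; the first number after
        # 'intr' is the total of all interrupts serviced.
        if fields and fields[0] in ("ctxt", "intr"):
            counters[fields[0]] = int(fields[1])
    return counters


def rates(before, after, seconds):
    """Per-second rate for every counter present in both samples."""
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}


def sample(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the rates."""
    with open(path) as handle:
        first = read_counters(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = read_counters(handle.read())
    return rates(first, second, interval)
```

Logging `sample()` every few minutes gives a baseline, so a sudden jump at the time of a freeze stands out.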

This started to look more like a memory leak, so the first step was to check what is new/different in v3 processors in comparison with older versions.
The following link provided a starting point: https://software.intel.com/en-us/blogs/2014/09/08/four-new-virtualization-technologies-on-the-latest-intel-xeon-are-you-ready-to

Of all the technologies described on that page, VMCS Shadowing turned up the most kernel error reports on current (in-use) kernel branches. Further searching points to the "kvm_vm_ioctl" KVM kernel function as the central point of all sorts of misbehaviour.

Below I have added a few useful links I could find related to this:

https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.47 - search for commit 264f8746aa6ebf1a62588c653a5e3c4891f69fee
http://www.gossamer-threads.com/lists/linux/kernel/2207193
http://stackoverflow.com/questions/33192729/vmwrite-error-when-updating-vmcs-from-kvm-vm-ioctl
https://bugzilla.kernel.org/show_bug.cgi?id=93251 - affecting 3.19 branch
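Searching a changelog for that commit can be scripted. The sketch below assumes the changelog is plain `git log` style text, where every entry starts with an unindented `commit <sha>` line and the body is indented (so an "upstream commit" reference inside a body does not start a new entry); the function name is made up:

```python
def find_commit(changelog, sha):
    """Return the changelog entries that mention the given commit hash."""
    entries, current = [], []
    for line in changelog.splitlines():
        # An unindented 'commit ...' line opens a new entry.
        if line.startswith("commit ") and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return [entry for entry in entries if sha in entry]
```

Feeding it the text of ChangeLog-3.10.47 and the hash 264f8746aa6ebf1a62588c653a5e3c4891f69fee returns the matching entry, if any, which can then be compared against the changelog of the pve kernel package.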

So, as I understand this (I'm not a kernel developer), the "leak" comes from the following logic:

Running a virtual machine allocates the configured vCPUs to the KVM process (vCPU & ioctl) and sets up I/O scheduling for the KVM instance in qemu-kvm. The vCPUs are tied to the physical CPUs and RAM, which bind to a KVM-specific instruction set in order to issue ioctl system calls. These should try to create a set of file descriptors for the current process, involved in disk access.

The lock-up only occurs when kvm_vm_ioctl tries to free up some previously allocated memory resources.

What is most interesting is that there is no 3.16.x version to test with in the pve repository. This bug was supposedly fixed in 3.10, but we don't know how 3.10.47 relates to the pve-kernel-3.10.0-13-pve revision, or whether the latter includes the fix. It might, however, explain why robhost is running stable on the 3.16.x branch from backports; the bug was also supposedly fixed in the 4.1.x kernel branch.

Regarding spirit's CPU version in comparison to mine: it is well known that each CPU line (mine entry-to-mid-range, his high-end) shares one major architecture, with minor changes across the high-to-low range, plus differences in the CPU microcode support included in each BIOS update by the hardware/mobo vendors.

Currently, I'm stability testing with the 3.16.x kernel from backports, which forces me to drop the stable 3.10.x pve kernel release (maybe until a 3.16 pve kernel appears, although I doubt it will, since 3.10 is dead next year and work on the 4.x branch for Proxmox 4 is far too advanced for anybody to reconsider bug fixing on a dead-end kernel version/product).

Hopefully my logic and explanations are close to right and this will help others in the future.
 

pjkenned

New Member
Dec 16, 2013
26
1
3
No clues, but I have seen this on two different E5 v3 processors with Proxmox VE 4.0.
System 1: dual E5-2650L V3, Supermicro motherboard
System 2: dual E5-2683 V3, ASRock motherboard
 

spirit

Famous Member
Apr 2, 2010
3,570
164
83
www.odiso.com
Nice debugging. I was looking through the Red Hat 3.10 kernel update changelogs (because the current Proxmox 3.10 kernel has not been updated since May 2015), but nothing much new related to KVM has been backported by Red Hat.

If you need a more recent kernel than 3.16, you can try the Proxmox 4.0 kernel; it should work:

http://download.proxmox.com/debian/dists/jessie/pve-no-subscription/binary-amd64/pve-kernel-4.2.3-2-pve_4.2.3-22_amd64.deb
http://download.proxmox.com/debian/dists/jessie/pve-no-subscription/binary-amd64/pve-firmware_1.1-7_all.deb
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
@Spirit:

I can't change or test this on these systems because they are production envs, and my main concern is not to have more sleepless nights due to this stupid bug; I've already had plenty.



No clues but I have seen this on two different E5 v3 processors with Proxmox VE 4.0.System 1: dual E5-2650L V3 supermicro motherboardSystem 2: dual E5-2683 V3 asrock motherboard
Separately, I have a different env running on Core i7 socket 2011 v1 CPUs and have never encountered this issue there.



If I am to take a wild general guess, considering how the Linux kernel works (it molds itself to the hardware), I suppose Intel's v3 architecture is one step ahead of the kernel development schedule and can't yet be considered a fully stable & supported platform, whereas v1 poses no trouble.
 
Jun 15, 2014
187
7
18
Dresden
www.robhost.de
If am I am to take a wild general guess, considering what and how a Linux Kernel work (that it molds on to the hardware system), I suppose v3 architecture from Intel is one step ahead of the Kernel development schedule to be consider a fully stable & supported layout, otherwise v1 imposes no trouble.
Not kernel development in general, but the RHEL kernels (which PVE uses in a patched version). This is why the newer 3.16 and 4.0 kernels do not have this problem (and neither does PVE 4.0, since it relies on its 4.0 kernel).
IMHO, upgrading to PVE 4, using the backported 3.16 kernel, or even using the 4.0 kernel from PVE 4 in PVE 3 is the only way to fix this issue.
 

avladulescu

Member
Mar 3, 2015
25
0
21
Bucharest/Romania
@e100 - I tested every possible setting, even iSCSI and hotplug disabled, all with the same results. But while some are still frying the fish, other stubborn people prefer to have a solid solution to this bug and use virtio (the best performance) without any issue. :D



As for a tested solution, I can confirm that the backports 3.16 kernel still caused issues on the virtual machines I was running. On one setup I was still experiencing lock-ups on the VMs running 2.6 kernels (CentOS 6.7), and on the 3.16 guests (Debian 8.2) vCPU & I/O wait misbehaved when a VM was transferred to another host in the cluster. On another setup, which runs only Debian 7.9 VMs, the backports kernel did help solve the issue.

As of now, I can conclude that all these issues, which I posted and explained in another thread of mine ( http://forum.proxmox.com/threads/24277-VM-high-vCPU-usage-issues ), are totally gone.
Following spirit's trial & error suggestion, i.e. upgrading all cluster nodes to the 4.2.3-2-pve kernel even while running Proxmox 3.4 and seeing the outcome afterwards, has succeeded.

As a rule of thumb, which I identified a couple of years ago when I started finding my way around how Proxmox works and what its pros and cons are, there is an important fact to keep in mind:

Run the hypervisor host with a kernel version at least equal to, or on the same branch as, that of the VMs you're planning to deploy on it.

My sanity testing involved the following OS types and kernel versions:

CentOS 6.7 - 2.6.32-573
Debian 7.9 - 3.2
Debian 8.2 - 3.16
Ubuntu 14.04.1 LTS - 3.19
Ubuntu 15.10 - 4.2.0-16
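The rule of thumb can be checked mechanically against this list. The sketch below compares only the major.minor branch of each kernel release string; the guest table is the one above, while the function names and the idea of comparing just the first two version fields are assumptions made for illustration:

```python
import re

# Guest OS -> guest kernel, taken from the list above.
GUEST_KERNELS = {
    "CentOS 6.7": "2.6.32-573",
    "Debian 7.9": "3.2",
    "Debian 8.2": "3.16",
    "Ubuntu 14.04.1 LTS": "3.19",
    "Ubuntu 15.10": "4.2.0-16",
}


def kernel_branch(release):
    """Reduce a release string like '4.2.3-2-pve' to the tuple (4, 2)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError("unrecognised kernel release: {!r}".format(release))
    return int(match.group(1)), int(match.group(2))


def guests_newer_than_host(host_release, guests=GUEST_KERNELS):
    """List the guests whose kernel branch is newer than the host's."""
    host = kernel_branch(host_release)
    return sorted(
        name for name, release in guests.items()
        if kernel_branch(release) > host
    )
```

With a host on 3.10.0-13-pve this flags the Debian 8.2 and both Ubuntu guests, with 3.16.0-0.bpo.4-amd64 only the Ubuntu guests remain, and with 4.2.3-2-pve the list is empty. On a running host, `platform.release()` from the standard library supplies the release string.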

The testing was done on a private test cluster as well as on a production cluster, and the Ubuntu VMs no longer experience vCPU lock-ups after this either.

So, as a final solution, go ahead and install the following on your host system (the Proxmox nodes, not the guests):

- pve-firmware_1.1-7_all.deb
- pve-headers-4.2.3-2-pve_4.2.3-22_amd64.deb
- pve-kernel-4.2.3-2-pve_4.2.3-22_amd64.deb


Cheers
 