Linux guest problems on new Haswell-EP processors

e100
Renowned Member · Joined Nov 6, 2010 · Columbus, Ohio · ulbuilder.wordpress.com
We recently upgraded two servers with dual socket Xeon boards.
One server has two E5-2687W v3
The other has two E5-2620 v3

I have three Debian Wheezy guests, one on the 2687W and two on the 2620, that have had issues.
These guests are currently kicked out of production, so they just sit there idle all day.
The only real load is a cron job that runs every few minutes; it makes some HTTP requests and reads/writes some tiny files.

The only clue I have is a kernel message in the guest about jbd2/dm-0-8 being blocked for more than 120 seconds.
I don't have the exact error, but it was something like "INFO: task jbd2/dm-0-8 blocked for more than 120 seconds."
IO becomes stalled and load keeps rising.
Only way to recover is to stop/start the VM.
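(For reference, the 120 seconds is the kernel's hung-task watchdog threshold; it can be inspected via procfs, though raising it would only silence the warning rather than fix the stall:)
Code:
# cat /proc/sys/kernel/hung_task_timeout_secs
120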

Guests worked fine before the upgrade.
The only components changed were the CPU/RAM/motherboard.
Still using the same RAID card and disks.

Storage is LVM over DRBD.

Oddly, no issues with Windows guests so far.

Any suggestions?

VM config file:
Code:
# cat /etc/pve/qemu-server/107.conf 
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom
memory: 1280
name: XXXXXXXXXXX
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr10
onboot: 1
ostype: l26
sockets: 1
virtio0: vm9-vm10:vm-107-disk-1,cache=directsync,size=3G


Code:
# pveversion -v
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-139
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-18-pve: 2.6.32-88
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
What do you have the processor type set to? Does changing it help?
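(For context: the 107.conf above has no cpu line, so the VM runs the default vCPU model, kvm64. Changing it is a one-line edit; a sketch, with example values:)
Code:
# in /etc/pve/qemu-server/107.conf
cpu: host   # e.g. pass through the host's Haswell feature set; kvm64 is the default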
 
More information...

When the IO stalls, it does not seem to be a problem with the guest OS; it seems to be an issue with KVM itself.

I tried to reset the VM by pressing reset in the GUI.
VM did not reset.

Then I went to the monitor tab and typed 'help'; the response was:
Code:
Type 'help' for help.
# help
ERROR: VM 107 qmp command 'human-monitor-command' failed - unable to connect to VM 107 socket - timeout after 31 retries

Only way to recover is to stop/start the VM.
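(The same stop/start cycle from the node's shell, using the standard qm tool and the VM ID from the config above:)
Code:
# qm stop 107
# qm start 107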
 
Yes, I think it could help. I know the kvm module in 3.10 has had some CPU-filtering bugs corrected.
(I have seen that mainly on live migration between old and new Xeons.)

So, try it and compare; maybe it'll work. (I don't have Haswell-EP yet to test on my side.)
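(A sketch of what trying it amounts to, assuming the 3.10 kernel is packaged as pve-kernel-3.10.0-*-pve as in Proxmox 3.x; the version number below is only an example:)
Code:
# apt-get update
# apt-get install pve-kernel-3.10.0-5-pve   # example version; use the current one
# reboot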
 
Spoke too soon.

This problem still occurs on 3.10, but much less frequently.

KVM itself is hanging, not the guest.
Monitor does not work, cannot perform a reset.
To recover I have to stop, then start the VM.

Is there anything I can do to help track down the source of this problem?
 
Hi,

Can you try disabling APICv?

I have seen bug reports about it recently (including the RHEL 7 3.10 kernel) with the latest Xeon processors.

Code:
# modprobe kvm_intel enable_apicv=N
# cat /sys/module/kvm_intel/parameters/enable_apicv   # verify it took effect
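(A module parameter only takes effect when kvm_intel is loaded, so the module has to be reloaded with all VMs stopped. To persist the setting across reboots, a modprobe config file works; the file name here is arbitrary:)
Code:
# /etc/modprobe.d/kvm-intel.conf   (file name arbitrary)
options kvm_intel enable_apicv=N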
 
Sure, I will try turning off apicv.

I've been playing with various IO options on my new SSDs. While testing iothreads, I experienced IO stalls whenever I set cache=directsync. Nearly all of my VMs use directsync. Most likely not related to the issue here, but I have set some of my VMs to writethrough to see if it makes a difference.

I've also been having issues with DRBD on 3.10. It seems the IO scheduler works very differently, resulting in timeouts that cause DRBD to disconnect.
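(The cache-mode change is a one-word edit on the disk line of the VM config; using 107.conf from above as the example:)
Code:
# before
virtio0: vm9-vm10:vm-107-disk-1,cache=directsync,size=3G
# after
virtio0: vm9-vm10:vm-107-disk-1,cache=writethrough,size=3G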
 
spirit,

Turning off APICv does not resolve the problem.
Any other suggestions? I am completely out of ideas on what might resolve this.

I have built a new kernel based on the upcoming RHEL 7.1-beta kernel.

The .deb packages are here:

http://odisoweb1.odiso.net/kernel/
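(Installing a downloaded kernel package is the usual dpkg route; the file name below is a placeholder for whatever is actually published at that URL:)
Code:
# wget http://odisoweb1.odiso.net/kernel/<package>.deb
# dpkg -i <package>.deb
# reboot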

Maybe it'll help you?

Merry Xmas ;)
 
Hi spirit,

I installed the kernel you provided yesterday.
So far no VM lock-ups, but it's not been long enough to conclude the issue is resolved.

But I am concerned that the kernel is spitting out lots of warnings like this:
Code:
[   42.594700] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[   42.597066] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[   42.597528] ib1: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[   42.599691] ib1: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[   42.603214] ib1: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[   42.604830] ib1: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
Those repeat every minute or so.

Seems related to this patch:
http://permalink.gmane.org/gmane.linux.drivers.rdma/20239

I'm no kernel hacker, so this is a bit above me, but it appears that the hardware drivers also need to be patched to work with the above IPoIB patch:
> > mthca are similar to mlx4 and qib does vmalloc() in qib_create_qp()).
> > So this patch needs to be extended to the other 4 IB device drivers in
> > the tree.
http://lkml.org/lkml/2014/4/24/543

My IB cards use the mthca driver:
Code:
[    8.190786] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
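(A quick way to confirm which IB driver is actually bound, using standard tools; the module name is ib_mthca:)
Code:
# lsmod | grep mthca
# dmesg | grep -i mthca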

The changes are meant to prevent a deadlock. I have been having some issues with DRBD timing out under load on machines running the 3.10 kernel; I wonder if this is related.
 
Right after posting my message, a colleague sent me this screenshot.
As you can see, many tasks are stalled. There are no other messages before these; it goes from working fine to spitting out these errors with stalled IO.

[Attachment 2394: screenshot of kernel hung-task messages showing many stalled tasks]

I had hung_task errors frequently on one of my E5-2620 v2 systems. My problem was tinkering with InfiniBand. In my case the entire node would lock up, and the only way to clear it was a hard reboot. I removed the additional IB drivers I had installed and have not seen this error for about 2 months now. I am not on the new 3.10 kernel.
 
