Thanks for the suggestion
Not using any additional drivers and in my case only debian wheezy guests seem to have a problem.
Have you tried the 3.16 kernel from wheezy-backports?
I'm running it on more than 400 guest VMs without any hangs.
Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a90b 15a90d 15a90f
Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a90d
Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a911 15a913
Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a911
Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a915
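To put a number on "non-stop re-transmits" rather than eyeballing syslog, a small script can count how many Retransmit List lines appear per second and how often each sequence number is requested again. This is a minimal sketch that assumes the exact line format shown above (syslog timestamp, host, `corosync[pid]:`, `[TOTEM ]`) and treats the list entries as hexadecimal sequence numbers; `summarize_retransmits` is a hypothetical helper, not part of corosync.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt above:
# Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a90b 15a90d 15a90f
RETRANSMIT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Retransmit List:\s+(?P<seqs>[0-9a-fA-F ]+)$"
)


def summarize_retransmits(lines):
    """Return (Retransmit List lines per timestamp, requests per sequence number)."""
    per_second = Counter()
    per_seq = Counter()
    for line in lines:
        m = RETRANSMIT_RE.match(line.strip())
        if not m:
            continue  # ignore every non-retransmit log line
        per_second[m.group("ts")] += 1
        for seq in m.group("seqs").split():
            per_seq[int(seq, 16)] += 1  # entries are parsed as hex
    return per_second, per_seq


# The five lines quoted in this thread, used as sample input.
SAMPLE_LOG = [
    "Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a90b 15a90d 15a90f",
    "Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a90d",
    "Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a911 15a913",
    "Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a911",
    "Jan 13 03:19:28 vm5 corosync[3982]: [TOTEM ] Retransmit List: 15a915",
]

per_second, per_seq = summarize_retransmits(SAMPLE_LOG)
```

Fed a whole day of syslog instead of the sample, the per-second counts make it easy to compare the 2.6.32 and 3.10 kernels on the same IPoIB network.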
Interesting, I have not tried that kernel in the guests.
I am currently testing whether switching from virtio-blk to IDE resolves the issue.
It looks promising so far, but the results are not yet conclusive.
Did you change it inside the guest VM?
Hi,...
I suspect this is related to the IPoIB kernel changes that I pointed out earlier.
DRBD gets timeouts with IPoIB, resulting in split brains, when using the 3.10 kernel from the repo or the beta kernel you provided.
Corosync shows non-stop retransmits with the 3.10 beta kernel and IPoIB.
...
I've not had DRBD issues with 2.6.32, even when running verify.
But the 3.10 kernel and the 3.10 beta kernel spirit provided have caused nothing but problems on some, but not all, servers.
I started another thread about that, but it never went anywhere.
I only mentioned it here to give spirit feedback on the beta kernel.
3.10 will be coming someday whether I want it or not, and for IPoIB it seems to be unusable.
From what I have read, all the IB drivers need to use GFP_NOIO when allocating memory to prevent a deadlock with IPoIB.
Only mlx4 has been updated so far, and I have not seen any activity on fixing the others.
http://lkml.org/lkml/2014/4/24/543
Hi,
is there any information on the Mellanox website about RHEL 7 support? I ask because the 3.10 kernel is the RHEL 7 kernel
(my beta kernel is based on RHEL 7.1 beta).
The only solution I have found is to not use virtio for the disk. When using IDE, this problem never happens.
I have not tried SATA.
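For reference, the virtio-to-IDE switch is a change to the VM's hardware definition on the host, not something done inside the guest. The sketch below shows that change on the text of a VM config file. It assumes the Proxmox VE 3.x format (`/etc/pve/qemu-server/<vmid>.conf`, disk lines such as `virtio0: ...`, and a `bootdisk:` key); `virtio_to_ide` and the sample config are hypothetical. It refuses to overwrite an IDE slot that is already used (for example `ide2`, which often holds the CD-ROM). As far as I know, the new bus only takes effect after a full stop and start of the VM.

```python
def virtio_to_ide(config_text, virtio_index=0, ide_index=0):
    """Rename a virtioN disk line to ideM in (assumed) Proxmox VE VM config text.

    Also repoints 'bootdisk: virtioN' to the new IDE slot. Raises ValueError
    if the source disk is missing or the target IDE slot is already taken.
    """
    if not 0 <= ide_index <= 3:
        raise ValueError("IDE only provides slots ide0..ide3")
    src = f"virtio{virtio_index}"
    dst = f"ide{ide_index}"
    lines = config_text.splitlines()
    # The trailing ':' keeps 'virtio1:' from matching 'virtio10:'.
    if any(line.startswith(dst + ":") for line in lines):
        raise ValueError(f"{dst} is already in use")
    if not any(line.startswith(src + ":") for line in lines):
        raise ValueError(f"{src} not found")
    out = []
    for line in lines:
        if line.startswith(src + ":"):
            line = dst + line[len(src):]  # keep storage path and options as-is
        elif line.strip() == f"bootdisk: {src}":
            line = f"bootdisk: {dst}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical example config (made-up VM id and storage path).
SAMPLE_CONFIG = (
    "bootdisk: virtio0\n"
    "cores: 2\n"
    "ide2: none,media=cdrom\n"
    "memory: 2048\n"
    "virtio0: local:101/vm-101-disk-1.qcow2,size=32G\n"
)

converted = virtio_to_ide(SAMPLE_CONFIG)
```

Inside the guest, the disk will then show up under a different device name than `/dev/vda`, so anything that refers to the disk by that name may need checking before the switch.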
I find it most strange that I only see this on VMs with very little disk IO. VMs running nothing but memcached have the problem, whereas a busy web server constantly writing logs never had it.
I wish I had a way to trigger the problem on demand; that would help identify the cause.
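One way to back up the "very little disk IO" observation with numbers is to sample `/proc/diskstats` inside each guest and compare how much actually gets written per interval on affected and unaffected VMs. This is a minimal sketch that relies on the standard field layout (field 3 is the device name, the 6th and 10th whitespace-separated fields are sectors read and written, in 512-byte units). The helper names and the sample numbers are made up for illustration; with virtio-blk the guest disk is typically `vda`.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        # fields: major, minor, name, reads, reads merged, sectors read,
        #         ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def bytes_written_between(before, after, device):
    """Bytes written to `device` between two parsed snapshots."""
    return (after[device][1] - before[device][1]) * SECTOR_BYTES


def sample_write_bytes(device="vda", interval=60.0, path="/proc/diskstats"):
    """Live measurement inside a Linux guest; not called in this sketch."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return bytes_written_between(before, after, device)


# Made-up snapshots of an almost idle virtio disk, 80 sectors apart.
BEFORE = " 253       0 vda 1000 10 20000 500 300 40 8000 900 0 1200 1400\n"
AFTER = " 253       0 vda 1000 10 20000 500 310 40 8080 920 0 1210 1420\n"

written = bytes_written_between(parse_diskstats(BEFORE), parse_diskstats(AFTER), "vda")
```

Logging `sample_write_bytes()` periodically on a memcached VM and on the busy web server would show whether the hangs really line up with long stretches of near-zero writes.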
Since then, we have not had this error anymore. I would like to know whether this problem also exists with the latest Intel Xeon v3 CPUs.