Corosync Totem Re-transmission Issues

adamb

Hey all. I am still battling Corosync retransmission issues. Two of my three clusters are having the problem, and as you can see, the two affected clusters are on the latest version while the one cluster without the issue is on a previous version. I am at a loss as to what the cause could be. From some searching, this typically happens when one node is underperforming the other, but I don't believe that is the case here, as cluster #1 never had this issue until I just upgraded it.

Cluster #1 (Having issue)
IBM x3550 M3's
Broadcom 10GB backend

root@proxmox1:/var/log/cluster# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-72
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1

Cluster #2 (Having issue)
IBM x3650 M4's
Broadcom 10GB backend

root@medprox1:/var/log/cluster# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-72
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1

Cluster #3 (Not having issues)
IBM x3650 M4's
Broadcom 10GB backend

root@fiosprox1:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1


Here is my cluster.conf, which is the same on each node other than the IPs.

<?xml version="1.0"?>
<cluster config_version="27" name="fiosprox">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="fiosprox_qdisk" master_wins="1" tko="10"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.80.12.129" lanplus="1" login="USERID" name="ipmi1" passwd="PASSW0RD" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.80.12.132" lanplus="1" login="USERID" name="ipmi2" passwd="PASSW0RD" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="fiosprox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fiosprox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>
 
To track that down you can try to test another kernel on those nodes - either the old one, or the new kernel from pvetest - maybe that helps?
 
Not comfortable with moving to the test kernel, but definitely no objections to going back to 2.6.32-16, as it's been rock solid on my cluster with no issues for 3+ months.

I am using a compiled driver for my 10GB Broadcom NIC, as the default driver causes issues.

So I will need to boot into the older kernel and utilize the correct compiled driver for that kernel.

Please let me know if my thoughts are correct.

1. Move the 10Gb driver for 2.6.32-16-pve to /lib/modules/2.6.32-16-pve/kernel/drivers/net/bnx2x/
2. Move initrd.img-2.6.32-16-pve to initrd.img-2.6.32-16-pve.bak
3. update-initramfs -c -k 2.6.32-16-pve
4. Update grub to boot the 2.6.32-16-pve kernel (not sure how to do this?)
5. Reboot
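
Roughly what I'm thinking, in shell form (the source path and .ko filename for my compiled driver are just placeholders):

# copy the compiled bnx2x module built against the 2.6.32-16-pve headers
# (the source path /root/bnx2x-build/bnx2x.ko is a placeholder for my own build)
cp /root/bnx2x-build/bnx2x.ko /lib/modules/2.6.32-16-pve/kernel/drivers/net/bnx2x/
depmod -a 2.6.32-16-pve

# back up the old initrd and regenerate it so it picks up the new module
mv /boot/initrd.img-2.6.32-16-pve /boot/initrd.img-2.6.32-16-pve.bak
update-initramfs -c -k 2.6.32-16-pve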

I think that should do it, but I have two questions:

1. How to update grub to boot the previous kernel?

2. I will need to do this one node at a time. When one is up on 16 and the other is still on 17, will I be able to migrate VMs?


I greatly appreciate the input and advice!


 
Looks like I should be able to do something like this.

grub-set-default 1

grub-mkconfig -o /boot/grub/grub.cfg

or

update-grub

Sound correct?



EDIT: I figured it out. Changed GRUB_DEFAULT=0 to GRUB_DEFAULT=1 in /etc/default/grub, then ran update-grub and rebooted.
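
For anyone else hitting this, it boils down to something like the following (index 1 assumes the 2.6.32-16-pve entry is the second item in the generated menu; check /boot/grub/grub.cfg to be sure):

# pick the second grub menu entry instead of the first
sed -i 's/^GRUB_DEFAULT=0/GRUB_DEFAULT=1/' /etc/default/grub

# regenerate the grub config and reboot into the older kernel
update-grub
reboot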

Will see if it happens more. I will keep this thread up to date.
 
So I will need to boot into the older kernel and utilize the correct compiled driver for that kernel.

I only suggested changing the kernel because I thought there were issues with the bnx driver, so that makes no sense if you compile your own module.
 
I only suggested changing the kernel because I thought there were issues with the bnx driver, so that makes no sense if you compile your own module.

Either way, if I were using the default drivers, they are identical between kernel versions 16 and 17.

On a side note, it has yet to happen since going back to 2.6.32-16-pve.
 
Kernel 2.6.32-18 has the newest bnx drivers (in pvetest).
 
Kernel 2.6.32-18 has the newest bnx drivers (in pvetest).

Yep, I know that, but I am a bit trigger-shy: I don't want to fix one issue and have another arise due to a test kernel. Are there many issues with this new kernel?
 
No. I would just update the kernel package for testing.

Looks like it just happened again with the 2.6.32-16 kernel. At least this time the entire cluster didn't die.

Can't seem to pin down any documentation for updating just the kernel package. Any tips on how to do this? I appreciate the input!
 
> wget http://download.proxmox.com/debian/dists/squeeze/pvetest/binary-amd64/pve-kernel-2.6.32-18-pve_2.6.32-88_amd64.deb
> dpkg -i pve-kernel-2.6.32-18-pve_2.6.32-88_amd64.deb
 

Appreciate the tip!

I just moved both nodes into the same server room to eliminate the long 10GB run; they are now connected with a pre-made 10-foot Cat6A cable, which should remove any question about the cabling. The issue happened again a few minutes after moving the node.

So my next step is to give the test kernel a try. Both nodes are up on 2.6.32-18, hopefully this does the trick.
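
Just as a sanity check after the reboots, something like this confirms which kernel each node actually came up on:

# confirm the running kernel on each node
uname -r
pveversion -v | grep 'running kernel'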
 
Stumbled upon the Proxmox multicast notes and decided to test my multicast traffic.

Quite interesting: my backend network (DRBD/cluster communication) doesn't seem to be passing multicast. I wonder if this is part of my issue.

root@proxmox2:~# asmping 224.0.2.1 10.211.47.1
asmping joined (S,G) = (*,224.0.2.234)
pinging 10.211.47.1 from 10.211.47.2
unicast from 10.211.47.1, seq=1 dist=0 time=0.235 ms
unicast from 10.211.47.1, seq=2 dist=0 time=0.204 ms
unicast from 10.211.47.1, seq=3 dist=0 time=0.219 ms
unicast from 10.211.47.1, seq=4 dist=0 time=0.104 ms
unicast from 10.211.47.1, seq=5 dist=0 time=0.227 ms
unicast from 10.211.47.1, seq=6 dist=0 time=0.236 ms
unicast from 10.211.47.1, seq=7 dist=0 time=0.194 ms
unicast from 10.211.47.1, seq=8 dist=0 time=0.232 ms
unicast from 10.211.47.1, seq=9 dist=0 time=0.186 ms
unicast from 10.211.47.1, seq=10 dist=0 time=0.252 ms
unicast from 10.211.47.1, seq=11 dist=0 time=0.235 ms

Whereas my LAN network, which I use for management, has no problem with multicast.

root@proxmox2:~# asmping 224.0.2.1 10.80.12.125
asmping joined (S,G) = (*,224.0.2.234)
pinging 10.80.12.125 from 10.80.12.130
unicast from 10.80.12.125, seq=1 dist=0 time=1.218 ms
multicast from 10.80.12.125, seq=1 dist=0 time=1.236 ms
unicast from 10.80.12.125, seq=2 dist=0 time=0.287 ms
multicast from 10.80.12.125, seq=2 dist=0 time=0.272 ms
unicast from 10.80.12.125, seq=3 dist=0 time=0.268 ms
multicast from 10.80.12.125, seq=3 dist=0 time=0.253 ms
unicast from 10.80.12.125, seq=4 dist=0 time=0.256 ms
multicast from 10.80.12.125, seq=4 dist=0 time=0.242 ms
unicast from 10.80.12.125, seq=5 dist=0 time=0.171 ms
multicast from 10.80.12.125, seq=5 dist=0 time=0.155 ms
unicast from 10.80.12.125, seq=6 dist=0 time=0.189 ms
multicast from 10.80.12.125, seq=6 dist=0 time=0.170 ms
unicast from 10.80.12.125, seq=7 dist=0 time=0.391 ms
multicast from 10.80.12.125, seq=7 dist=0 time=0.409 ms
unicast from 10.80.12.125, seq=8 dist=0 time=0.277 ms
multicast from 10.80.12.125, seq=8 dist=0 time=0.261 ms
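
If I get a chance I may cross-check with omping, which tests multicast between specific hosts and has to be run on both nodes at the same time (just a sketch, assuming the omping package is available):

# run on BOTH nodes simultaneously; each node lists every participant
omping -c 20 10.211.47.1 10.211.47.2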
 
I decided to test multicast on my dedicated DRBD/cluster network. To my surprise, multicast is broken on the 10GB DRBD/cluster network. This is a dedicated 10GB network with absolutely no switch in between. From what I have read, multicast issues are typically caused by a switch, which obviously isn't the case here. My next question is how my cluster has continued to operate at all without multicast traffic.

root@proxmox2:~# asmping 224.0.2.1 10.211.47.1
asmping joined (S,G) = (*,224.0.2.234)
pinging 10.211.47.1 from 10.211.47.2
unicast from 10.211.47.1, seq=1 dist=0 time=0.156 ms
unicast from 10.211.47.1, seq=2 dist=0 time=0.111 ms
unicast from 10.211.47.1, seq=3 dist=0 time=0.193 ms
unicast from 10.211.47.1, seq=4 dist=0 time=0.209 ms
unicast from 10.211.47.1, seq=5 dist=0 time=0.219 ms
unicast from 10.211.47.1, seq=6 dist=0 time=0.147 ms

root@proxmox2:~# asmping 224.0.2.1 10.80.12.125
asmping joined (S,G) = (*,224.0.2.234)
pinging 10.80.12.125 from 10.80.12.130
unicast from 10.80.12.125, seq=1 dist=0 time=1.363 ms
multicast from 10.80.12.125, seq=1 dist=0 time=1.342 ms
unicast from 10.80.12.125, seq=2 dist=0 time=0.301 ms
multicast from 10.80.12.125, seq=2 dist=0 time=0.282 ms
unicast from 10.80.12.125, seq=3 dist=0 time=0.183 ms
multicast from 10.80.12.125, seq=3 dist=0 time=0.198 ms
unicast from 10.80.12.125, seq=4 dist=0 time=0.216 ms
multicast from 10.80.12.125, seq=4 dist=0 time=0.197 ms


<?xml version="1.0"?>
<cluster config_version="10" name="proxmox">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="proxmox_qdisk" master_wins="1" tko="10"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.80.12.126" lanplus="1" login="USERID" name="ipmi1" passwd="PASSW0RD" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.80.12.131" lanplus="1" login="USERID" name="ipmi2" passwd="PASSW0RD" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="105"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="104"/>
  </rm>
</cluster>


root@proxmox1:~# pvecm s
Version: 6.2.0
Config Version: 10
Cluster Name: proxmox
Cluster Id: 14330
Cluster Member: Yes
Cluster Generation: 652
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 0
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177 178
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.55.50
Node addresses: 10.211.47.1

root@proxmox2:~# pvecm s
Version: 6.2.0
Config Version: 10
Cluster Name: proxmox
Cluster Id: 14330
Cluster Member: Yes
Cluster Generation: 652
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177 178
Node name: proxmox2
Node ID: 2
Multicast addresses: 239.192.55.50
Node addresses: 10.211.47.2
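
Since pvecm reports the multicast address as 239.192.55.50, I also want to verify that the backend bridge has actually joined its multicast groups. A quick check on my end, nothing fancy:

# multicast group memberships on the backend bridge
ip maddr show dev vmbr0

# per-interface IGMP membership as the kernel sees it
cat /proc/net/igmp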

On a side note, I attempted to make a new post for this, but I guess it has to be approved by an administrator?
 
I was thinking it could be due to the fact that my quorum disk is on the LAN network, but once this disk is presented over iSCSI, the system sees it as a local disk. I don't think it's this, but I am hoping someone can provide some input.

From what I can tell, I should have no issues with multicast on the 10GB network.

vmbr0 Link encap:Ethernet HWaddr 00:10:18:d6:06:b0
inet addr:10.211.47.1 Bcast:10.211.47.255 Mask:255.255.255.0
inet6 addr: fe80::210:18ff:fed6:6b0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18276083 errors:0 dropped:0 overruns:0 frame:0
TX packets:21017046 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:115801035617 (107.8 GiB) TX bytes:22204241451 (20.6 GiB)


vmbr2 Link encap:Ethernet HWaddr e4:1f:13:e6:f6:bc
inet addr:10.80.12.125 Bcast:10.80.255.255 Mask:255.255.0.0
inet6 addr: fe80::e61f:13ff:fee6:f6bc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4209214 errors:0 dropped:0 overruns:0 frame:0
TX packets:1617249 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:724048629 (690.5 MiB) TX bytes:154288843 (147.1 MiB)
 
Very odd: asmping reports that multicast is broken, but tcpdump shows the traffic on my 10.211.47.x network. I guess asmping is not that great of a test.

root@proxmox1:~# tcpdump -i eth0 port 5405
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:36:52.822626 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 75
15:36:52.822683 IP proxmox1.com.5404 > 10.211.47.2.5405: UDP, length 107
15:36:52.822780 IP 10.211.47.2.5404 > proxmox1.com.5405: UDP, length 107
15:36:52.822841 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 1473
15:36:52.822871 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 1473
15:36:52.822907 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 1473
15:36:52.822935 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 1473
15:36:52.822951 IP proxmox1.com.5404 > 239.192.55.50.5405: UDP, length 566
15:36:52.822965 IP proxmox1.com.5404 > 10.211.47.2.5405: UDP, length 107
15:36:52.823167 IP 10.211.47.2.5404 > proxmox1.com.5405: UDP, length 107
15:36:52.823208 IP proxmox1.com.5404 > 10.211.47.2.5405: UDP, length 107
15:36:52.823325 IP 10.211.47.2.5404 > proxmox1.com.5405: UDP, length 107
15:36:52.823369 IP proxmox1.com.5404 > 10.211.47.2.5405: UDP, length 107
15:36:52.823501 IP 10.211.47.2.5404 > proxmox1.com.5405: UDP, length 107
15:36:52.823533 IP proxmox1.com.5404 > 10.211.47.2.5405: UDP, length 107
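
To separate the multicast totem traffic from the unicast retransmits, I may tighten the capture filter to just the cluster's multicast group (assuming 239.192.55.50 stays the group address, as pvecm reports):

# capture only traffic addressed to the corosync multicast group
tcpdump -i eth0 -n host 239.192.55.50 and port 5405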
 
Found a few more details in the log. Sometimes when this happens the nodes are able to survive and continue, which from my understanding should be the normal behavior. When things go really bad and nodes get fenced, I see the following lines:

Feb 27 12:02:11 proxmox2 corosync[2017]: [TOTEM ] Retransmit List: 391 393 382 383 384 385 386 387 388 389 38a 38b 38c 38d 38e 38f 390 392
Feb 27 12:02:11 proxmox2 corosync[2017]: [TOTEM ] Retransmit List: 392 381 382 383 384 385 386 387 388 389 38a 38b 38c 38d 38e 38f 390 391 393
Feb 27 12:02:11 proxmox2 corosync[2017]: [TOTEM ] FAILED TO RECEIVE
Feb 27 12:02:13 proxmox2 dlm_controld[2315]: cluster is down, exiting
Feb 27 12:02:13 proxmox2 qdiskd[2083]: cman_dispatch: Host is down
Feb 27 12:02:13 proxmox2 dlm_controld[2315]: daemon cpg_dispatch error 2
Feb 27 12:02:13 proxmox2 pmxcfs[1809]: [quorum] crit: quorum_dispatch failed: 2
Feb 27 12:02:13 proxmox2 fenced[2296]: cluster is down, exiting
Feb 27 12:02:13 proxmox2 fenced[2296]: daemon cpg_dispatch error 2
Feb 27 12:02:13 proxmox2 fenced[2296]: cpg_dispatch error 2
Feb 27 12:02:13 proxmox2 rgmanager[2457]: #67: Shutting down uncleanly
Feb 27 12:02:13 proxmox2 pmxcfs[1809]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Feb 27 12:02:13 proxmox2 pmxcfs[1809]: [confdb] crit: confdb_dispatch failed: 2
Feb 27 12:02:13 proxmox2 qdiskd[2083]: cman_dispatch: Host is down
Feb 27 12:02:13 proxmox2 qdiskd[2083]: Halting qdisk operations
Feb 27 12:02:14 proxmox2 pvevm: <root@pam> starting task UPID:proxmox2:000029CF:00021723:512E3C16:qmshutdown:105:root@pam:
Feb 27 12:02:14 proxmox2 task UPID:proxmox2:000029CF:00021723:512E3C16:qmshutdown:105:root@pam:: shutdown VM 105: UPID:proxmox2:000029CF:00021723:512E3C16:qmshutdown:105:root@pam:
Feb 27 12:02:15 proxmox2 pmxcfs[1809]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Feb 27 12:02:15 proxmox2 pmxcfs[1809]: [dcdb] crit: cpg_dispatch failed: 2
Feb 27 12:02:16 proxmox2 pmxcfs[1809]: [status] crit: cpg_send_message failed: 2
Feb 27 12:02:16 proxmox2 pmxcfs[1809]: [status] crit: cpg_send_message failed: 2
Feb 27 12:02:17 proxmox2 pmxcfs[1809]: [dcdb] crit: cpg_leave failed: 2
Feb 27 12:02:17 proxmox2 kernel: dlm: closing connection to node 1
Feb 27 12:02:17 proxmox2 kernel: dlm: closing connection to node 2
Feb 27 12:02:17 proxmox2 kernel: dlm: rgmanager: no userland control daemon, stopping lockspace
Feb 27 12:02:18 proxmox2 pmxcfs[1809]: [status] crit: cpg_send_message failed: 2
Feb 27 12:02:18 proxmox2 pmxcfs[1809]: [status] crit: cpg_send_message failed: 2

From everything I read, this typically results from a switch issue, but there is no switch in between; it's a direct 10GB connection.
 