Multicast problem solved enabling promisc mode

Xabi · Apr 25, 2016

I have a three-node cluster that suddenly started losing quorum. Restarting some services (pve-cluster, cman, etc.) helped for only 3-5 minutes; after that, quorum was lost again, over and over.

After checking multicast traffic with omping [1], I saw that multicast worked fine in a short test at a high packet rate:

Code:

root@prox:~/scripts# omping -c 10000 -i 0.001 -F -q 192.168.1.1 192.168.1.2 192.168.1.3
192.168.1.2 : waiting for response msg
192.168.1.3 : waiting for response msg
192.168.1.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : waiting for response msg
192.168.1.2 : server told us to stop
192.168.1.3 : given amount of query messages was sent

192.168.1.2 :   unicast, xmt/rcv/%loss = 9534/9534/0%, min/avg/max/std-dev = 0.052/0.132/1.145/0.030
192.168.1.2 : multicast, xmt/rcv/%loss = 9534/9533/0% (seq>=2 0%), min/avg/max/std-dev = 0.066/0.142/1.152/0.032
192.168.1.3 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.030/0.078/1.045/0.027
192.168.1.3 : multicast, xmt/rcv/%loss = 10000/9999/0% (seq>=2 0%), min/avg/max/std-dev = 0.036/0.084/1.054/0.027

But I was losing multicast packets in a longer test (600 packets, one per second):

Code:

root@prox:~# omping -c 600 -i 1 -q 192.168.1.1 192.168.1.2 192.168.1.3
192.168.1.2 : waiting for response msg
192.168.1.3 : waiting for response msg
192.168.1.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : given amount of query messages was sent
192.168.1.3 : given amount of query messages was sent

192.168.1.2 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.163/1.017/0.045
192.168.1.2 : multicast, xmt/rcv/%loss = 600/264/56%, min/avg/max/std-dev = 0.098/0.175/0.304/0.033
192.168.1.3 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.051/0.100/0.559/0.031
192.168.1.3 : multicast, xmt/rcv/%loss = 600/264/56%, min/avg/max/std-dev = 0.058/0.113/0.566/0.041
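
When repeating this test on every node, it can help to pull the multicast loss figure out of the summary automatically. The following is only a minimal sketch, assuming the summary format shown above; the function name multicast_loss is my own and not part of omping:

```shell
#!/bin/sh
# multicast_loss (hypothetical helper): read omping summary output on stdin
# and print "<host> <loss%>" for every multicast summary line.
# Assumes the "xmt/rcv/%loss = X/Y/Z%" layout shown in the output above.
multicast_loss() {
    awk '/multicast, xmt\/rcv\/%loss/ {
        # $1 is the host; the field right after "=" holds X/Y/Z%
        for (i = 1; i <= NF; i++) {
            if ($i == "=") {
                split($(i + 1), parts, "/")
                loss = parts[3]
                sub(/%.*/, "", loss)   # drop the "%" and anything after it
                print $1, loss
                break
            }
        }
    }'
}
```

Usage would be something like: omping -c 600 -i 1 -q 192.168.1.1 192.168.1.2 192.168.1.3 | multicast_loss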

I tried disabling, enabling, and changing the IGMP snooping configuration on my switch, but it made no difference. omping started to lose multicast packets after 3-4 minutes:

Code:

    ...

    prox1.mydomain.com : multicast, seq=253, size=69 bytes, dist=0, time=0.190ms
    prox2.mydomain.com    : multicast, seq=251, size=69 bytes, dist=0, time=0.172ms
    prox1.mydomain.com :   unicast, seq=254, size=69 bytes, dist=0, time=0.161ms
    prox2.mydomain.com    : multicast, seq=252, size=69 bytes, dist=0, time=0.155ms
    prox2.mydomain.com    :   unicast, seq=252, size=69 bytes, dist=0, time=0.153ms
    prox1.mydomain.com : multicast, seq=254, size=69 bytes, dist=0, time=0.209ms
    prox2.mydomain.com    :   unicast, seq=253, size=69 bytes, dist=0, time=0.171ms
    prox2.mydomain.com    : multicast, seq=253, size=69 bytes, dist=0, time=0.179ms
    prox1.mydomain.com :   unicast, seq=255, size=69 bytes, dist=0, time=0.262ms
    prox1.mydomain.com : multicast, seq=255, size=69 bytes, dist=0, time=0.310ms
    prox1.mydomain.com :   unicast, seq=256, size=69 bytes, dist=0, time=0.130ms
    prox2.mydomain.com    :   unicast, seq=254, size=69 bytes, dist=0, time=0.172ms
    prox1.mydomain.com : multicast, seq=256, size=69 bytes, dist=0, time=0.178ms
    prox2.mydomain.com    : multicast, seq=254, size=69 bytes, dist=0, time=0.221ms
    prox1.mydomain.com :   unicast, seq=257, size=69 bytes, dist=0, time=0.116ms
    prox2.mydomain.com    :   unicast, seq=255, size=69 bytes, dist=0, time=0.129ms
    prox1.mydomain.com : multicast, seq=257, size=69 bytes, dist=0, time=0.164ms
    prox2.mydomain.com    : multicast, seq=255, size=69 bytes, dist=0, time=0.178ms
    prox1.mydomain.com :   unicast, seq=258, size=69 bytes, dist=0, time=0.151ms
    prox2.mydomain.com    :   unicast, seq=256, size=69 bytes, dist=0, time=0.147ms
    prox1.mydomain.com : multicast, seq=258, size=69 bytes, dist=0, time=0.199ms
    prox2.mydomain.com    : multicast, seq=256, size=69 bytes, dist=0, time=0.196ms
    prox1.mydomain.com :   unicast, seq=259, size=69 bytes, dist=0, time=0.119ms
    prox2.mydomain.com    :   unicast, seq=257, size=69 bytes, dist=0, time=0.127ms
    prox1.mydomain.com : multicast, seq=259, size=69 bytes, dist=0, time=0.167ms
    prox2.mydomain.com    : multicast, seq=257, size=69 bytes, dist=0, time=0.176ms    ===> LAST MULTICAST PACKET
    prox1.mydomain.com :   unicast, seq=260, size=69 bytes, dist=0, time=0.158ms
    prox2.mydomain.com    :   unicast, seq=258, size=69 bytes, dist=0, time=0.152ms
    prox1.mydomain.com :   unicast, seq=261, size=69 bytes, dist=0, time=0.129ms
    prox2.mydomain.com    :   unicast, seq=259, size=69 bytes, dist=0, time=0.155ms
    prox2.mydomain.com    :   unicast, seq=260, size=69 bytes, dist=0, time=0.143ms
    prox1.mydomain.com :   unicast, seq=262, size=69 bytes, dist=0, time=0.200ms
    prox1.mydomain.com :   unicast, seq=263, size=69 bytes, dist=0, time=0.121ms
    prox2.mydomain.com    :   unicast, seq=261, size=69 bytes, dist=0, time=0.135ms
    prox1.mydomain.com :   unicast, seq=264, size=69 bytes, dist=0, time=0.116ms
    prox2.mydomain.com    :   unicast, seq=262, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=265, size=69 bytes, dist=0, time=0.133ms
    prox2.mydomain.com    :   unicast, seq=263, size=69 bytes, dist=0, time=0.134ms
    prox1.mydomain.com :   unicast, seq=266, size=69 bytes, dist=0, time=0.127ms
    prox2.mydomain.com    :   unicast, seq=264, size=69 bytes, dist=0, time=0.160ms
    prox1.mydomain.com :   unicast, seq=267, size=69 bytes, dist=0, time=0.125ms
    prox2.mydomain.com    :   unicast, seq=265, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=268, size=69 bytes, dist=0, time=0.112ms
    prox2.mydomain.com    :   unicast, seq=266, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=269, size=69 bytes, dist=0, time=0.137ms
    prox2.mydomain.com    :   unicast, seq=267, size=69 bytes, dist=0, time=0.148ms
    prox1.mydomain.com :   unicast, seq=270, size=69 bytes, dist=0, time=0.145ms
    prox2.mydomain.com    :   unicast, seq=268, size=69 bytes, dist=0, time=0.151ms
    prox1.mydomain.com :   unicast, seq=271, size=69 bytes, dist=0, time=0.145ms
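
To find the cut-off point without scrolling through the log by hand, the verbose output can be filtered for the highest multicast sequence number per host. This is a rough sketch that assumes the line format shown above; last_multicast_seq is a made-up helper name:

```shell
#!/bin/sh
# last_multicast_seq (hypothetical helper): read verbose omping output on
# stdin and print, per host, the highest multicast sequence number seen.
# Assumes lines of the form "<host> : multicast, seq=N, ..." as shown above.
last_multicast_seq() {
    awk '$3 == "multicast," {
        n = $4                  # e.g. "seq=257,"
        sub(/^seq=/, "", n)
        sub(/,$/, "", n)
        if (!($1 in last) || n + 0 > last[$1]) last[$1] = n + 0
    }
    END { for (h in last) print h, last[h] }' | sort
}
```

Multiplying the reported sequence number by the ping interval gives a rough idea of how long multicast kept working before it stopped.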

Finally, I found the solution [2]. Enabling promiscuous mode on the bridge interface (on every node) solves the problem:

Code:

ip link set vmbr0 promisc on
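
To confirm that the flag is actually set on each node, the output of ip link show can be checked for PROMISC in the interface flag list. A small sketch; has_promisc_flag is a hypothetical helper name, and vmbr0 is simply the bridge name used above:

```shell
#!/bin/sh
# has_promisc_flag (hypothetical helper): read `ip link show <iface>` output
# on stdin; succeed if the flag list (<...>) contains PROMISC.
has_promisc_flag() {
    grep -q '[<,]PROMISC[,>]'
}

# Example (run on a node):
#   ip link show vmbr0 | has_promisc_flag && echo "vmbr0 is promiscuous"
```

Note that a setting made with ip link set does not survive a reboot. On a Debian-style ifupdown setup, one option (an assumption about your configuration, not something tested here) is to add a line such as "post-up ip link set vmbr0 promisc on" to the vmbr0 stanza in /etc/network/interfaces.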

Now I no longer lose any multicast packets and my cluster's quorum is stable. So, apparently, the culprit wasn't the switch but Proxmox itself...

Now, the question is: why? Is there a better solution?

Thanks!!

1: https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
2: https://forum.proxmox.com/threads/pvecm-status-activity-blocked.26910/#post-135333

fireon · Apr 25, 2016

We had something similar in the past (PVE 3). I don't really know why... On the advice of PVE support, we moved cluster communication to a separate VLAN, and not only a VLAN: we also set up a dedicated physical NIC bond for it. After that we never had such problems. Maybe it is a little bit sensitive.

But this was on version 3.x. On version 4.x we haven't tested without the extra VLAN. Every cluster has its own physical NIC and VLAN for cluster communication.

Xabi · Apr 26, 2016

Thanks for your reply,

My cluster is on PVE 3.4 and the nodes have the following network configuration:

A bond (bond0) formed by two Gigabit interfaces (eth0 + eth1), using LACP (802.3ad)
On the switch I have an untagged VLAN (e.g. ID 6) that I use not only for cluster communication but also for some of the VMs inside Proxmox
I permit all tagged VLANs, so I can give any VM the network configuration it needs

I would like to migrate my cluster to PVE 4, so maybe I will try using a separate VLAN and separate network interfaces just for cluster networking.

fireon · Apr 27, 2016

Yes, use a separate VLAN for the cluster; it works here without problems. https://pve.proxmox.com/wiki/Upgrade_from_3.x_to_4.0
