Multicast problem solved by enabling promisc mode

Xabi

New Member
Apr 25, 2016
I have a three-node cluster that suddenly started losing quorum. Restarting services (pve-cluster, cman, etc.) helped for only 3-5 minutes, after which the cluster lost quorum again.

After checking multicast traffic with omping [1], I saw that multicast worked fine in a short test at a high packet rate:

Code:
root@prox:~/scripts# omping -c 10000 -i 0.001 -F -q 192.168.1.1 192.168.1.2 192.168.1.3
192.168.1.2 : waiting for response msg
192.168.1.3 : waiting for response msg
192.168.1.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : waiting for response msg
192.168.1.2 : server told us to stop
192.168.1.3 : given amount of query messages was sent

192.168.1.2 :   unicast, xmt/rcv/%loss = 9534/9534/0%, min/avg/max/std-dev = 0.052/0.132/1.145/0.030
192.168.1.2 : multicast, xmt/rcv/%loss = 9534/9533/0% (seq>=2 0%), min/avg/max/std-dev = 0.066/0.142/1.152/0.032
192.168.1.3 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.030/0.078/1.045/0.027
192.168.1.3 : multicast, xmt/rcv/%loss = 10000/9999/0% (seq>=2 0%), min/avg/max/std-dev = 0.036/0.084/1.054/0.027

But in a longer test (600 packets, one per second) I was losing multicast packets:

Code:
root@prox:~# omping -c 600 -i 1 -q 192.168.1.1 192.168.1.2 192.168.1.3
192.168.1.2 : waiting for response msg
192.168.1.3 : waiting for response msg
192.168.1.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.1.2 : given amount of query messages was sent
192.168.1.3 : given amount of query messages was sent

192.168.1.2 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.163/1.017/0.045
192.168.1.2 : multicast, xmt/rcv/%loss = 600/264/56%, min/avg/max/std-dev = 0.098/0.175/0.304/0.033
192.168.1.3 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.051/0.100/0.559/0.031
192.168.1.3 : multicast, xmt/rcv/%loss = 600/264/56%, min/avg/max/std-dev = 0.058/0.113/0.566/0.041
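
When repeating this test across several nodes, it can help to pull the multicast loss figure out of the summary automatically. The following is a minimal sketch, assuming the `omping -q` summary format shown above; the function name `mcast_loss` and its threshold argument are hypothetical and not part of omping.

```shell
# Hypothetical helper: read omping summary lines on stdin and print
# "host loss%" for every host whose multicast loss exceeds a threshold.
mcast_loss() {
    threshold="${1:-0}"
    awk -v t="$threshold" '
        / : multicast, xmt\/rcv\/%loss = / {
            host = $1
            # The first field after "=" looks like 600/264/56%,
            for (i = 1; i <= NF; i++) {
                if ($i == "=") { split($(i + 1), p, "/"); break }
            }
            loss = p[3]
            sub(/%.*/, "", loss)      # strip "%" and any trailing comma
            if (loss + 0 > t + 0) printf "%s %s\n", host, loss
        }'
}

# Usage (same command as in the post):
#   omping -c 600 -i 1 -q 192.168.1.1 192.168.1.2 192.168.1.3 | mcast_loss 5
```

Unicast lines are ignored, so a host is only reported when its multicast path is the one dropping packets.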

I tried disabling, enabling, and changing the IGMP snooping configuration on my switch, but I didn't see any change. omping always started to lose multicast packets after 3-4 minutes:

Code:
    ...

    prox1.mydomain.com : multicast, seq=253, size=69 bytes, dist=0, time=0.190ms
    prox2.mydomain.com    : multicast, seq=251, size=69 bytes, dist=0, time=0.172ms
    prox1.mydomain.com :   unicast, seq=254, size=69 bytes, dist=0, time=0.161ms
    prox2.mydomain.com    : multicast, seq=252, size=69 bytes, dist=0, time=0.155ms
    prox2.mydomain.com    :   unicast, seq=252, size=69 bytes, dist=0, time=0.153ms
    prox1.mydomain.com : multicast, seq=254, size=69 bytes, dist=0, time=0.209ms
    prox2.mydomain.com    :   unicast, seq=253, size=69 bytes, dist=0, time=0.171ms
    prox2.mydomain.com    : multicast, seq=253, size=69 bytes, dist=0, time=0.179ms
    prox1.mydomain.com :   unicast, seq=255, size=69 bytes, dist=0, time=0.262ms
    prox1.mydomain.com : multicast, seq=255, size=69 bytes, dist=0, time=0.310ms
    prox1.mydomain.com :   unicast, seq=256, size=69 bytes, dist=0, time=0.130ms
    prox2.mydomain.com    :   unicast, seq=254, size=69 bytes, dist=0, time=0.172ms
    prox1.mydomain.com : multicast, seq=256, size=69 bytes, dist=0, time=0.178ms
    prox2.mydomain.com    : multicast, seq=254, size=69 bytes, dist=0, time=0.221ms
    prox1.mydomain.com :   unicast, seq=257, size=69 bytes, dist=0, time=0.116ms
    prox2.mydomain.com    :   unicast, seq=255, size=69 bytes, dist=0, time=0.129ms
    prox1.mydomain.com : multicast, seq=257, size=69 bytes, dist=0, time=0.164ms
    prox2.mydomain.com    : multicast, seq=255, size=69 bytes, dist=0, time=0.178ms
    prox1.mydomain.com :   unicast, seq=258, size=69 bytes, dist=0, time=0.151ms
    prox2.mydomain.com    :   unicast, seq=256, size=69 bytes, dist=0, time=0.147ms
    prox1.mydomain.com : multicast, seq=258, size=69 bytes, dist=0, time=0.199ms
    prox2.mydomain.com    : multicast, seq=256, size=69 bytes, dist=0, time=0.196ms
    prox1.mydomain.com :   unicast, seq=259, size=69 bytes, dist=0, time=0.119ms
    prox2.mydomain.com    :   unicast, seq=257, size=69 bytes, dist=0, time=0.127ms
    prox1.mydomain.com : multicast, seq=259, size=69 bytes, dist=0, time=0.167ms
    prox2.mydomain.com    : multicast, seq=257, size=69 bytes, dist=0, time=0.176ms    ===> LAST MULTICAST PACKET
    prox1.mydomain.com :   unicast, seq=260, size=69 bytes, dist=0, time=0.158ms
    prox2.mydomain.com    :   unicast, seq=258, size=69 bytes, dist=0, time=0.152ms
    prox1.mydomain.com :   unicast, seq=261, size=69 bytes, dist=0, time=0.129ms
    prox2.mydomain.com    :   unicast, seq=259, size=69 bytes, dist=0, time=0.155ms
    prox2.mydomain.com    :   unicast, seq=260, size=69 bytes, dist=0, time=0.143ms
    prox1.mydomain.com :   unicast, seq=262, size=69 bytes, dist=0, time=0.200ms
    prox1.mydomain.com :   unicast, seq=263, size=69 bytes, dist=0, time=0.121ms
    prox2.mydomain.com    :   unicast, seq=261, size=69 bytes, dist=0, time=0.135ms
    prox1.mydomain.com :   unicast, seq=264, size=69 bytes, dist=0, time=0.116ms
    prox2.mydomain.com    :   unicast, seq=262, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=265, size=69 bytes, dist=0, time=0.133ms
    prox2.mydomain.com    :   unicast, seq=263, size=69 bytes, dist=0, time=0.134ms
    prox1.mydomain.com :   unicast, seq=266, size=69 bytes, dist=0, time=0.127ms
    prox2.mydomain.com    :   unicast, seq=264, size=69 bytes, dist=0, time=0.160ms
    prox1.mydomain.com :   unicast, seq=267, size=69 bytes, dist=0, time=0.125ms
    prox2.mydomain.com    :   unicast, seq=265, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=268, size=69 bytes, dist=0, time=0.112ms
    prox2.mydomain.com    :   unicast, seq=266, size=69 bytes, dist=0, time=0.126ms
    prox1.mydomain.com :   unicast, seq=269, size=69 bytes, dist=0, time=0.137ms
    prox2.mydomain.com    :   unicast, seq=267, size=69 bytes, dist=0, time=0.148ms
    prox1.mydomain.com :   unicast, seq=270, size=69 bytes, dist=0, time=0.145ms
    prox2.mydomain.com    :   unicast, seq=268, size=69 bytes, dist=0, time=0.151ms
    prox1.mydomain.com :   unicast, seq=271, size=69 bytes, dist=0, time=0.145ms
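
One reading of these numbers, offered as an assumption and not as a confirmed diagnosis: the last multicast packets arrive at seq 257-259, and the long test above received 264 of 600 multicast packets at one per second. If this verbose run also used a one-second interval (which I believe is omping's default, but the command line is not shown), multicast stopped after roughly 260 seconds. That is very close to the default IGMPv2 group membership interval from RFC 2236, which is what a snooping device waits before dropping a group when no querier refreshes it. The arithmetic, as a small sketch:

```shell
# IGMPv2 defaults from RFC 2236 (not values read from this cluster):
robustness=2                # Robustness Variable
query_interval=125          # seconds between general queries
query_response_interval=10  # seconds hosts have to answer a query

# Group Membership Interval = robustness * query interval + response interval
group_membership_interval=$(( robustness * query_interval + query_response_interval ))
echo "$group_membership_interval"   # prints 260
```

If no IGMP querier is active on the segment, a snooping bridge or switch would forget the group after about this long, which would match multicast working at first and then stopping while unicast continues.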

Finally, I found the solution [2]. Enabling promiscuous mode on my bridge interface (on every node) solved the problem:

Code:
ip link set vmbr0 promisc on
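
To confirm the setting took effect on each node, one option is to read the interface flags from sysfs and test the IFF_PROMISC bit (0x100). This is a minimal sketch; the helper name `is_promisc` is hypothetical, and `vmbr0` is simply the bridge name from the command above.

```shell
# Hypothetical helper: succeeds if the IFF_PROMISC bit (0x100) is set
# in a flags value such as the one in /sys/class/net/<iface>/flags.
is_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# Usage on a node (vmbr0 as in the post):
#   is_promisc "$(cat /sys/class/net/vmbr0/flags)" && echo "vmbr0 is promiscuous"
```

Note that a change made with `ip link set` does not survive a reboot. A `post-up ip link set vmbr0 promisc on` line in the bridge's stanza in /etc/network/interfaces is one common way to reapply it, although nothing in this thread confirms that setup.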

Now I don't lose any multicast packets and my cluster's quorum is stable. So, apparently, the problem wasn't the switch but Proxmox itself...

Now, the question is: why? Is there a better solution?
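
One place to look, offered as a possibility and not as something confirmed in this thread: a Linux bridge can do its own IGMP snooping, independently of the switch, and exposes the relevant settings through sysfs. The sketch below only reads them. The names `bridge_mcast_state` and `SYSFS_ROOT` are hypothetical, and whether these files exist depends on the kernel in use.

```shell
# Hypothetical helper: print the bridge's multicast snooping and querier
# settings. SYSFS_ROOT can be overridden (e.g. for testing).
bridge_mcast_state() {
    bridge="$1"
    root="${SYSFS_ROOT:-/sys/class/net}"
    for knob in multicast_snooping multicast_querier; do
        f="$root/$bridge/bridge/$knob"
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$f")"
        else
            printf '%s=unknown\n' "$knob"
        fi
    done
}

# Usage on a node:
#   bridge_mcast_state vmbr0
# Possible alternatives to promiscuous mode (untested here, assumptions):
#   echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping   # stop snooping
#   echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier    # act as querier
```

If snooping is on and no querier exists anywhere on the network, disabling snooping on the bridge or enabling a querier would be a narrower change than making the whole bridge promiscuous.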

Thanks!!



1: https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
2: https://forum.proxmox.com/threads/pvecm-status-activity-blocked.26910/#post-135333
 
In the past (PVE 3) we had a similar problem. I don't really know why. On the advice of PVE support, we moved cluster communication to a separate VLAN, and not only a VLAN: we also set up a dedicated physical NIC bond for cluster communication. After that we never had such problems. Maybe cluster communication is just a little sensitive.

But this was on version 3.x. On version 4.x we haven't tested without a separate VLAN. Every cluster has its own physical interface and VLAN for cluster communication.
 
Thanks for your reply,

My cluster is on PVE 3.4 and the nodes have the following network configuration:
  • A bond (bond0) formed by two Gigabit interfaces (eth0 + eth1), using LACP (802.3ad)
  • On the switch I have an untagged VLAN (e.g. ID 6) that I use not only for cluster communication but also for some of the VMs inside Proxmox
  • I permit all tagged VLANs, so I can configure any VM with the network configuration I need
I would like to migrate my cluster to PVE 4, so maybe I will try using a separate VLAN and separate network interfaces just for cluster networking.
 
