[SOLVED] Cluster Multicast Issues: Single Node Dead

crainbramp

Member
Apr 11, 2019
Am having some multicast issues on my Juniper stack network (this isn't to say the problem is Juniper related, just informational). The goal is a 10-node cluster; servers are in place, and I've been installing Proxmox 5.4.1 through IPMI, doing a dist-upgrade, and then running a script I have that installs openvswitch, configures the network, installs my own utilities and packages, our zabbix agent, and so on. A final reboot and the node is joined in. Everything worked perfectly until node4.

Nodes1-3 get perfect multicast behavior flooding the network for 10 minutes at a run with 0% loss.
Node4, however, fails at 100% loss. This has left me ... wait for it ... at a loss.

Note that yes, node4 joined the cluster, but only after I forced the issue with a pvecm e 1, meaning it wouldn't have succeeded without that hack. Without it, as on the other 3 times I tried to join it, it would have been stuck at "getting quorum ..." So yes, I realize the irony of the fact that joining it this way is totally useless.
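For reference, a rough sketch of the quorum commands involved (assuming pvecm e 1 is shorthand for pvecm expected 1; lowering expected votes overrides quorum, so use with care):
Code:
# check current cluster membership and quorum state
pvecm status

# force quorum by dropping the expected vote count to 1 (the hack mentioned above)
pvecm expected 1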

Am not a multicast expert by any stretch, but I've dug into this pretty far, checking forum posts, and am now out of ideas. I have reinstalled node4 three times (properly removing it from the cluster each time and verifying each corosync.conf file was clean). Same behavior every time. I even got doubtful and counted the number of filled ports on the SRX240 vs. the number of servers I had. Nope, all accounted for; not one of them is plugged into the upstream switch and reaching the gateway some other way.

Checking the GUI shows nodes1-3 can see each other and report they are online. Node4 shows as a red X on nodes1-3. Logging into node4 shows it with a green checkmark, but nodes1-3 with red Xs.

Checking the /etc/pve/nodes directory on nodes 1-3, you see:
Code:
drwxr-xr-x 2 root www-data 0 Apr 26 20:50 node04r1
drwxr-xr-x 2 root www-data 0 Apr 25 06:12 node03r1
drwxr-xr-x 2 root www-data 0 Apr 24 19:01 node02r1
drwxr-xr-x 2 root www-data 0 Apr 24 16:27 node01r1

Checking the /etc/pve/nodes directory on node4 shows:
Code:
drwxr-xr-x 2 root www-data 0 Apr 27 14:51 node04r1

From nodes 1-3 (Quick test shown, I have done 10+ minute tests with 0% loss)
Code:
xx.xx.xx.51 :   unicast, xmt/rcv/%loss = 19/19/0%, min/avg/max/std-dev = 0.144/0.210/0.261/0.032
xx.xx.xx.51 : multicast, xmt/rcv/%loss = 19/19/0%, min/avg/max/std-dev = 0.185/0.250/0.492/0.066
xx.xx.xx.54 :   unicast, xmt/rcv/%loss = 12/12/0%, min/avg/max/std-dev = 0.169/0.220/0.283/0.038
xx.xx.xx.54 : multicast, xmt/rcv/%loss = 12/12/0%, min/avg/max/std-dev = 0.177/0.234/0.288/0.033
xx.xx.xx.57 :   unicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.087/0.158/0.219/0.066
xx.xx.xx.57 : multicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.095/0.166/0.227/0.062

xx.xx.xx.51 :   unicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.168/0.201/0.253/0.034
xx.xx.xx.51 : multicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.192/0.230/0.255/0.025
xx.xx.xx.52 :   unicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.146/0.174/0.221/0.030
xx.xx.xx.52 : multicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.156/0.189/0.259/0.040
xx.xx.xx.54 :   unicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.184/0.232/0.253/0.027
xx.xx.xx.54 : multicast, xmt/rcv/%loss = 6/6/0%, min/avg/max/std-dev = 0.236/0.249/0.262/0.011

From Node4
Code:
xx.xx.xx.52 :   unicast, xmt/rcv/%loss = 9/9/0%, min/avg/max/std-dev = 0.155/0.199/0.218/0.022
xx.xx.xx.52 : multicast, xmt/rcv/%loss = 9/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
xx.xx.xx.54 :   unicast, xmt/rcv/%loss = 11/11/0%, min/avg/max/std-dev = 0.145/0.207/0.251/0.030
xx.xx.xx.54 : multicast, xmt/rcv/%loss = 11/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
xx.xx.xx.57 :   unicast, xmt/rcv/%loss = 14/14/0%, min/avg/max/std-dev = 0.174/0.216/0.310/0.044
xx.xx.xx.57 : multicast, xmt/rcv/%loss = 14/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
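The output above is in omping's format; for anyone wanting to reproduce the test, something along these lines works (node names taken from the listings above; counts and intervals are just examples):
Code:
# quick test: one probe per second across all nodes
omping -c 60 -i 1 -q node01r1 node02r1 node03r1 node04r1

# longer ~10 minute run, useful for catching IGMP snooping/querier timeouts
omping -c 600 -i 1 -q node01r1 node02r1 node03r1 node04r1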

From the SRX240 config
Code:
protocols {
    igmp {
        interface vlan.0 {
            version 2;
        }
        interface vlan.2 {
            version 2;
        }
    }
    stp;
    igmp-snooping {
        vlan all;
    }
}
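Not part of the original config dump, but for anyone verifying the same setup, operational commands roughly like these show the querier and snooping state (exact command names can vary by Junos version and platform):
Code:
show igmp interface            # per-interface IGMP state, including the elected querier
show igmp group                # multicast groups the SRX has learned
show igmp-snooping membership  # snooping membership table on the switching side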

Informational
All items double-verified by me and independently reviewed by one of our network guys
* Network is end-to-end Juniper: an EX4200 switch as ToR driving several SRX240s
* The nodes are behind an SRX240 by themselves in a single zone, with a single VLAN comprising every interface on the firewall
* igmp-snooping is verified enabled, as is IGMP v2 on the VLAN at the SRX240 level
* All nodes are on the same /27 subnet
* All nodes are Proxmox 5.4.1 and have had a post-install script run, so they're as identical as you can get
* Thinking node4 was just a crappy install, I've done a reinstall 3 times with the same results each time
* All /etc/hosts files are hard-coded, exact, verified copies of each other
* All nodes can ping each other at high rates
* Multicast can run 10+ minutes with 0% loss for nodes1-3
* Node4 is 100% loss
* iptables is empty and disabled on all nodes, since they are behind a hardware firewall (quick checks sketched after this list)
* No VLAN tagging or anything like that at the moment
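For completeness, checks along these lines cover the iptables and /etc/hosts items above (illustrative only, not necessarily the exact commands used):
Code:
# firewall rules should be empty with ACCEPT policies on every node
iptables -L -n

# /etc/hosts should be byte-identical across nodes; compare the checksums
md5sum /etc/hosts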
 
Hello:

Where is your igmp querier? On the Juniper too?
Yep, I posted the config from the Juniper in the list above. The querier is sitting on the SRX240, per the instructions, which say to place it on the router, not the switch.

If yes, what is the interval between igmp queries?
Whatever the default is. Again, the config is above.
 

Sorry, but I don't see any configuration for an igmp querier (only igmp snooping).
Maybe the querier is enabled by default?

Looking at the Juniper documentation, it seems the default query interval is 2 min:
https://www.juniper.net/documentati...iguration/mcast-igmp-host-query-interval.html

That means that, worst case, you need to wait 2 min before multicast is working, and I'm not sure Proxmox waits that long ("getting quorum ..."). Try setting the querier interval to 30s.
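Not tested here, but the Junos statements for that suggestion would look something like this (interface names taken from the config above; verify the knob against your Junos version):
Code:
set protocols igmp interface vlan.0 query-interval 30
set protocols igmp interface vlan.2 query-interval 30
commit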
 
For anyone perhaps seeing odd behavior like this, the problem was that upstream, the EX4200 switch still had its querier running. The commit, although it reported success, had an error and did NOT actually apply.

This is the value of double-checking and double-checking again.

Amazing that this simple issue caused all the problems.
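For anyone in the same spot, a rough way to confirm a Junos commit actually landed, using only standard CLI (configuration mode for the first three commands, operational mode for the rest):
Code:
show | compare                      # anything still uncommitted in the candidate config
commit check                        # validate the candidate config without applying it
commit and-quit                     # apply and drop back to operational mode
show system commit                  # commit history, to confirm it really went through
show configuration protocols igmp   # confirm the querier config you expect is active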
 
