Multihomed cluster (CMAN/corosync multicast configuration)

NStorm

Dec 23, 2011
Hello.

I've been running a 2-node cluster. It was originally configured to communicate over the vmbr0 interface, which sits on the whole LAN subnet 192.168.9.0/24. Recently I added a new, faster NIC attached to vmbr2 and reconfigured the cluster to work through it, on a dedicated VLAN with the 192.168.248.0/30 subnet. I changed /etc/hosts on both nodes to point at the new addresses on the .248.0/30 subnet. Everything worked fine here; the cluster switched to working over the vmbr2/248.0 subnet.
The nodes are called node1 and node2. They "know" each other as 248.1 and 248.2 via /etc/hosts, while DNS records on the 9.0 network resolve them to their vmbr0/9.0 addresses (so the web interface works from the LAN, etc).
Now I want to add a 3rd node (named node-stor1) to the cluster, which sits on the vmbr0/9.0 subnet. It can reach node1 and node2 by name/ssh on the 9.0 subnet, but the cluster join hangs at CMAN's "Waiting for quorum".
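For clarity, the relevant /etc/hosts entries on node1 and node2 look roughly like this (addresses are from my setup; the point is just that the cluster node names resolve to the 248.0/30 addresses, not the LAN ones):
Code:
192.168.248.1   node1
192.168.248.2   node2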
I've read the corosync, CMAN and cluster.conf documentation, as well as these URLs: https://pve.proxmox.com/wiki/Multicast_notes and, most importantly: https://fedorahosted.org/cluster/wiki/MultiHome

So the problem is that CMAN registers its multicast group on vmbr2/248.0 only, even though the clusternode config lists the new node on the vmbr0/9.0 subnet. I've enabled multicast pings and checked whether they work: using ping, netstat -g and smcroute, I verified that all 3 nodes can join the same multicast group on the vmbr0/9.0 subnet, and all 3 reply to a multicast ping. So no switch is blocking my multicast traffic.
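In case it helps anyone else, the check I ran was roughly the following (the interface name and the test group 239.192.3.58 are from my setup, adjust to yours; smcroute -j joins a group, -l leaves it):
Code:
# join a test multicast group on vmbr0, then confirm membership
smcroute -j vmbr0 239.192.3.58
netstat -g
# from another node: multicast ping; every joined node should answer
ping -c 3 -I vmbr0 239.192.3.58
# clean up the test membership afterwards
smcroute -l vmbr0 239.192.3.58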
Now the last URL I posted above gives a good example of how to set up a multihomed multicast CMAN cluster. I followed it and added altnames node1a and node2a to /etc/pve/cluster.conf. I also added entries to /etc/hosts on the nodes so that node1a and node2a resolve to their 192.168.9.0/24 addresses. After restarting proxmox-cluster and CMAN on these nodes, they register themselves with 2 multicast addresses:
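The relevant part of /etc/pve/cluster.conf now looks roughly like this, following the MultiHome wiki example (the nodeid/votes values here are illustrative, not copied from my config; the <altname> child element is what makes CMAN bring up the second ring):
Code:
<clusternodes>
  <clusternode name="node1" votes="1" nodeid="1">
    <altname name="node1a"/>
  </clusternode>
  <clusternode name="node2" votes="1" nodeid="2">
    <altname name="node2a"/>
  </clusternode>
</clusternodes>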
Code:
# pvecm status
Version: 6.2.0
Config Version: 14
Cluster Name: cluster
Cluster Id: 821
Cluster Member: Yes
Cluster Generation: 3256
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: node1
Node ID: 2
Multicast addresses: 239.192.3.56 239.192.3.57 
Node addresses: 192.168.248.1 192.168.9.231

But the new node still can't "see" them. I guess that's because the new node does not have an altname and registers only for the 239.192.3.56 multicast address, while the other 2 nodes see this network as their second interface and register it with the 239.192.3.57 multicast address.
Any suggestions on how to get that 3rd node to join the cluster with the following multihomed scheme?

Code:
<NODE2> vmbr0/9 <---
vmbr2/248           \
      ^              \
      |                --> vmbr0/9 <NODE-STOR1>
      v              /
vmbr2/248           /
<NODE1> vmbr0/9 <---
 
Ok, after adding an alias node-stor1a for the 3rd node and listing it as the altname in cluster.conf, it seems like the 3rd node has finally joined the cluster.
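For the record, the entry that made it work looks roughly like this (nodeid/votes illustrative; node-stor1 has only one interface, so both node-stor1 and node-stor1a resolve to its 192.168.9.0/24 address in /etc/hosts):
Code:
<clusternode name="node-stor1" votes="1" nodeid="3">
  <altname name="node-stor1a"/>
</clusternode>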
But now I'm getting repeated messages like this in /var/log/cluster/corosync.log, 2-3 per second:
Code:
May 08 15:38:22 corosync [TOTEM ] Automatically recovered ring 0
 
