Force multicast over a specific interface?

jpreston

New Member
Oct 4, 2012
I am attempting to join two freshly built Proxmox 2.1 hosts into a cluster, following the guide here. Establishing the first node in the cluster goes fine, but I run into problems when attempting to add the second node. The pvecm add command results in:

Code:
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...
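
For reference, the commands run up to this point were essentially the ones the guide describes; the cluster name matches the pvecm status output below:

Code:
root@vm-proxmox1:~# pvecm create proxmox-cl-1
root@vm-proxmox2:~# pvecm add 10.3.106.1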

I checked the status of the cluster on both nodes:

vm-proxmox1
Code:
root@vm-proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 5
Cluster Name: proxmox-cl-1
Cluster Id: 1891
Cluster Member: Yes
Cluster Generation: 16
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: vm-proxmox1
Node ID: 1
Multicast addresses: 239.192.7.106 
Node addresses: 10.3.106.1

vm-proxmox2
Code:
root@vm-proxmox2:~# pvecm status
Version: 6.2.0
Config Version: 5
Cluster Name: proxmox-cl-1
Cluster Id: 1891
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: vm-proxmox2
Node ID: 2
Multicast addresses: 239.192.7.106 
Node addresses: 10.3.106.2

Hmm. Quorum activity blocked.

Other forum posts suggest this is an issue with multicast. Both of these nodes are connected via a Cisco Nexus 5k cluster on which I successfully host other multicast-based applications, so my problem most likely isn't there. I began reading the Multicast notes wiki entry, which suggests testing multicast with ssmping. After installing the package on both hosts, I launched ssmpingd on the first node.
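
For reference, the listener was started with a plain invocation (no options; as far as I understand it, that just makes it answer ssmping/asmping probes on the default port):

Code:
root@vm-proxmox1:~# ssmpingd

On the second node, I then executed the client-side multicast ping: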

Code:
root@vm-proxmox2:~# asmping 239.192.7.106 10.3.106.1
Failed to join multicast group: No such device
errno=19

No such device? So, I then tried to force the bridge device:
Code:
root@vm-proxmox2:~# asmping -I vmbr2003 239.192.7.106 10.3.106.1
asmping joined (S,G) = (*,239.192.7.234)
pinging 10.3.106.1 from 10.3.106.2
  unicast from 10.3.106.1, seq=1 dist=0 time=0.910 ms
  unicast from 10.3.106.1, seq=2 dist=0 time=0.201 ms
  unicast from 10.3.106.1, seq=3 dist=0 time=0.220 ms
  unicast from 10.3.106.1, seq=4 dist=0 time=0.182 ms
^C
--- 10.3.106.1 statistics ---
4 packets transmitted, time 3756 ms
unicast:
   4 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 0.182/0.378/0.910/0.307 ms
multicast:
   0 packets received, 100% packet loss

Wow, it works! Knowing that multicast communications work between these two nodes, could my problems with corosync have anything to do with it also not selecting the correct network interface? Is there any way I can force this? I would appreciate any pointers here.
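
My (possibly mistaken) understanding is that cman/corosync does not take an interface option here but simply binds to whatever address the node's hostname resolves to, so one thing I am double-checking is /etc/hosts on both nodes. Below is a sketch of what I assume it should look like on vm-proxmox2, with each node name mapped to its address on the cluster VLAN (the exact entries are my assumption):

Code:
127.0.0.1       localhost
# node names must resolve to the cluster-facing addresses, not 127.0.1.1
10.3.106.2      vm-proxmox2
10.3.106.1      vm-proxmox1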

Thanks.
 
Code:
--- 10.3.106.1 statistics ---
4 packets transmitted, time 3756 ms
unicast:
   4 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 0.182/0.378/0.910/0.307 ms
multicast:
   0 packets received, 100% packet loss

Wow, it works!

You have 100% packet loss (it does not work)!
 
Ah, yes. I completely overlooked that the successful packets were unicast. With that said, I rebuilt the two nodes, this time attempting to force the udpu transport mode mentioned in the Multicast notes wiki entry. With this configuration I also hit the "Waiting for quorum..." failure.
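
For reference, the transport change was made in /etc/pve/cluster.conf roughly as the wiki describes, by adding transport="udpu" to the cman element and bumping config_version (reproduced from memory, so treat the exact layout as approximate; node names and IDs match the status output above). The pvecm add attempt with this configuration follows.

Code:
<?xml version="1.0"?>
<cluster name="proxmox-cl-1" config_version="2">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
  <clusternodes>
    <clusternode name="vm-proxmox1" votes="1" nodeid="1"/>
    <clusternode name="vm-proxmox2" votes="1" nodeid="2"/>
  </clusternodes>
</cluster>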

Code:
root@vm-proxmox2:~# pvecm add 10.3.106.1
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
40:8e:4e:29:66:44:08:a6:c7:9f:74:a0:d9:41:c5:c2 root@vm-proxmox2
The key's randomart image is:
+--[ RSA 2048]----+
|o+oo+oo          |
|+o +EB.          |
|. O *.+          |
| + * o .         |
|    +   S        |
|                 |
|                 |
|                 |
|                 |
+-----------------+
The authenticity of host '10.3.106.1 (10.3.106.1)' can't be established.
RSA key fingerprint is 22:d6:40:9c:15:01:63:5d:c5:03:e6:98:8a:8b:86:a9.
Are you sure you want to continue connecting (yes/no)? yes
root@10.3.106.1's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

I am beginning to question the network configuration on these nodes and wonder whether something I have configured there is causing these issues. For reference, here is my /etc/network/interfaces file:

Code:
# network interface settings
auto bond0.352
iface bond0.352 inet manual
        vlan-raw-device bond0

auto bond0.1033
iface bond0.1033 inet manual
        vlan-raw-device bond0

auto bond0.1034
iface bond0.1034 inet manual
        vlan-raw-device bond0

auto bond0.2001
iface bond0.2001 inet manual
        vlan-raw-device bond0

auto bond0.2002
iface bond0.2002 inet manual
        vlan-raw-device bond0

auto bond0.2003
iface bond0.2003 inet manual
        vlan-raw-device bond0

auto bond0.2004
iface bond0.2004 inet manual
        vlan-raw-device bond0

auto bond0.2005
iface bond0.2005 inet manual
        vlan-raw-device bond0

auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode 802.3ad

iface vmbr0 inet manual
        bridge_ports none
        bridge_stp off
        bridge_fd 0

auto vmbr352
iface vmbr352 inet manual
        bridge_ports bond0.352
        bridge_stp off
        bridge_fd 0

auto vmbr1033
iface vmbr1033 inet static
        address 192.168.2.7
        netmask 255.255.255.0
        bridge_ports bond0.1033
        bridge_stp off
        bridge_fd 0

auto vmbr1034
iface vmbr1034 inet manual
        bridge_ports bond0.1034
        bridge_stp off
        bridge_fd 0


auto vmbr2001
iface vmbr2001 inet manual
        bridge_ports bond0.2001
        bridge_stp off
        bridge_fd 0

auto vmbr2002
iface vmbr2002 inet manual
        bridge_ports bond0.2002
        bridge_stp off
        bridge_fd 0


auto vmbr2003
iface vmbr2003 inet static
        address  10.3.106.2
        netmask  255.255.0.0
        bridge_ports bond0.2003
        bridge_stp off
        bridge_fd 0

auto vmbr2004
iface vmbr2004 inet manual
        bridge_ports bond0.2004
        bridge_stp off
        bridge_fd 0

auto vmbr2005
iface vmbr2005 inet manual
        bridge_ports bond0.2005
        bridge_stp off
        bridge_fd 0

The only difference in this file between my nodes is the IP addresses assigned to the bridge interfaces vmbr2003 and vmbr1033. Does anyone have a suggestion as to what I should be looking at to get this working?
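
In the meantime, the next checks I plan to run on vmbr2003 are below; nothing exotic, just watching for the cluster's multicast group (taken from the pvecm status output above) and for corosync's UDP traffic, which I believe defaults to ports 5404/5405:

Code:
root@vm-proxmox2:~# tcpdump -ni vmbr2003 host 239.192.7.106
root@vm-proxmox2:~# tcpdump -ni vmbr2003 udp port 5404 or udp port 5405

Thanks.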

J

