Node can't join Quorum but multicast ping works fine

slappyjam

New Member
Feb 27, 2013
Hello,

I have a four-node Proxmox cluster, and I've been trying to get the fourth node to join, so far to no avail. I thought perhaps the issue was related to multicast, but I can do a multicast ping between all nodes in the cluster, and I can mount the /etc/pve filesystem just fine. The fourth node also shows up in the UI, but with a red light.

Below are some commands that I've run to try to troubleshoot the issue:

Code:
root@virt4-atl:/# asmping 239.192.37.213 virt2-atl
asmping joined (S,G) = (*,239.192.37.234)
pinging 10.10.155.11 from 10.10.155.13
  unicast from 10.10.155.11, seq=1 dist=0 time=1.518 ms
multicast from 10.10.155.11, seq=1 dist=0 time=1.543 ms
  unicast from 10.10.155.11, seq=2 dist=0 time=0.257 ms
multicast from 10.10.155.11, seq=2 dist=0 time=0.266 ms
  unicast from 10.10.155.11, seq=3 dist=0 time=0.237 ms
multicast from 10.10.155.11, seq=3 dist=0 time=0.250 ms


--- 10.10.155.11 statistics ---
3 packets transmitted, time 2828 ms
unicast:
   3 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 0.237/0.670/1.518/0.599 ms
multicast:
   3 packets received, 0% packet loss since first mc packet (seq 1) recvd
   rtt min/avg/max/std-dev = 0.250/0.686/1.543/0.606 ms


When I restart 'cman' on the fourth node so it can join the cluster, I get:

Code:
starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster


The /etc/pve filesystem mounts fine as well:

Code:
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,default_permissions,allow_other)


When I do 'pvecm nodes' on one of the working nodes I get:

Code:
root@virt2-atl:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   1148   2013-02-27 16:29:38  virt3-atl
   2   M   1148   2013-02-27 16:29:38  virt2-atl
   3   M   1148   2013-02-27 16:29:38  virt1-atl
   4   X      0                        virt4-atl


When I issue the same command on the fourth (not working) node I get:

Code:
root@virt4-atl:/#  pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        virt3-atl
   2   X      0                        virt2-atl
   3   X      0                        virt1-atl
   4   M     96   2013-02-27 16:37:00  virt4-atl

I thought that if all the nodes could communicate via multicast, they should be able to form a quorum. I'm at a loss as to how to continue troubleshooting this, so if anyone can point me in the right direction, it would be greatly appreciated!
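
(In case it helps anyone searching later: the multicast check commonly recommended for corosync clusters uses omping rather than asmping. A cross-check would look roughly like the line below, run on all four nodes at about the same time so the loss figures can be compared; the count and interval here are just example values.)

Code:
# run the same command on every node at (roughly) the same time,
# then compare the unicast vs. multicast loss each node reports
root@virt4-atl:~# omping -c 10 -i 1 virt1-atl virt2-atl virt3-atl virt4-atl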
 
Rebooting does not help.

The one thing I have noticed is that nodes 1-3 are on a different network than node 4, as shown in the 'pvecm status' output:
Code:
root@virt3-atl:~# pvecm status
Version: 6.2.0
Config Version: 28
Cluster Name: KVM-ATL
Cluster Id: 9648
Cluster Member: Yes
Cluster Generation: 1148
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 6
Flags: 
Ports Bound: 0 177  
Node name: virt3-atl
Node ID: 1
Multicast addresses: 239.192.37.213 
Node addresses: 10.10.68.54 



root@virt4-atl:~# pvecm status
Version: 6.2.0
Config Version: 28
Cluster Name: KVM-ATL
Cluster Id: 9648
Cluster Member: Yes
Cluster Generation: 108
Membership state: Cluster-Member
Nodes: 1
Expected votes: 4
Total votes: 1
Node votes: 1
Quorum: 3 Activity blocked
Active subsystems: 2
Flags: 
Ports Bound: 0  
Node name: virt4-atl
Node ID: 4
Multicast addresses: 239.192.37.213 
Node addresses: 10.10.155.13

Nodes 1-3 are on the 10.10.68.x network; node 4 is on the 10.10.155.x network. That said, all nodes have a NIC in the .155 network (155 is the Proxmox management network). I've added a static multicast route to send all multicast traffic out on the 155 network.
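
(The route I added was along these lines; "vmbr1" is only a placeholder for whichever bridge/NIC sits on the .155 network.)

Code:
# force all multicast traffic (224.0.0.0/4) out of the interface on the 10.10.155.x network
root@virt4-atl:~# ip route add 224.0.0.0/4 dev vmbr1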
 
I guess you should put all nodes on the same network! The host address is detected by resolving the name in /etc/hostname via /etc/hosts.
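
(For example, on virt4-atl the relevant entry needs to map the node name to the address the cluster is supposed to use, roughly like below; this is a simplified example only.)

Code:
# /etc/hosts on virt4-atl (simplified example)
127.0.0.1       localhost
10.10.155.13    virt4-atl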
 
I didn't think you could change the IP of a node once it has joined the cluster. I was considering deleting the nodes and re-adding them so they are all on the same network. Does that sound feasible? Or should I just rebuild the cluster (not preferred)?
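
(The delete/re-add path I had in mind would be roughly the following; the target IP is just virt2-atl's address from the ping test above, used as an example.)

Code:
# on a node that is still a quorate member of the cluster:
root@virt2-atl:~# pvecm delnode virt4-atl
# then on the removed node, join again by pointing at an existing member:
root@virt4-atl:~# pvecm add 10.10.155.11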
 
The docs say clearly that you cannot change the hostname/IP - see "Changing the hostname and IP is not possible after cluster creation." That is why I told you to re-install, as you are not the expert here.
http://pve.proxmox.com/wiki/Proxmox_...mox_VE_Cluster

I tried deleting one of my nodes (without changing the IP), and now I can't even add the node back. I've had to make a new post about my problem getting cman to start; it crashes with the error message below:

Code:
Starting cman... /usr/sbin/ccs_config_validate: line 186: 101930 Segmentation fault      (core dumped) ccs_config_dump > $tempfile

I would love to avoid re-installing as it requires a trip to the data center.
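
(Before resorting to a re-install, one thing I still want to try is running the step the init script fails on by hand, to see whether the problem is in the tool or in the config it reads. The xmllint check and the /etc/pve/cluster.conf path below are assumptions based on the PVE 2.x/3.x layout.)

Code:
# run the command the init script wraps (see the segfault above) directly
root@virt4-atl:~# ccs_config_dump > /tmp/cluster-dump.xml
# sanity-check that the cluster config is at least well-formed XML
root@virt4-atl:~# xmllint --noout /etc/pve/cluster.conf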
 
We have the same issue, and putting the hosts on the same network does not solve the problem and is not an option in our case. We have run the same tests as slappyjam, including testing multicast manually, and still have no quorum breakthrough. Has anyone found a way to work around this problem?
 
We never found a resolution and have moved away from Proxmox as a VM solution for our production data center.
 
Just to note, configuration issues with clustering can also be solved by getting help from our commercial support team - so far we have a 100% success rate there.
 
For those searching later: set the expected votes to 1 with 'pvecm expected 1', restart cman, and the node will join the cluster. Then reboot it and it should be fine.
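
(In other words, roughly this sequence on the stuck node; the init-script invocation is what it looks like on the PVE 2.x/3.x Debian base.)

Code:
root@virt4-atl:~# pvecm expected 1          # lower the expected votes so the node can become quorate
root@virt4-atl:~# /etc/init.d/cman restart  # re-run the join that previously timed out waiting for quorum
root@virt4-atl:~# pvecm status              # confirm membership, then reboot the node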
 
