Cluster hell

azop

I'm struggling with a 4-node setup on Proxmox 2.3. My Dell switch does not support multicast, so I set up unicast.

*Some* nodes can see each other, but most can't. I am totally out of ideas, short of formatting and reinstalling (which would suck: this server is colocated, so I'd have to do it over IPMI).

The nodes are named rovio, rovio2, rovio3, rovio4
Code:
root@rovio:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M    324   2013-03-26 19:50:54  rovio
   2   M  94892   2013-03-29 01:55:37  rovio3
   3   X      0                        rovio2
   4   X      0                        rovio4

Code:
root@rovio:~# ssh root@rovio2 pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        rovio
   2   M  94892   2013-03-29 01:55:38  rovio3
   3   X      0                        rovio2
   4   X      0                        rovio4

Code:
root@rovio:~# ssh root@rovio3 pvecm nodes
The authenticity of host 'rovio3 (172.16.2.12)' can't be established.
RSA key fingerprint is 79:71:48:71:1c:1d:3d:8e:78:e5:0b:85:be:8c:2d:0c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rovio3,172.16.2.12' (RSA) to the list of known hosts.
root@rovio3's password: 
Node  Sts   Inc   Joined               Name
   1   X      0                        rovio
   2   X      0                        rovio2
   3   M      8   2013-03-25 19:18:23  rovio3
   4   X      0                        rovio4

Code:
root@rovio:~# ssh root@rovio4 pvecm nodes
The authenticity of host 'rovio4 (172.16.2.13)' can't be established.
RSA key fingerprint is 5b:45:ac:a6:fa:91:53:b7:0d:6d:b9:fa:63:ec:57:be.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rovio4' (RSA) to the list of known hosts.
root@rovio4's password: 
Node  Sts   Inc   Joined               Name
   1   X      0                        rovio
   2   X      0                        rovio2
   3   X      0                        rovio3
   4   M     24   2013-03-25 19:19:05  rovio4
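
The four `pvecm nodes` outputs above are easier to compare once each is reduced to "which nodes does this host see as members". A minimal Python sketch, assuming the column layout shown in the outputs above; the function name `members` is made up for illustration and is not part of any Proxmox tooling:

```python
def members(pvecm_nodes_output: str) -> list[str]:
    """Return the names of nodes whose Sts column is 'M' (member)."""
    names = []
    for line in pvecm_nodes_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric node ID; the header row does not.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] == "M":
            names.append(fields[-1])
    return names


# Output pasted from rovio in the post above.
rovio_view = """\
Node  Sts   Inc   Joined               Name
   1   M    324   2013-03-26 19:50:54  rovio
   2   M  94892   2013-03-29 01:55:37  rovio3
   3   X      0                        rovio2
   4   X      0                        rovio4
"""

print(members(rovio_view))
```

Running the same function over the output collected from each host gives one member list per host, which makes it obvious that the hosts disagree about cluster membership.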


pvecm status:

Code:
root@rovio:~# pvecm status
Version: 6.2.0
Config Version: 9
Cluster Name: bluecherry
Cluster Id: 40173
Cluster Member: Yes
Cluster Generation: 94892
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: rovio
Node ID: 1
Multicast addresses: 239.192.156.138 
Node addresses: 172.16.2.10

Code:
root@rovio:~# ssh root@rovio2 pvecm status
Version: 6.2.0
Config Version: 9
Cluster Name: bluecherry
Cluster Id: 40173
Cluster Member: Yes
Cluster Generation: 94892
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: rovio3
Node ID: 2
Multicast addresses: 239.192.156.138 
Node addresses: 172.16.2.11

Code:
root@rovio3's password: 
Version: 6.2.0
Config Version: 6
Cluster Name: bluecherry
Cluster Id: 40173
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: rovio3
Node ID: 3
Multicast addresses: 239.192.156.138 
Node addresses: 127.0.0.1

Code:
Version: 6.2.0
Config Version: 6
Cluster Name: bluecherry
Cluster Id: 40173
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: rovio4
Node ID: 4
Multicast addresses: 239.192.156.138 
Node addresses: 127.0.0.1


/etc/cluster/cluster.conf

Code:
root@rovio:~# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster name="bluecherry" config_version="9">


  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>


  <clusternodes>
  <clusternode name="rovio" votes="1" nodeid="1"/>
  <clusternode name="rovio3" votes="1" nodeid="2"/><clusternode name="rovio2" votes="1" nodeid="3"/><clusternode name="rovio4" votes="1" nodeid="4"/></clusternodes>


</cluster>

Code:
root@rovio:~# ssh root@rovio2 cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="bluecherry" config_version="9">


  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>


  <clusternodes>
  <clusternode name="rovio" votes="1" nodeid="1"/>
  <clusternode name="rovio3" votes="1" nodeid="2"/><clusternode name="rovio2" votes="1" nodeid="3"/><clusternode name="rovio4" votes="1" nodeid="4"/></clusternodes>


</cluster>

Code:
root@rovio:~# ssh root@rovio3 cat /etc/cluster/cluster.conf
root@rovio3's password: 
<?xml version="1.0"?>
<cluster name="bluecherry" config_version="5">


  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>


  <clusternodes>
  <clusternode name="rovio" votes="1" nodeid="1"/>
  <clusternode name="rovio3" votes="1" nodeid="2"/></clusternodes>


</cluster>






Code:
root@rovio:~# ssh root@rovio4 cat /etc/cluster/cluster.conf
root@rovio4's password: 
<?xml version="1.0"?>
<cluster name="bluecherry" config_version="9">


  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>


  <clusternodes>
  <clusternode name="rovio" votes="1" nodeid="1"/>
  <clusternode name="rovio3" votes="1" nodeid="2"/><clusternode name="rovio2" votes="1" nodeid="3"/><clusternode name="rovio4" votes="1" nodeid="4"/></clusternodes>


</cluster>
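
The copies of cluster.conf above do not all agree: the one from rovio3 carries an older `config_version` and lists only two nodes. A small Python sketch of how such copies could be compared automatically; the XML strings are trimmed versions of the files pasted above, and the helper name `summarize` is an assumption for illustration:

```python
import xml.etree.ElementTree as ET


def summarize(cluster_conf_xml: str) -> tuple[int, list[str]]:
    """Return (config_version, node names) for one cluster.conf document."""
    root = ET.fromstring(cluster_conf_xml)
    version = int(root.get("config_version"))
    nodes = [node.get("name") for node in root.iter("clusternode")]
    return version, nodes


# Trimmed from the files pasted above (XML declaration and whitespace removed).
conf_rovio = (
    '<cluster name="bluecherry" config_version="9">'
    '<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>'
    "<clusternodes>"
    '<clusternode name="rovio" votes="1" nodeid="1"/>'
    '<clusternode name="rovio3" votes="1" nodeid="2"/>'
    '<clusternode name="rovio2" votes="1" nodeid="3"/>'
    '<clusternode name="rovio4" votes="1" nodeid="4"/>'
    "</clusternodes></cluster>"
)
conf_rovio3 = (
    '<cluster name="bluecherry" config_version="5">'
    '<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>'
    "<clusternodes>"
    '<clusternode name="rovio" votes="1" nodeid="1"/>'
    '<clusternode name="rovio3" votes="1" nodeid="2"/>'
    "</clusternodes></cluster>"
)

summaries = {
    "rovio": summarize(conf_rovio),
    "rovio3": summarize(conf_rovio3),
}
newest = max(version for version, _ in summaries.values())
# Hosts whose copy is older than the newest version seen anywhere.
stale = sorted(host for host, (version, _) in summaries.items() if version < newest)
print(stale)
```

A host that shows up in `stale` has not received the current configuration, which is consistent with it not being in contact with the rest of the cluster.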
 
Hi,

I don't know if it's related, but pvecm status shows 127.0.0.1 as the node address for nodes 3 and 4.
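
A frequent reason for a node address of 127.0.0.1 is an /etc/hosts entry that maps the node's own hostname to loopback. The thread does not show the hosts files, so this is only a possibility. A rough Python sketch of the check, using hypothetical hosts-file contents and made-up helper names:

```python
import ipaddress


def resolve_from_hosts(hosts_text: str, hostname: str):
    """Return the first address an /etc/hosts-style text maps hostname to."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        if hostname in names:
            return addr
    return None


def is_loopback(addr: str) -> bool:
    """True for 127.0.0.0/8 and ::1."""
    return ipaddress.ip_address(addr).is_loopback


# Hypothetical file contents, NOT taken from the thread.
bad_hosts = """\
127.0.0.1 localhost rovio4
"""
good_hosts = """\
127.0.0.1 localhost
172.16.2.13 rovio4
"""

for text in (bad_hosts, good_hosts):
    addr = resolve_from_hosts(text, "rovio4")
    print(addr, "loopback" if is_loopback(addr) else "ok")
```

If the hostname resolves to loopback, the cluster stack has no routable address to give its peers, so fixing the hosts entry is a reasonable first step even when it does not solve everything.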

Thanks, I changed that but it didn't fix the cluster problem.

Code:
root@rovio4:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        rovio
   2   X      0                        rovio3
   3   X      0                        rovio2
   4   M      4   2013-03-31 09:24:12  rovio4
root@rovio4:~# pvecm status
Version: 6.2.0
Config Version: 9
Cluster Name: bluecherry
Cluster Id: 40173
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 4
Total votes: 1
Node votes: 1
Quorum: 3 Activity blocked
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: rovio4
Node ID: 4
Multicast addresses: 255.255.255.255 
Node addresses: 172.16.2.13

Code:
root@rovio:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M    324   2013-03-26 19:50:54  rovio
   2   M  94892   2013-03-29 01:55:37  rovio3
   3   X      0                        rovio2
   4   X      0                        rovio4
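
The "Quorum: 3 Activity blocked" line in the rovio4 status above follows from simple majority arithmetic: with 4 expected votes, more than half must be present, and rovio4 only has its own vote. A tiny Python sketch of that rule, assuming the usual majority formula (the function names are illustrative):

```python
def quorum(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the quorum."""
    return total_votes >= quorum(expected_votes)


# Values from the rovio4 status above: Expected votes 4, Total votes 1.
print(quorum(4))
print(is_quorate(1, 4))
```

So in a 4-node cluster with one vote per node, at least three nodes have to see each other before the cluster becomes writable again.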
 
I ended up reinstalling the main cluster node, hoping to fix this mess. I followed the instructions on the wiki to activate unicast and then tried to add the nodes. I ran into "authentication already exists" and "unable to copy ssh id" errors (with -force), but was finally able to remove the old SSH keys and get it to connect.

Now I have the same problem as before: I can't see the other nodes. On the node I just added, cluster.conf contains the following line, so both nodes should be using unicast.

Code:
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">


I now see this repeating on the cluster node in /etc/messages:

Code:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:45 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr  7 09:28:45 rovio1 corosync[1673]:   [CPG   ] chosen downlist: sender r(0) ip(68.67.74.162) ; members(old:1 left:0)
Apr  7 09:28:45 rovio1 corosync[1673]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:49 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:49 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr  7 09:28:49 rovio1 corosync[1673]:   [CPG   ] chosen downlist: sender r(0) ip(68.67.74.162) ; members(old:1 left:0)
Apr  7 09:28:49 rovio1 corosync[1673]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:51 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:51 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr  7 09:28:51 rovio1 corosync[1673]:   [CPG   ] chosen downlist: sender r(0) ip(68.67.74.162) ; members(old:1 left:0)
Apr  7 09:28:51 rovio1 corosync[1673]:   [MAIN  ] Completed service synchronization, ready to provide service.


Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:55 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:55 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr  7 09:28:55 rovio1 corosync[1673]:   [CPG   ] chosen downlist: sender r(0) ip(68.67.74.162) ; members(old:1 left:0)
Apr  7 09:28:55 rovio1 corosync[1673]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] CLM CONFIGURATION CHANGE
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] New Configuration:
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) 
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] Members Left:
Apr  7 09:28:59 rovio1 corosync[1673]:   [CLM   ] Members Joined:
Apr  7 09:28:59 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr  7 09:28:59 rovio1 corosync[1673]:   [CPG   ] chosen downlist: sender r(0) ip(68.67.74.162) ; members(old:1 left:0)
Apr  7 09:28:59 rovio1 corosync[1673]:   [MAIN  ] Completed service synchronization, ready to provide service.
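
The log above repeats the same pattern every few seconds: a new membership is formed, and every "New Configuration" lists only 68.67.74.162, a different address from the 172.16.2.x ones seen earlier in the thread. A short Python sketch that condenses such a log into those two facts; the parsing is based only on the line format shown above, and `summarize_log` is a made-up name:

```python
import re

TAG_RE = re.compile(r"\[([A-Z]+)\s*\]")   # matches [CLM   ], [TOTEM ], ...
IP_RE = re.compile(r"ip\(([\d.]+)\)")


def summarize_log(lines):
    """Return (number of membership re-formations, sorted member IPs seen)."""
    reforms = 0
    member_ips = set()
    for line in lines:
        tag = TAG_RE.search(line)
        if tag is None:
            continue
        if tag.group(1) == "TOTEM" and "new membership was formed" in line:
            reforms += 1
        elif tag.group(1) == "CLM":
            ip = IP_RE.search(line)
            if ip:
                member_ips.add(ip.group(1))
    return reforms, sorted(member_ips)


# A few lines copied from the log above.
sample = [
    "Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] New Configuration:",
    "Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] #011r(0) ip(68.67.74.162) ",
    "Apr  7 09:28:45 rovio1 corosync[1673]:   [CLM   ] Members Joined:",
    "Apr  7 09:28:45 rovio1 corosync[1673]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.",
]

print(summarize_log(sample))
```

A growing re-formation count with a single member IP suggests that the other node keeps trying to join but its traffic never completes the handshake on the network corosync is bound to.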
 
AFAIK our support team fixed your issue (the cause was a custom network setup that was not suitable for the cluster communication network).
 
