robynhub
Can't add node to cluster. [Solved]

I'm beta testing Proxmox VE 2.0.
I've built a two-node cluster (with VMFO2 and VMFO3).
I've upgraded both nodes to the latest release and everything works fine.
I then did a fresh install on a third server (VMFO1), and when I try to add it to the cluster I get this:
Code:
 root@VMF01:~# pvecm add VMFO2
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
fd:f6:70:44:8b:5e:19:f4:c5:7b:07:8b:07:77:ff:69 root@VMF01
The key's randomart image is:
+--[ RSA 2048]----+
|              ...|
|            ..o.+|
|             +o++|
|         .  .oo=+|
|        S . ..= =|
|           o o E.|
|            = o  |
|           . +   |
|              .  |
+-----------------+
The authenticity of host 'vmfo2 (X.X.X.102)' can't be established.
RSA key fingerprint is b5:1e:e0:89:e3:18:e9:60:2b:04:cd:1f:cd:f3:af:de.
Are you sure you want to continue connecting (yes/no)? yes
root@vmfo2's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?

Before you ask: I've checked the multicast connectivity on the switch and everything is fine on that front. The strange thing is that the two working nodes report the cluster on a different multicast group!
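(Side note for readers hitting the same symptom: a minimal sketch of a local multicast sanity check. The group is the one reported by pvecm status above and UDP port 5405 is corosync's conventional port; both are assumptions here. This only proves the local IP stack can join a group and loop a probe back to itself; a real switch/IGMP-snooping test still needs a sender and a listener on two different hosts.)

```python
import socket
import struct

# Assumed values for illustration: the group reported by 'pvecm status'
# and UDP port 5405 (corosync's conventional port).
GROUP, PORT = "239.192.177.93", 5405

def multicast_loopback_check(group: str, port: int, payload: bytes = b"probe") -> bool:
    """Send a probe to a multicast group and see if the local host hears it.

    Returns True when the probe loops back, False on timeout or when the
    local stack cannot join/send the group at all.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        rx.bind(("", port))
        # Join the group on all interfaces.
        mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
        rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        rx.settimeout(2.0)

        # Loop the datagram back to local listeners.
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        tx.sendto(payload, (group, port))

        data, _ = rx.recvfrom(1024)
        return data == payload
    except OSError:  # covers socket.timeout and unreachable-network errors
        return False
    finally:
        rx.close()
        tx.close()
```

Run it on each node; if even the loopback check fails, the problem is on the host, not the switch.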

Code:
root@VMFO3:~# pvecm status
Version: 6.2.0
Config Version: 15
Cluster Name: ClusterFO
Cluster Id: 45483
Cluster Member: Yes
Cluster Generation: 2504
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: VMFO3
Node ID: 1
Multicast addresses: 239.192.177.93 
Node addresses: X.X.X.103 

root@VMFO3:~# pveversion -v
pve-manager: 2.0-30 (pve-manager/2.0/af79261b)
running kernel: 2.6.32-5-amd64
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-6-pve: 2.6.32-55
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-22
qemu-server: 2.0-18
pve-firmware: 1.0-15
libpve-common-perl: 1.0-14
libpve-access-control: 1.0-12
libpve-storage-perl: 2.0-11
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-3
ksm-control-daemon: 1.1-1
Code:
root@VMFO2:~# pvecm status
Version: 6.2.0
Config Version: 15
Cluster Name: ClusterFO
Cluster Id: 45483
Cluster Member: Yes
Cluster Generation: 2504
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: VMFO2
Node ID: 2
Multicast addresses: 239.192.177.93 
Node addresses: X.X.X.102 

root@VMFO2:~# pveversion -v
pve-manager: 2.0-30 (pve-manager/2.0/af79261b)
running kernel: 2.6.32-5-amd64
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-6-pve: 2.6.32-55
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-22
qemu-server: 2.0-18
pve-firmware: 1.0-15
libpve-common-perl: 1.0-14
libpve-access-control: 1.0-12
libpve-storage-perl: 2.0-11
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-3
ksm-control-daemon: 1.1-1
Code:
root@VMF01:~# pvecm status
Version: 6.2.0
Config Version: 15
Cluster Name: CLUSTERFO
Cluster Id: 37419
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: VMF01
Node ID: 3
Multicast addresses: 239.192.146.189 
Node addresses: X.X.X.91 

root@VMF01:~# pveversion -v
pve-manager: 2.0-30 (pve-manager/2.0/af79261b)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-22
qemu-server: 2.0-18
pve-firmware: 1.0-15
libpve-common-perl: 1.0-14
libpve-access-control: 1.0-12
libpve-storage-perl: 2.0-11
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-3
ksm-control-daemon: 1.1-1

As you can see, the cluster ID also differs from the one on the working nodes...
All three nodes are on the latest version. What can I do to fix this? I've been fighting this issue for three days.

Thanks in advance
 
Does it help when you start cman again on that node?

# /etc/init.d/cman start

And what is the output of

# pvecm nodes

on those nodes?
 
Strange, but even the cluster name differs!!

Cluster Name: CLUSTERFO

and

Cluster Name: ClusterFO

Something is wrong there.
 
Seems you already created a new cluster on the third node. I assume you issued
a 'pvecm create' by accident.

I suggest that you do a clean reinstall of that node, and then add it to the cluster with 'pvecm add ...'
 
I've tried that with no success; on the other nodes cman starts with no issues.
It's not a 'pvecm create' issue: VMF01 is a fresh new install (tried 4 times :( )
pvecm nodes output:
Code:
root@VMFO3:/etc/pve# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   1312   2011-12-30 17:58:47  VMFO3
   2   M   2468   2012-02-18 10:47:54  VMFO2
   3   X   2516                        VMF01

root@VMFO2:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   2468   2012-02-18 10:47:54  VMFO3
   2   M   1320   2012-01-02 11:07:08  VMFO2
   3   X   2516                        VMF01

root@VMF01:/etc/cluster# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        VMFO3
   2   X      0                        VMFO2
   3   M   2532   2012-02-22 10:18:55  VMF01
 
Another thing: in the /etc/pve folder on the running nodes, cluster.conf refers to CLUSTERFO and not to ClusterFO.
That seems to be the problem. I've changed it manually and, after another fresh install, I will try to re-add the node.
Obviously I ran 'pvecm delnode VMF01' on a running node first.

 
After this change it seems to come up, but then fails again:

Code:
root@VMF01:~# pvecm add VMFO3
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
e0:c7:7a:60:25:bd:a8:52:48:7b:c8:c6:c2:01:84:90 root@VMF01
The key's randomart image is:
+--[ RSA 2048]----+
|*o               |
|E      .         |
| ..   o o        |
|.+.+ . * .       |
|..B o = S        |
| o o o +         |
|  . . . .        |
|   .   .         |
|                 |
+-----------------+
The authenticity of host 'vmfo3 (X.X.X.103)' can't be established.
RSA key fingerprint is 99:7d:e5:e9:6b:b5:fc:20:37:86:75:1c:13:76:5f:ae.
Are you sure you want to continue connecting (yes/no)? yes
root@vmfo3's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]
cluster not ready - no quorum?
root@VMF01:~# pvecm  status
Version: 6.2.0
Config Version: 21
Cluster Name: ClusterFO
Cluster Id: 45483
Cluster Member: Yes
Cluster Generation: 2524
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: VMF01
Node ID: 3
Multicast addresses: 239.192.177.93 
Node addresses: X.X.X.91 
root@VMF01:~# pvecm n
Node  Sts   Inc   Joined               Name
   1   M   2524   2012-02-22 11:12:14  VMFO3
   2   M   2524   2012-02-22 11:12:14  VMFO2
   3   M      4   2012-02-22 11:12:14  VMF01

It's a very strange thing.
 
I've run 'pvecm updatecerts -force' and everything seems to be right now.

I hope I can sleep tonight... This thing drives me crazy...
 

Well, someone/something changed the cluster name from ClusterFO to CLUSTERFO. The multicast IP address is generated from that value, so that has many strange side effects.
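(For readers: cman derives a 16-bit cluster ID from the cluster name and builds the default 239.192.x.y multicast address from that ID, which is why a case change alone put the nodes on different groups. The sketch below is purely illustrative: it uses zlib.crc32, not cman's actual hash, just to show how case-different names end up with different IDs and addresses.)

```python
import zlib

def fake_cluster_id(name: str) -> int:
    """Illustrative 16-bit hash of a cluster name (NOT cman's real algorithm)."""
    return zlib.crc32(name.encode()) & 0xFFFF

def fake_mcast_addr(name: str) -> str:
    """Sketch of mapping a cluster ID into the 239.192.0.0/16 range cman uses."""
    cid = fake_cluster_id(name)
    return "239.192.%d.%d" % ((cid >> 8) & 0xFF, cid & 0xFF)

# Case matters: different spellings will generally hash to different IDs,
# and therefore to different multicast groups.
print("ClusterFO ->", fake_mcast_addr("ClusterFO"))
print("CLUSTERFO ->", fake_mcast_addr("CLUSTERFO"))
```

Two nodes whose cluster.conf names differ only in case can therefore never hear each other, even though multicast itself works fine.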
 
