Lost node, now cannot re-add it

rugby

I've followed the steps in the wiki and made sure multicast is enabled on my switch, but I cannot add my replaced node back into my cluster. The join hangs at "Waiting for quorum". Here's the output from the new replacement node:

root@proxmox03:~# pvecm add X.X.X.X
The authenticity of host 'X.X.X.X (X.X.X.X)' can't be established.
RSA key fingerprint is 7d:35:d2:e1:f1:d4:24:51:92:97:7e:7e:b4:66:57:38.
Are you sure you want to continue connecting (yes/no)? yes
root@X.X.X.X's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

Here's the output on one of the cluster members:


root@proxmox01:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M    516   2013-11-16 08:37:47  proxmox01
   2   M    564   2013-12-23 12:35:59  proxmox02
   3   X      0                        proxmox03
   4   M    564   2013-12-23 12:35:59  proxmox04

Ideas?
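
For anyone who wants to script this check, the `pvecm nodes` listing can be parsed to pick out nodes that are not current members. This is only a minimal Python sketch: it assumes the column layout shown in the output above and that any Sts value other than `M` means the node is not a member. The function name `non_member_nodes` is hypothetical.

```python
def non_member_nodes(pvecm_nodes_output):
    """Return the names of nodes whose Sts column is not 'M'."""
    missing = []
    for line in pvecm_nodes_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric node ID; skip the header and blanks.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        # The name is always the last field, even when "Joined" is empty.
        status, name = fields[1], fields[-1]
        if status != "M":
            missing.append(name)
    return missing


sample = """\
Node Sts Inc Joined Name
1 M 516 2013-11-16 08:37:47 proxmox01
2 M 564 2013-12-23 12:35:59 proxmox02
3 X 0 proxmox03
4 M 564 2013-12-23 12:35:59 proxmox04
"""

print(non_member_nodes(sample))
```

With the listing from proxmox01, only proxmox03 is reported.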
 

Try

# clustat

and see whether "Member Status: Quorate" is there.

Otherwise, try

# pvecm e 1

(short for pvecm expected 1) to temporarily make the cluster quorate by hand.

You should then be able to join the new node, and the quorum should be reset automatically.

Marco
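
The check described above can also be done from a script. The following is a rough Python sketch, assuming `clustat` prints a line of the form "Member Status: Quorate" or "Member Status: Inquorate"; the helper name `member_status` and the sample text are illustrative only.

```python
import re


def member_status(clustat_output):
    """Return the value of the 'Member Status:' line, or None if absent."""
    match = re.search(r"^Member Status:\s*(\S+)", clustat_output, re.MULTILINE)
    return match.group(1) if match else None


# Illustrative text in the shape clustat is assumed to print.
sample = """\
Cluster Status for Clusterfish @ Mon Dec 30 10:53:22 2013
Member Status: Inquorate
"""

if member_status(sample) != "Quorate":
    print("cluster is not quorate")
```

On a real node the text could be captured with `subprocess.run(["clustat"], capture_output=True, text=True).stdout` instead of the hard-coded sample.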
 

root@proxmox03:~# clustat
Cluster Status for Clusterfish @ Mon Dec 30 10:53:22 2013
Member Status: Inquorate

I ran pvecm e 1 and it changed to:


root@proxmox03:~# clustat
Cluster Status for Clusterfish @ Mon Dec 30 10:57:13 2013
Member Status: Quorate


Member Name    ID   Status
------ ----    ---- ------
proxmox01         1 Offline
proxmox02         2 Offline
proxmox03         3 Online, Local
proxmox04         4 Offline

But it still doesn't join, and when I run the add again it says "authentication key already exists."
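
The result above fits the usual vote-based quorum rule, where a partition is quorate only when it holds more than half of the expected votes. Here is a small Python sketch of that arithmetic. It assumes one vote per node and the common floor(expected / 2) + 1 threshold; neither is taken from this cluster's configuration, and the function names are hypothetical.

```python
def quorum_threshold(expected_votes):
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the votes visible to this node reach the threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# One node alone in a four-node cluster: 1 vote against a threshold of 3.
print(is_quorate(1, 4))
# The same node after expected votes are lowered to 1: threshold is 1.
print(is_quorate(1, 1))
```

Lowering the expected votes makes the local node quorate on its own, but it does nothing to restore communication with the other members, which would explain why they still show as Offline.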
 
OK, new development: I restarted proxmox04 and now it sees proxmox03, but not proxmox02 or proxmox01.

root@proxmox04:~# clustat
Cluster Status for Clusterfish @ Mon Dec 30 11:14:17 2013
Member Status: Inquorate


Member Name    ID   Status
------ ----    ---- ------
proxmox01         1 Offline
proxmox02         2 Offline
proxmox03         3 Online
proxmox04         4 Online, Local


root@proxmox04:~#
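
To see at a glance which members a node cannot reach, the member table can be parsed. This is a minimal Python sketch, assuming each member row has the shape `<name> <id> <status>` as in the output above; `online_members` is a hypothetical helper, not part of any cluster tool.

```python
def online_members(clustat_output):
    """Return the set of member names that clustat reports as Online."""
    online = set()
    for line in clustat_output.splitlines():
        fields = line.split()
        # Member rows: name, numeric ID, then a status such as
        # "Offline", "Online" or "Online, Local".
        if len(fields) >= 3 and fields[1].isdigit():
            if fields[2].rstrip(",") == "Online":
                online.add(fields[0])
    return online


view_from_proxmox04 = """\
Cluster Status for Clusterfish @ Mon Dec 30 11:14:17 2013
Member Status: Inquorate

Member Name ID Status
------ ---- ---- ------
proxmox01 1 Offline
proxmox02 2 Offline
proxmox03 3 Online
proxmox04 4 Online, Local
"""

all_nodes = {"proxmox01", "proxmox02", "proxmox03", "proxmox04"}
unreachable = sorted(all_nodes - online_members(view_from_proxmox04))
print(unreachable)
```

Run against the view from proxmox04, this lists proxmox01 and proxmox02 as the members it cannot see.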

Looks like restarting the other 2 nodes should bring it all back together.

Update: I rebooted the other two nodes and all four show up now. I think somewhere along the line somebody here updated some of the hosts and caused the problem.