PM 3.1 Clustering - Reinstalling after master node failed

mylesw

Renowned Member
I have a PM 3.1 cluster with about 5 node members. Over the weekend, the master server for the cluster died and we had to wipe and re-install PM on it. It's back online now, but it's no longer part of the cluster. I did a cluster create on it, and that's done, but none of the child nodes are connected to it.

What do I need to do in order to re-attach a child node to a re-installed master node in a cluster? The cluster name is the same, but of course a new SSL key was generated during re-installation.

Thanks in advance for any assistance.

Myles
 
I have a PM 3.1 cluster with about 5 node members. Over the weekend, the master server for the cluster died and we had to wipe and re-install PM on it. It's back online now, but it's no longer part of the cluster. I did a cluster create on it, and that's done, but none of the child nodes are connected to it.

Never do a cluster create when you want to join an existing cluster!

What do I need to do in order to re-attach a child node to a re-installed master node in a cluster?

I have no clue what you are talking about. There is simply no 'master' and no 'slave' in a PVE cluster.
To add a node, use 'pvecm add'
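For example, something like this on the node that should join (the IP below is just a placeholder for any node already in the cluster):

# run on the joining node, pointing at an existing cluster member
pvecm add 192.168.1.10
# then check membership
pvecm status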
 
Never do a cluster create when you want to join an existing cluster!
I have no clue what you are talking about. There is simply no 'master' and no 'slave' in a PVE cluster.
To add a node, use 'pvecm add'
Ah, you might have just solved my problem. I was under the impression that a cluster had a master node and all children join to it. Since we normally create the cluster on a master first and then do an add node on the children, it suggested to me that there was some form of hierarchy to it. Are you saying that this is just semantics - that I should just add the original master node back to the cluster again?
 
Are you saying that this is just semantics - that I should just add the original master node back to the cluster again?

Yes, just add the node again. You may need to use the --force flag if you use the same name/IP as before.
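Something along these lines, where the IP stands in for an existing cluster member:

# re-add a node that re-uses its old name/IP
pvecm add 192.168.1.10 --force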
 
Something interesting... I wiped the server that I am trying to add to the cluster and re-installed PM 3.1 on it. All fine. The server is back up and running now.

But there is no sign of clustering on this at all. Attempting to add the node to the cluster is failing. When I look in /etc/cluster there is no cluster.conf there at all.

Am I missing a step here? Shouldn't clustering be installed by default and just enabled when you add a node to the cluster?

Myles
 
Shouldn't clustering be installed by default and just enabled when you add a node to the cluster?

A fresh node has no cluster on it:
- if you join an existing cluster, it makes no sense to have a different one locally
- if you need to start a new cluster with it, you can simply create one (see the sketch below)
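For the second case, a minimal sketch (the cluster name is just a placeholder, and this is only ever run on the very first node of a new cluster):

# run once, on the first node only
pvecm create mycluster
# verify the new cluster is up
pvecm status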

Marco
 
A fresh node has no cluster on it:
- if you join an existing cluster, it makes no sense to have a different one locally
- if you need to start a new cluster with it, you can simply create one
Marco

Yes, but in this case the server was originally part of a cluster. The disk array died, but I was able to migrate all VMs off it before I had to wipe and re-install. Now that I have wiped and re-installed, I want to re-join it to the same cluster it was in before.

When I attempt to do this with pvecm add <nodename> --force, it fails with:

I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

I checked, and I found that cman is installed, but there is no definition of the cluster at all.

I think I'm missing a step here - something that sets up the cluster name it should join, and hence the cluster.conf file?
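Roughly what I am checking, in case I have a command wrong (paths as they appear on my PVE 3.1 node):

# is this node part of a cluster at all?
pvecm status
# the cluster definition distributed via /etc/pve
cat /etc/pve/cluster.conf
# state of the cman service
service cman status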

Myles
 
Thanks to everyone. I got it working. The problem was that I was using a hostname rather than the IP address of a member node.

Myles
 
I thought I was out of the woods on this, but not quite yet. So I have successfully re-installed PM 3.1 on my server and added it back to the cluster. It has the same hostname and IP as before, so it is now showing up as part of the cluster.

I am now trying to migrate VMs from the temporary server back to this one, and I'm getting this on attempting a migration:

Jul 28 13:08:29 # /usr/bin/ssh -o 'BatchMode=yes' root@xxxx /bin/true
Jul 28 13:08:29 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Jul 28 13:08:29 @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
Jul 28 13:08:29 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Jul 28 13:08:29 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Jul 28 13:08:29 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
Jul 28 13:08:29 It is also possible that a host key has just been changed.
Jul 28 13:08:29 The fingerprint for the ECDSA key sent by the remote host is
Jul 28 13:08:29 xxxxxxxxxxxxx
Jul 28 13:08:29 Please contact your system administrator.
Jul 28 13:08:29 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Jul 28 13:08:29 Offending RSA key in /etc/ssh/ssh_known_hosts:12
Jul 28 13:08:29 ECDSA host key for xx.xx.xx.xx has changed and you have requested strict checking.
Jul 28 13:08:29 Host key verification failed.
Jul 28 13:08:29 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

OK, what should I do now?

Myles
 
I am now trying to migrate VMs from the temporary server back to this one, and I'm getting this on attempting a migration: [...] OK, what should I do now?

SSH from one node to the other.
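Something like this from the node you are migrating from (the IP is a placeholder for the re-installed node; the known_hosts path is the one named in the error above):

# drop the stale host key entry for the re-installed node
ssh-keygen -f /root/.ssh/known_hosts -R 192.168.1.20
# connect once so the new host key gets recorded
ssh root@192.168.1.20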
 
I am now trying to migrate VMs from the temporary server back to this one, and I'm getting this on attempting a migration: [...] OK, what should I do now?

SSH from one node to the new node.
 
Never mind, fixed it. There was an entry in /root/.ssh/known_hosts on the older node that was causing the problem. I removed it and then manually SSH'd to the new node, which created a new entry. Now it works great.

Thanks
Myles
 
