Can't Cluster...

vzfanatic

Jul 22, 2008
Hi All,

I've read and followed what I believe are all the posts I can find on Proxmox VE clustering, but I am unable to get a master and slaves to sync.

I've upgraded to v1 with no problems.

When I try to add the slave to the master, from either the master or the slave, I get the following error:

'unable to add node: command failed - ssh xx.xx.xx.xx /usr/bin/pveca -a 'IP:xx.xx.xx.xx' 'NAME:vz1' 'HOSTRSAPUBKEY:...'

Originally the hostnames were the same; I changed them, and now I get this error.

I'm sorry for creating more content on this topic, but I really am at my wits' end.

Thanks all for your help and a great product!
 
We use the hostnames in the authentication process. So it will not work if you change them. But you can modify /etc/pve/cluster.cfg manually to reflect your changes.

- Dietmar
 
Please ignore the last post. Changing the hostname before creating the cluster should work, but you cannot change hostnames after you have created the cluster.
 
I deleted all the conf files, recreated the master and I got this error:

syncing master configuration from 'xx.xx.xx.xx' failed (rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~) : command 'rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~' failed with exit code 255:
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(453) [receiver=2.6.9]


Now on the master I see:

ERROR: 500 Can't connect to localhost:50000 (connect: Connection refused)
 
You need to remove the following files on both nodes:
Code:
/etc/pve/cluster.cfg
/root/.ssh/known_hosts

and then create the cluster again.
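
As a rough sketch of that cleanup, assuming a POSIX shell run as root on each node: the `reset_cluster_state` helper and its optional root-prefix argument are made up for illustration (the prefix only exists so the function can be tried against a scratch directory first). The two paths and the `pveca` calls are the ones already mentioned in this thread, and I am assuming the address given to `pveca -a -h` is the master's.

```shell
#!/bin/sh
# Hypothetical helper: remove the stale cluster config and the SSH host cache.
# $1 is an optional root prefix (empty means the real filesystem).
reset_cluster_state() {
    root="${1:-}"
    rm -f "$root/etc/pve/cluster.cfg" "$root/root/.ssh/known_hosts"
}

# Intended use, on BOTH nodes (not executed here):
#   reset_cluster_state
# then on the master:
#   pveca -c
# and on the node:
#   pveca -a -h xx.xx.xx.xx    # assumed to be the master's IP
```

Only those two files are touched; /root/.ssh/authorized_keys is left alone.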
 
I deleted the files as instructed, and then:

from slave:
pveca -a -h xx.xx.xx.xx

result:
syncing master configuration from 'xx.xx.xx.xx' failed (rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~) : command 'rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~' failed with exit code 255:
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(453) [receiver=2.6.9]

from master
pveca -c
pveca -a -h xx.xx.xx.xx
result:
local node already part of cluster
 
I did.

You can see it in the first part of my message (from the 'slave', i.e. the node)...

I've tried from the node.
vz3:/etc/pve# pveca -a -h xx.xx.xx.xx
root@xx.xx.xx.xx's password:
node already exists (CID:2, IP:xx.xx.xx.xx)
unable to add node: command failed - ssh xx.xx.xx.xx/usr/bin/pveca -a 'IP:xx.xx
.xx.xx' 'NAME:vz3' 'HOSTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAQEAvMYpEgVBvHJw+txAJE
hMeQOcEkV8bF6v1Y07IPcTl/o3Ze85TmXB9m4ohlm8TfplLuSZTr4fbSShgBP1NeK6y2BhjFyMJ/6aX5
AZZUP17O7X70vjcYvBsSwIPNRWvI5Mf0+/AdG7XQ3eP/lG30p/H68z5/tE1b9SJ0sceuGcJyjBLkybna
ZxKVwxaWR5vXTjb7CowyFGiTecQPn6nxQNTkeMGLPTVMJUhCMEoZeVwWZDpTh6+DHCbKbLsEbQ8BsFAh
KpXlAXTwfNWC8VFEqLl3pbToD5hYudBl5+LmtFkBnFLYHzmFP09c/7Z9z6+0wx6CNk78MKt9VTHVMaQs
PckQ==' 'ROOTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAIEAzByUC9sS3YChtK/QYv+dUcrHiWoR8
H9FKFYT3CaFbYry+dRtB3KRkjWQAhuIlQxmf0/MZGOVs0LFOTklvxyYPYLpRDiyApkCqnZDvwM9po62I
YsWlROCxrTWdYdOLEYitdQqa5HEEyAp2Z8NuZKBpP8g1AikB64qAUBZUPvGKv0='
vz3:/etc/pve#


I've tried from the master.

I've pasted to you both results.
 
OK, but you still get this error:

Permission denied (publickey,password).

So something is wrong with the SSH keys.
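
One way to narrow this down is to check whether the node's root public key actually made it into the master's authorized_keys file. This is only a minimal sketch: `key_is_authorized` is a made-up helper, and the example paths are assumptions (/root/.ssh/id_rsa.pub is the usual OpenSSH location for root's key, but your setup may differ).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the key in public-key file $1 appears in
# authorized_keys file $2. Only the base64 key blob (second field) is
# compared, so a differing trailing comment does not matter.
key_is_authorized() {
    blob=$(awk 'NR == 1 { print $2 }' "$1") || return 2
    [ -n "$blob" ] || return 2
    grep -qF -- "$blob" "$2"
}

# Example (not executed here): copy the node's /root/.ssh/id_rsa.pub to the
# master as /tmp/node_id_rsa.pub, then on the master:
#   key_is_authorized /tmp/node_id_rsa.pub /root/.ssh/authorized_keys \
#       && echo "key present" || echo "key missing"
#
# The failing rsync uses non-interactive SSH, so from the node this should
# reproduce the same "Permission denied" without involving pveca:
#   ssh -o BatchMode=yes root@xx.xx.xx.xx true
```

If the key is present and the login still fails, the problem is more likely on the sshd side than in the key files themselves.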

- Dietmar
 
By the way, I still think the bug is related to the SSH key bug on Debian, so it will not occur if you do a fresh V1.0 install.

Is it possible to get a login on your server? I can fix it manually then.

- Dietmar
 
I did that, I don't understand why that's being spoken of.

In my message I stated that I ran it from the NODE as well as the MASTER.

I've done both.

Please, I'm begging you; tell me specifically what I have to do to fix this. I have no idea, I didn't write this, I don't know about the SSH, etc.
 
1.) Do a fresh install of both nodes.

2.) create the master

3.) send me a copy of the file /etc/pve/cluster.cfg

4.) on the node, join the cluster with 'pveca -a -h xxx.xxx.xxx.xxx'

5.) send me the output

6.) send me the file /etc/network/interfaces (master and node)

- Dietmar
 
Ok, after analyzing your files there are two issues:

1.) You modified the sshd configuration manually. You added:

Code:
AuthorizedKeysFile      %h/.ssh/key.pub

We use the authorized_keys files, so this is the reason why it does not work.

fix: comment out that line, then reload config with '/etc/init.d/ssh reload'

2.) Both machines run VMs with the same ID, so you can't cluster them! VM IDs need to be unique within a cluster.
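
Both fixes can be sketched in a few lines of POSIX shell. The helper names are made up, and the functions take plain files as arguments so nothing touches a live system until you point them at one. How you collect the VM ID lists (one ID per line) is left open; on an OpenVZ host the first column of `vzlist -a` output would be one possible source.

```shell
#!/bin/sh
# Hypothetical helper: comment out any active AuthorizedKeysFile line in the
# sshd config file $1, so sshd falls back to its default authorized_keys path.
comment_out_authorized_keys_file() {
    cfg="$1"
    tmp=$(mktemp) || return 1
    if sed 's/^\([[:space:]]*AuthorizedKeysFile[[:space:]]\)/#\1/' "$cfg" > "$tmp"; then
        cat "$tmp" > "$cfg"   # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}

# Hypothetical helper: print the VM IDs that appear in both list files
# ($1 and $2, one ID per line). Empty output means no conflicts.
duplicate_vm_ids() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (not executed here):
#   comment_out_authorized_keys_file /etc/ssh/sshd_config
#   /etc/init.d/ssh reload
#   duplicate_vm_ids master_ids.txt node_ids.txt
```

Any ID printed by `duplicate_vm_ids` would have to be changed on one of the machines before joining the cluster.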

- Dietmar
 
The "node already exists" and "unable to add node: command failed" errors you get from 'pveca -a -h' on vz3 happen when you try to talk to a machine that has changed its key (for example, after Proxmox was reinstalled). You need to delete the offending entries from /root/.ssh/authorized_keys and /root/.ssh/known_hosts, then try again.
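
A hedged sketch of that cleanup: for known_hosts, OpenSSH's own `ssh-keygen -R <host>` removes every entry for a host, including hashed ones. For authorized_keys there is no equivalent tool, so the made-up helper below drops entries by their trailing comment. It assumes the stale key still carries a comment such as root@vz3, which is only the OpenSSH default and may not hold for your keys.

```shell
#!/bin/sh
# Hypothetical helper: delete lines from key file $2 whose last field
# (the key comment) equals $1. A backup is kept as "$2.bak".
remove_key_by_comment() {
    comment="$1"
    file="$2"
    cp -p "$file" "$file.bak" || return 1
    awk -v c="$comment" '$NF != c' "$file.bak" > "$file"
}

# Example (not executed here), run as root on the machine that holds the
# stale entries:
#   ssh-keygen -R xx.xx.xx.xx                          # forget the old host key
#   remove_key_by_comment root@vz3 /root/.ssh/authorized_keys
```

The comparison is an exact match on the last field, so root@vz3 does not also remove a key commented root@vz33.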