Can't Cluster...

vzfanatic

Hi All,

I've read and followed what I believe are all the posts I can find on the topic of Proxmox VE clustering, and I am unable to get a master and slaves to sync.

I've upgraded to v1 with no problems.

When I try to add the slave to the master, from either the master or the slave, I get the following error:

'unable to add node: command failed - ssh xx.xx.xx.xx /usr/bin/pveca -a 'IP:xx.xx.xx.xx' 'NAME:vz1' 'HOSTRSAPUBKEY:...'

Originally, the hostnames were the same. I changed them, and now I get this.

I'm sorry for creating more content on this topic, but I really am at my wits' end.

Thanks all for your help and a great product!
 
We use the hostnames in the authentication process. So it will not work if you change them. But you can modify /etc/pve/cluster.cfg manually to reflect your changes.

- Dietmar
 
Please ignore the last post. Changing the hostname before creating the cluster should work (but you cannot change hostnames after you have created the cluster).
 
I deleted all the conf files, recreated the master and I got this error:

syncing master configuration from 'xx.xx.xx.xx' failed (rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~) : command 'rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~' failed with exit code 255:
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(453) [receiver=2.6.9]


Now on the master I see:

ERROR: 500 Can't connect to localhost:50000 (connect: Connection refused)
 
You need to remove the following files on both nodes:
Code:
/etc/pve/cluster.cfg
/root/.ssh/known_hosts

and then create the cluster again.
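The cleanup above could be sketched roughly as follows. This is only an illustration, not an official Proxmox procedure: the function name `reset_cluster_state`, the optional prefix argument, and the backup directory name are assumptions made for this example. It moves the two files aside instead of deleting them outright, so they can be restored if needed.

```shell
#!/bin/sh
# Hypothetical helper: move aside the two files named above so the cluster
# can be created again. The optional first argument is a path prefix that
# exists only so the function can be tried on a scratch directory; leave it
# empty on a real node.
reset_cluster_state() {
    prefix="${1:-}"
    stamp="$(date +%Y%m%d%H%M%S)"
    # Assumed backup location, kept outside /etc/pve on purpose.
    backup_dir="${prefix}/root/cluster-reset-backup-${stamp}"
    mkdir -p "$backup_dir"
    for f in /etc/pve/cluster.cfg /root/.ssh/known_hosts; do
        target="${prefix}${f}"
        if [ -e "$target" ]; then
            mv "$target" "$backup_dir/"
            echo "moved ${target}"
        else
            echo "absent ${target}"
        fi
    done
}
```

Run `reset_cluster_state` with no argument on each node, then create the cluster again with `pveca -c` on the master and `pveca -a -h xx.xx.xx.xx` on the node, as shown elsewhere in this thread.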
 
I deleted the files as instructed, and then:

from slave:
pveca -a -h xx.xx.xx.xx

result:
syncing master configuration from 'xx.xx.xx.xx' failed (rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~) : command 'rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq xx.xx.xx.xx:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~' failed with exit code 255:
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(453) [receiver=2.6.9]

from master
pveca -c
pveca -a -h xx.xx.xx.xx
result:
local node already part of cluster
 
I did.

You can see the first part of this message (from 'slave' = node)...

I've tried from the node.
vz3:/etc/pve# pveca -a -h xx.xx.xx.xx
root@xx.xx.xx.xx's password:
node already exists (CID:2, IP:xx.xx.xx.xx)
unable to add node: command failed - ssh xx.xx.xx.xx/usr/bin/pveca -a 'IP:xx.xx
.xx.xx' 'NAME:vz3' 'HOSTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAQEAvMYpEgVBvHJw+txAJE
hMeQOcEkV8bF6v1Y07IPcTl/o3Ze85TmXB9m4ohlm8TfplLuSZTr4fbSShgBP1NeK6y2BhjFyMJ/6aX5
AZZUP17O7X70vjcYvBsSwIPNRWvI5Mf0+/AdG7XQ3eP/lG30p/H68z5/tE1b9SJ0sceuGcJyjBLkybna
ZxKVwxaWR5vXTjb7CowyFGiTecQPn6nxQNTkeMGLPTVMJUhCMEoZeVwWZDpTh6+DHCbKbLsEbQ8BsFAh
KpXlAXTwfNWC8VFEqLl3pbToD5hYudBl5+LmtFkBnFLYHzmFP09c/7Z9z6+0wx6CNk78MKt9VTHVMaQs
PckQ==' 'ROOTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAIEAzByUC9sS3YChtK/QYv+dUcrHiWoR8
H9FKFYT3CaFbYry+dRtB3KRkjWQAhuIlQxmf0/MZGOVs0LFOTklvxyYPYLpRDiyApkCqnZDvwM9po62I
YsWlROCxrTWdYdOLEYitdQqa5HEEyAp2Z8NuZKBpP8g1AikB64qAUBZUPvGKv0='
vz3:/etc/pve#


I've tried from the master.

I've pasted to you both results.
 
OK, but you still get this error:

Permission denied (publickey,password).

So something is wrong with the ssh keys.
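One way to check the keys independently of pveca is to attempt the same kind of non-interactive root login that the failing rsync command uses (`-l root -o BatchMode=yes`). The sketch below is an assumption-laden illustration: the function name `check_key_login` and the `SSH_BIN` override are invented here, the latter only so the function can be exercised without a real server.

```shell
#!/bin/sh
# Hypothetical check: can root log in to the other node with keys only?
# BatchMode=yes makes ssh fail instead of prompting for a password, which
# matches how the rsync in the error output invokes ssh.
# SSH_BIN is an assumed override for trying the function with a stub.
check_key_login() {
    host="$1"
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 -l root "$host" true; then
        echo "key login to $host works"
    else
        echo "key login to $host failed"
        return 1
    fi
}
```

Calling `check_key_login xx.xx.xx.xx` from the node against the master (and the other way round) should show quickly whether the "Permission denied (publickey,password)" problem is on the ssh side rather than in the cluster tools.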

- Dietmar
 
By the way, I still think the bug is related to the ssh key bug on Debian, so it will not occur if you do a fresh V1.0 install.

Is it possible to get a login on your server? I can fix it manually then.

- Dietmar
 
I did that; I don't understand why it keeps coming up.

In my message I state that I ran it from the NODE as well as the MASTER.

I've done both.

Please, I'm begging you; tell me specifically what I have to do to fix this. I have no idea, I didn't write this, I don't know about the SSH, etc.
 
1.) Do a fresh install of both nodes.

2.) create the master

3.) send me a copy of the file /etc/pve/cluster.cfg

4.) on the node, join the cluster with "pveca -a -h xxx.xxx.xxx.xxx"

5.) send me the output

6.) send me the file /etc/network/interfaces (master and node)

- Dietmar
 
Ok, after analyzing your files there are two issues:

1.) You modified the sshd configuration manually. You added:

Code:
AuthorizedKeysFile      %h/.ssh/key.pub

We use the authorized_keys files, so this is the reason why it does not work.

fix: comment out that line, then reload config with '/etc/init.d/ssh reload'

2.) Both machines run VMs with the same ID, so you can't cluster them! VM IDs need to be unique within a cluster.

- Dietmar
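Both issues can be checked before attempting to join. The following is a minimal sketch, not part of Proxmox: the function names `find_authkeys_override` and `common_ids` are made up for this example, and how the per-node ID lists are produced (for instance from `vzlist` or `qm list` output) is left open as an assumption.

```shell
#!/bin/sh
# Hypothetical check 1: print any active (uncommented) AuthorizedKeysFile
# line in an sshd config file. Exit status is non-zero if none is found.
find_authkeys_override() {
    grep -iE '^[[:space:]]*AuthorizedKeysFile[[:space:]]' "$1"
}

# Hypothetical check 2: given two files with one VM ID per line (one file
# per node), print the IDs that appear on both nodes.
common_ids() {
    a_sorted="$(mktemp)"
    b_sorted="$(mktemp)"
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    # comm -12 keeps only the lines present in both sorted inputs.
    comm -12 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

For example, `find_authkeys_override /etc/ssh/sshd_config` should print nothing on a node that uses the default authorized_keys location, and `common_ids master-ids.txt node-ids.txt` should print nothing when the VM IDs are unique across both machines.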
 
I did.
vz3:/etc/pve# pveca -a -h xx.xx.xx.xx
root@xx.xx.xx.xx's password:
node already exists (CID:2, IP:xx.xx.xx.xx)
unable to add node: command failed - ssh xx.xx.xx.xx/usr/bin/pveca -a 'IP:xx.xx
.xx.xx' 'NAME:vz3' 'HOSTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAQEAvMYpEgVBvHJw+txAJE
hMeQOcEkV8bF6v1Y07IPcTl/o3Ze85TmXB9m4ohlm8TfplLuSZTr4fbSShgBP1NeK6y2BhjFyMJ/6aX5
AZZUP17O7X70vjcYvBsSwIPNRWvI5Mf0+/AdG7XQ3eP/lG30p/H68z5/tE1b9SJ0sceuGcJyjBLkybna
ZxKVwxaWR5vXTjb7CowyFGiTecQPn6nxQNTkeMGLPTVMJUhCMEoZeVwWZDpTh6+DHCbKbLsEbQ8BsFAh
KpXlAXTwfNWC8VFEqLl3pbToD5hYudBl5+LmtFkBnFLYHzmFP09c/7Z9z6+0wx6CNk78MKt9VTHVMaQs
PckQ==' 'ROOTRSAPUBKEY:AAAAB3NzaC1yc2EAAAABIwAAAIEAzByUC9sS3YChtK/QYv+dUcrHiWoR8
H9FKFYT3CaFbYry+dRtB3KRkjWQAhuIlQxmf0/MZGOVs0LFOTklvxyYPYLpRDiyApkCqnZDvwM9po62I
YsWlROCxrTWdYdOLEYitdQqa5HEEyAp2Z8NuZKBpP8g1AikB64qAUBZUPvGKv0='
vz3:/etc/pve#

This error happens when you try to talk to a machine that has changed its key (maybe reinstalled Proxmox). You need to delete the offending entries from /root/.ssh/authorized_keys and /root/.ssh/known_hosts. Then try again.
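A rough sketch of that cleanup is shown below. The function names are invented for this example. `ssh-keygen -R` is the standard OpenSSH way to remove a host from a known_hosts file; the authorized_keys part simply drops every line containing a given string (for instance a key comment such as `root@vz3`, which is an assumption about how the stale key is labelled) and keeps a `.bak` copy.

```shell
#!/bin/sh
# Hypothetical helper: remove all known_hosts entries for one host.
# ssh-keygen -R keeps the previous file as <file>.old.
forget_host_key() {
    host="$1"
    known_hosts="${2:-/root/.ssh/known_hosts}"
    ssh-keygen -R "$host" -f "$known_hosts"
}

# Hypothetical helper: drop lines containing a fixed string from an
# authorized_keys file, keeping a backup copy next to it.
drop_authorized_key() {
    keyfile="$1"
    needle="$2"
    scratch="$(mktemp)"
    # grep exits non-zero when no lines remain; that is not an error here.
    grep -vF -- "$needle" "$keyfile" > "$scratch" || true
    cp "$keyfile" "${keyfile}.bak"
    # Overwrite in place so the file's ownership and mode are preserved.
    cat "$scratch" > "$keyfile"
    rm -f "$scratch"
}
```

After something like `forget_host_key xx.xx.xx.xx` and `drop_authorized_key /root/.ssh/authorized_keys root@vz3` on the affected machines, the `pveca -a -h xx.xx.xx.xx` attempt can be repeated.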
 
