Can't add second node to cluster - "unable to copy ssh ID"

starkruzr

This keeps coming up every time I run pvecm add <first node> on the second node. It's not clear why. /var/log/cluster doesn't have anything about the problem, on either machine, and I can't find any other failure information. After searching the forum I found a suggestion that switching to unicast communication might solve it, but that didn't help. Here's my pveversion info (same on both machines):

Code:
root@carina /v/l/cluster# pveversion -v
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.2-30 (running version: 3.2-30/1d095287)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-14
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-22
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
Per another thread I can't find right now, I tried just doing ssh-copy-id on the command line with verbose mode enabled.

Code:
root@cirrus:~# pvecm add carina
unable to copy ssh ID
root@cirrus:~# ssh-copy-id carina
cat: write error: Permission denied
root@cirrus:~# ssh-copy-id -v carina
OpenSSH_6.0p1 Debian-4+deb7u2, OpenSSL 1.0.1e 11 Feb 2013
Pseudo-terminal will not be allocated because stdin is not a terminal.
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
ssh: Could not resolve hostname umask 077; test -d ~/.ssh || mkdir ~/.ssh ; cat >> ~/.ssh/authorized_keys && (test -x /sbin/restorec: Name or service not known

It almost looks like the remote command is being parsed as the hostname? From what I've read, this appears to happen when you are running ssh on a nonstandard port, but both hosts are running ssh on port 22. No VMs are running ssh on port 22.

I also feel compelled to point out again that I've already made sure the two hosts can communicate without passwords on port 22 with ssh. Is there some way to get pvecm to skip that step?
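
For anyone who wants to confirm the same thing on their own nodes, here is a minimal sketch of a non-interactive check. The function name `can_ssh_without_password` and the `SSH_BIN` override are made up for illustration and are not part of Proxmox. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options; with BatchMode, ssh fails instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): exit 0 only if root can log in
# to the given host with a key, without any password prompt.
# SSH_BIN is an illustrative override so a different ssh binary can be used.
can_ssh_without_password() {
    host=$1
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true
}

# Example usage, run on cirrus (hostname taken from this thread):
#   can_ssh_without_password carina && echo "key login works"
```

A non-zero exit status only says that key-based login did not succeed; it does not say why, so `ssh -v` is still the place to look for details.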
 
Can you help me understand how this works? How could I not have quorum on a cluster consisting of one node?

A cluster with one node always has quorum.

Code:
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
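
As a rough illustration of how to read those numbers: the node is quorate when "Total votes" is at least "Quorum". The sketch below assumes the status text has exactly the `Name: value` layout shown above; the function name `has_quorum` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read status text in the "Name: value" layout shown
# above on stdin and exit 0 if Total votes >= Quorum, otherwise exit 1.
has_quorum() {
    awk -F': *' '
        $1 == "Total votes" { total = $2 + 0 }
        $1 == "Quorum"      { quorum = $2 + 0; seen = 1 }
        END { exit !(seen && total >= quorum) }
    '
}

# Only query the cluster if pvecm is actually installed on this machine.
if command -v pvecm >/dev/null 2>&1; then
    if pvecm status | has_quorum; then
        echo "quorate"
    else
        echo "not quorate"
    fi
fi
```

With the output above (Total votes 1, Quorum 1) the check passes. If a second node is expected but not yet joined, Quorum would be 2 while Total votes stays at 1, and the check fails.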
 
So, there was a quorum problem, but that was easily rectified. After fixing it, the second node added correctly. There is an extra note that I think needs spelling out, though -- when I installed these machines, I set "fish" as the default shell for root on both hosts. This is apparently Very Bad™. It seems there are various scripts (including pvecm and the ssh-copy-id script) that depend on bash not only being available but being the default shell. I changed this last night -- uninstalled fish and set bash as the default shell on both hosts -- but when I tried again, ssh-copy-id still failed. I think that was because in the interim I'd run "pvecm addnode cirrus" on carina, making it expect a second node that hadn't been added yet and causing quorum to be lost.

Lessons learned (for those who Google this problem in the future):

1) Do not change the default shell for root!
2) Do not attempt to force the cluster master you created to acknowledge the second node by doing pvecm addnode. It will not help you. :)
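
For lesson 1, a quick way to see what root's login shell currently is. This is only a sketch: the helper names `shell_from_passwd_line` and `is_bash` are made up for illustration, while `getent`, `cut`, and `chsh` are the standard tools.

```shell
#!/bin/sh
# Print the login shell (7th colon-separated field) of one passwd(5) line.
shell_from_passwd_line() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Exit 0 if the given shell path ends in /bash, otherwise exit 1.
is_bash() {
    case "$1" in
        */bash) return 0 ;;
        *)      return 1 ;;
    esac
}

# Look up root's passwd entry; fall back to /etc/passwd if getent is missing.
root_line=$(getent passwd root 2>/dev/null || grep '^root:' /etc/passwd)
root_shell=$(shell_from_passwd_line "$root_line")

if is_bash "$root_shell"; then
    echo "root login shell is bash: $root_shell"
else
    echo "root login shell is '$root_shell'; to switch back: chsh -s /bin/bash root"
fi
```

The script only reports; it deliberately does not run `chsh` itself. Run it on every node before trying pvecm add again.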
 
I had the same issue recently when I reinstalled a slave node.
It was then not possible to ssh-copy-id the key from the new node to the cluster master.
When executing on the new node:
~# pvecm add pve-master
root@pve-master's password:
unable to copy ssh ID: cat: write error: Permission denied

~# ssh-copy-id root@pve-master
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@pve-master's password:
cat: write error: Permission denied
which is the same error that pvecm add runs into internally.

I got it working with the following steps on the cluster master:
Code:
service corosync stop
vim /etc/corosync/corosync.conf # Delete the entry for the old node, which pvecm nodes does not display either...
service corosync start

Then add it from the new node as normal.
You have to stop corosync before editing the file, because otherwise corosync immediately reverts your changes.

Regards
 
