pvecm add <master> failing on install with ssh error

jstableau

Apr 10, 2015
Hi,

I'm attempting to install a 4-node cluster where the members mostly reside on different switches. Two of the servers (the first addition and the third) joined the cluster correctly. The second node fails to join with the following error:


root@pxmxv1001:/etc/pve# ssh pxmxv1002
Password:
Linux pxmxv1002 2.6.32-37-pve #1 SMP Wed Feb 11 10:00:27 CET 2015 x86_64


The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.


Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Apr 9 14:18:23 2015 from pxmxv1001.tableauprod.net
root@pxmxv1002:~# pvecm add pxmxv1001
unable to copy ssh ID
root@pxmxv1002:~# ssh-copy-id -vv pxmxv1001
OpenSSH_6.0p1 Debian-4+deb7u2, OpenSSL 1.0.1e 11 Feb 2013
Pseudo-terminal will not be allocated because stdin is not a terminal.
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: ssh_connect: needpriv 0
ssh: Could not resolve hostname umask 077; test -d ~/.ssh || mkdir ~/.ssh ; cat >> ~/.ssh/authorized_keys && (test -x /sbin/restorec: Name or service not known

It appears that ssh is misparsing the line in ssh-copy-id. This seems strange because all the nodes were installed with the same 3.4 version and I didn't get this error on the 03 or 04 nodes.
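A possible explanation for this particular message, offered as an assumption rather than a confirmed diagnosis: the ssh-copy-id script shipped with this OpenSSH 6.0 package appears to understand only the -i option and to hand its first remaining argument to ssh as the host. With -vv in that position, ssh would take the remote command string as the hostname, which matches the error above but would not explain the original "unable to copy ssh ID" from pvecm. The sketch below shows the same key-copy step done by hand. The helper name append_key_once and the key path /root/.ssh/id_rsa.pub are illustrative assumptions, not part of Proxmox or OpenSSH.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file,
# but only if that exact line is not already present.
# The body runs in a subshell so the umask change does not leak out.
append_key_once() (
    key_file=$1
    auth_file=$2
    umask 077
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from key_file
    if ! grep -qxFf "$key_file" "$auth_file"; then
        cat "$key_file" >> "$auth_file"
    fi
)

# Remote equivalent, run by hand on pxmxv1002 (key path is an assumption):
#   cat /root/.ssh/id_rsa.pub | ssh root@pxmxv1001 \
#       'umask 077; mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys'
# Verbose flags belong on ssh itself, not on ssh-copy-id:
#   ssh -vv root@pxmxv1001 true
```

If the plain ssh -vv call also fails, its debug output should show whether the problem is name resolution, authentication, or something in /root/.ssh/config.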

I'd read in another forum that this could be due to multicast issues (though I don't understand why I would get this error). Oddly, omping doesn't even respond for the 03 node, which appears to be a healthy member of the cluster:

root@pxmxv1001:/etc/pve# omping 239.192.89.254 pxmxv1001 pxmxv1002 pxmxv1003 pxmxv1004
239.192.89.254 : waiting for response msg
pxmxv1002 : waiting for response msg
pxmxv1003 : waiting for response msg
pxmxv1004 : waiting for response msg


^C
239.192.89.254 : response message never received
pxmxv1002 : response message never received
pxmxv1003 : response message never received
pxmxv1004 : response message never received



So multicast looks to be an issue, yet one node (03) apparently joined and another node (04) partially joined before going into an error state. I've validated that the switches have IGMP snooping turned on.
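One caveat about the omping run above, stated as my understanding of the tool: omping only gets replies from peers that are running omping themselves at the same time, so a run started on a single node shows "waiting for response msg" for every peer regardless of multicast health. The multicast group is also normally passed with -m rather than listed as a host, which is why 239.192.89.254 shows up as a peer in the output. The sketch below prints a command line to start on all four nodes within a few seconds of each other. The helper name omping_cmd and the count of 60 are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the omping command to start on every node.
# -m sets the multicast group, -c limits the number of probes sent.
omping_cmd() {
    mcast=$1
    shift
    printf 'omping -c 60 -m %s %s\n' "$mcast" "$*"
}

# Node names and multicast address are taken from the post.
NODES="pxmxv1001 pxmxv1002 pxmxv1003 pxmxv1004"

# Start the printed command on all four nodes at roughly the same time.
omping_cmd 239.192.89.254 $NODES
```

If unicast replies arrive but multicast replies do not, that points at the switches (for example IGMP snooping without a querier) rather than at the nodes.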


Can you offer any suggestions for the pvecm add <master> failure on pxmxv1002? Also, do you have any suggestions as to how to complete the cluster sync for pxmxv1004?

Thanks!




------------------------
(more output below)
root@pxmxv1001:/etc/pve# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: pxmx-c01
Cluster Id: 22949
Cluster Member: Yes
Cluster Generation: 16
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pxmxv1001
Node ID: 1
Multicast addresses: 239.192.89.254
Node addresses: 10.194.0.50
root@pxmxv1001:/etc/pve# pvecm nodes
Node Sts Inc Joined Name
1 M 12 2015-04-09 16:20:22 pxmxv1001
2 M 16 2015-04-09 16:20:22 pxmxv1003
3 X 0 pxmxv1004





Output of pveversion -v:


root@pxmxv1002:~# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-37-pve: 2.6.32-147
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
Is your hosts file properly configured on all your hosts?

Are all hosts in the same subnet?

Serge
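A rough way to answer the hosts-file question on each node is sketched below. It lists the cluster node names that have no entry in a given hosts file. The helper name missing_from_hosts is hypothetical, and the check only looks at the file itself, not at DNS.

```shell
#!/bin/sh
# Hypothetical helper: print every node name that has no entry in a hosts file.
missing_from_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field after the address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }'; then
            echo "$name"
        fi
    done
}

# Example, to be run on every node (prints nothing when all names are present):
#   missing_from_hosts /etc/hosts pxmxv1001 pxmxv1002 pxmxv1003 pxmxv1004
```

Any name printed on a node is one that node cannot resolve from its hosts file, which is worth fixing before retrying pvecm add.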
 
