[SOLVED] Error on Cluster Node Add

Coolguy3289 (New Member, Ohio) · Mar 10, 2017
When I run the command "pvecm add vm.psi.local" it asks me for the root password of the host cluster box. I type it in, but then receive this error:
Code:
root@vm2:~# pvecm add vm.psi.local
root@vm.psi.local's password:
unable to copy ssh ID: cat: write error: Permission denied
root@vm2:~#
Any Suggestions?
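For context, this particular write error often shows up when /etc/pve on the existing cluster node is mounted read-only because the cluster has lost quorum; `pvecm add` writes the joining node's SSH key under /etc/pve, and that fails on a read-only pmxcfs. A quick check, as a sketch (assumes a standard Proxmox VE node; run it on the existing cluster node, not the one being added):

```shell
# Sketch, assuming a standard Proxmox VE node: pmxcfs mounts /etc/pve
# read-only whenever the node has no quorum, which turns key-copy writes
# into "Permission denied" errors like the one above.
pvecm status                       # look for the "Quorate" line

# Probe whether /etc/pve is currently writable:
touch /etc/pve/.writetest \
  && rm /etc/pve/.writetest \
  && echo "/etc/pve is writable"   # fails with Permission denied when read-only
```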
 

Coolguy3289
I have forced quorum with 'pvecm expected 1'.
I have now started the add process on the node being added, but this is the problem I am running into, and the whole reason I reinstalled the first time.
Code:
root@vm2:~# pvecm add vm.psi.local
root@vm.psi.local's password:
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details.
waiting for quorum...
And it just sits there waiting for quorum. When I run systemctl status, this is what I get:
Code:
root@vm2:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: failed (Result: exit-code) since Sun 2017-03-12 13:54:00 EDT; 1min 14s ago
  Process: 2008 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)

Mar 12 13:52:59 vm2 corosync[2016]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Mar 12 13:52:59 vm2 corosync[2017]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 12 13:52:59 vm2 corosync[2017]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Mar 12 13:52:59 vm2 corosync[2017]: [TOTEM ] The network interface is down.
Mar 12 13:52:59 vm2 corosync[2017]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Mar 12 13:52:59 vm2 corosync[2017]: [QB    ] server name: cmap
Mar 12 13:52:59 vm2 corosync[2017]: [SERV  ] Service engine loaded: corosync configuration service [1]
Mar 12 13:52:59 vm2 corosync[2017]: [QB    ] server name: cfg
Mar 12 13:52:59 vm2 corosync[2017]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Mar 12 13:54:00 vm2 corosync[2008]: Starting Corosync Cluster Engine (corosync): [FAILED]
Mar 12 13:54:00 vm2 systemd[1]: corosync.service: control process exited, code=exited status=1
Mar 12 13:54:00 vm2 systemd[1]: Failed to start Corosync Cluster Engine.
Mar 12 13:54:00 vm2 systemd[1]: Unit corosync.service entered failed state.
When I run pvecm status in another shell:
Code:
root@vm2:/etc/pve# pvecm status
Cannot initialize CMAP service
Is there any place where I can see a more detailed log on why corosync failed?
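For reference, a few places usually give more detail than the truncated status excerpt (a sketch, assuming a systemd-based Proxmox VE install):

```shell
# Full journal for the unit, not just the last few lines from "status":
journalctl -u corosync.service --no-pager

# Run corosync in the foreground so startup errors go straight to the terminal:
corosync -f

# "The network interface is down" in the log above usually means the address
# corosync tries to bind to does not match an interface that is up; compare
# the cluster config against the host's actual addresses and name resolution:
cat /etc/corosync/corosync.conf
ip addr show
grep "$(hostname)" /etc/hosts
```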
 

Pablo Alcaraz (Member) · Jul 6, 2017
I tried to rejoin an instance after following the guide, but I had to do an extra step.

The solution was to add the following option to /etc/ssh/sshd_config:

MaxAuthTries 50

Then restart the SSH server and try again.

After the node was added, I restored the previous MaxAuthTries value and restarted the SSH server again.
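The edit-and-restore steps can be sketched like this (assumes a Debian-style /etc/ssh/sshd_config; a large ssh-agent key list can exhaust the default of 6 authentication attempts before password auth is ever reached, which is one common reason the join fails):

```shell
# Back up the config, then raise MaxAuthTries (default is 6):
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
sed -i 's/^#\?MaxAuthTries.*/MaxAuthTries 50/' /etc/ssh/sshd_config
grep -q '^MaxAuthTries' /etc/ssh/sshd_config \
  || echo 'MaxAuthTries 50' >> /etc/ssh/sshd_config
systemctl restart ssh

# ... run "pvecm add" again ...

# Once the node is joined, restore the original config and restart sshd:
mv /etc/ssh/sshd_config.bak /etc/ssh/sshd_config
systemctl restart ssh
```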

Basically it was not related to Proxmox, but it locked me out.
 
