[SOLVED] pvecm add failed. How to recover?

Cboxdk

New Member
Aug 15, 2017
Hi,

I'm setting up a new cluster with Proxmox 5, but something went wrong when adding a host to the cluster and I don't know how to fix it. The server was added to the GUI but now shows up red in the interface.

Code:
root@host02:~#: pvecm add 10.100.100.11 -ring0_addr host02
The authenticity of host '10.100.100.11 (10.100.100.11)' can't be established.
ECDSA key fingerprint is SHA256:Onbi5vxxxxx
Are you sure you want to continue connecting (yes/no)? yes
root@10.100.100.11's password:
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
waiting for quorum...

When I run pvecm status on the first host, I don't see the newly added host.
Code:
root@prox01:~# pvecm status
Quorum information
------------------
Date:             Wed Aug 16 12:27:30 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.100.100.11 (local)
0x00000002          1 10.100.100.12
0x00000003          1 10.100.100.101
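
The output above already hints at a half-finished join: three nodes are members, yet the cluster expects four votes. As a rough sketch, assuming the default of one vote per node, votequorum needs a strict majority of the expected votes. The function name quorum_needed is made up for illustration and is not part of pvecm or corosync.

```shell
#!/bin/sh
# Strict majority of the expected votes (integer division).
# quorum_needed is a hypothetical helper, not a pvecm or corosync command.
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# "Expected votes" from the pvecm status output above
quorum_needed 4
```

With four expected votes this gives 3, which matches the "Quorum: 3" line, so the three working nodes stay quorate even though the fourth never joined.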
 
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
waiting for quorum...

You need to check why the corosync service failed.
 
Can you tell me how to do this, or where I can find documentation for it?

I think my hosts file was wrong when I added the server, but it has been updated now.
 
Code:
systemctl status corosync.service
journalctl -xe
Also check the syslog.

If you think the hosts file was the cause, try adding the node again.
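
Before retrying, it may help to confirm what the hosts file now says for the name passed to -ring0_addr. The following is only a minimal sketch: lookup_host is a hypothetical helper (not a Proxmox tool) that prints the first address a hosts file maps a name to, ignoring comments.

```shell
#!/bin/sh
# lookup_host NAME [HOSTS_FILE]
# Print the first address that HOSTS_FILE (default /etc/hosts) maps NAME to.
# lookup_host is a hypothetical helper for this sketch.
lookup_host() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }              # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == n) { print $1; exit }
            }
        }
    ' "$file"
}
```

On the node, `lookup_host osd02` should print the address you intend corosync to use, not a loopback address. To narrow the journal down to corosync only, `journalctl -u corosync.service` is usually easier to read than the full `journalctl -xe` output.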
 
I tried to add the node again, but it complains that corosync.conf already exists.

Code:
root@osd02:~# pvecm add 10.100.100.11 -ring0_addr osd02
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists

Code:
root@osd02:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2017-08-16 12:50:10 UTC; 2h 49min ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 1553 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1553 (code=exited, status=0/SUCCESS)

Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync configuration service
Aug 16 12:50:10 osd02 corosync[1553]: info    [QB    ] withdrawing server sockets
Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Aug 16 12:50:10 osd02 corosync[1553]: info    [QB    ] withdrawing server sockets
Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync profile loading service
Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync resource monitoring service
Aug 16 12:50:10 osd02 corosync[1553]: notice  [SERV  ] Service engine unloaded: corosync watchdog service
Aug 16 12:50:10 osd02 corosync[1553]:  [SERV  ] Unloading all Corosync service engines.
Aug 16 12:50:10 osd02 corosync[1553]: notice  [MAIN  ] Corosync Cluster Engine exiting normally

Code:
root@osd02:~# journalctl -xe
Aug 16 15:39:32 osd02 pveproxy[1584]: worker 23200 finished
Aug 16 15:39:32 osd02 pveproxy[1584]: worker 23199 finished
Aug 16 15:39:32 osd02 pveproxy[1584]: starting 2 worker(s)
Aug 16 15:39:32 osd02 pveproxy[1584]: worker 23209 started
Aug 16 15:39:32 osd02 pveproxy[23208]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:32 osd02 pveproxy[1584]: worker 23210 started
Aug 16 15:39:32 osd02 pveproxy[23209]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:32 osd02 pveproxy[23210]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:33 osd02 pmxcfs[1539]: [quorum] crit: quorum_initialize failed: 2
Aug 16 15:39:33 osd02 pmxcfs[1539]: [confdb] crit: cmap_initialize failed: 2
Aug 16 15:39:33 osd02 pmxcfs[1539]: [dcdb] crit: cpg_initialize failed: 2
Aug 16 15:39:33 osd02 pmxcfs[1539]: [status] crit: cpg_initialize failed: 2
Aug 16 15:39:37 osd02 pveproxy[23208]: worker exit
Aug 16 15:39:37 osd02 pveproxy[23209]: worker exit
Aug 16 15:39:37 osd02 pveproxy[23210]: worker exit
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23208 finished
Aug 16 15:39:37 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23211 started
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23209 finished
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23210 finished
Aug 16 15:39:37 osd02 pveproxy[1584]: starting 2 worker(s)
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23212 started
Aug 16 15:39:37 osd02 pveproxy[1584]: worker 23213 started
Aug 16 15:39:37 osd02 pveproxy[23211]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:37 osd02 pveproxy[23212]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:37 osd02 pveproxy[23213]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:39 osd02 pmxcfs[1539]: [quorum] crit: quorum_initialize failed: 2
Aug 16 15:39:39 osd02 pmxcfs[1539]: [confdb] crit: cmap_initialize failed: 2
Aug 16 15:39:39 osd02 pmxcfs[1539]: [dcdb] crit: cpg_initialize failed: 2
Aug 16 15:39:39 osd02 pmxcfs[1539]: [status] crit: cpg_initialize failed: 2
Aug 16 15:39:42 osd02 pveproxy[23211]: worker exit
Aug 16 15:39:42 osd02 pveproxy[23212]: worker exit
Aug 16 15:39:42 osd02 pveproxy[23213]: worker exit
Aug 16 15:39:42 osd02 pveproxy[1584]: worker 23211 finished
Aug 16 15:39:42 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:42 osd02 pveproxy[1584]: worker 23221 started
Aug 16 15:39:42 osd02 pveproxy[1584]: worker 23212 finished
Aug 16 15:39:42 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:42 osd02 pveproxy[1584]: worker 23213 finished
Aug 16 15:39:42 osd02 pveproxy[1584]: worker 23222 started
Aug 16 15:39:42 osd02 pveproxy[23221]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:42 osd02 pveproxy[23222]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:45 osd02 pmxcfs[1539]: [quorum] crit: quorum_initialize failed: 2
Aug 16 15:39:45 osd02 pmxcfs[1539]: [confdb] crit: cmap_initialize failed: 2
Aug 16 15:39:45 osd02 pmxcfs[1539]: [dcdb] crit: cpg_initialize failed: 2
Aug 16 15:39:45 osd02 pmxcfs[1539]: [status] crit: cpg_initialize failed: 2
Aug 16 15:39:47 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:47 osd02 pveproxy[1584]: worker 23223 started
Aug 16 15:39:47 osd02 pveproxy[23221]: worker exit
Aug 16 15:39:47 osd02 pveproxy[23222]: worker exit
Aug 16 15:39:47 osd02 pveproxy[23223]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:47 osd02 pveproxy[1584]: worker 23221 finished
Aug 16 15:39:47 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:47 osd02 pveproxy[1584]: worker 23224 started
Aug 16 15:39:47 osd02 pveproxy[1584]: worker 23222 finished
Aug 16 15:39:47 osd02 pveproxy[1584]: starting 1 worker(s)
Aug 16 15:39:47 osd02 pveproxy[1584]: worker 23225 started
Aug 16 15:39:47 osd02 pveproxy[23224]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
Aug 16 15:39:47 osd02 pveproxy[23225]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1626.
 
If you are sure that everything is corrected, then you could try to do:
Code:
pvecm add <host> -ring0_addr <IP> -force
So it doesn't throw errors if the node already exists.

The other option is to go through the files by hand. Check whether the node is already present in the corosync.conf on the cluster; if it is, a pvecm delnode <node> could work. The corosync.conf on the node that you want to add then has to be removed completely.
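
The manual route might look roughly like the sketch below. It follows, as far as I know, the procedure in the Proxmox VE documentation for separating a node without reinstalling; treat the exact file list as an assumption and check it against your version. The run wrapper, the DRY_RUN variable and the reset_cluster_config function are made up for this example. By default they only print the commands, because the real ones delete cluster configuration on the node.

```shell
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0 is set.
# run, DRY_RUN and reset_cluster_config are hypothetical names for this sketch.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Meant for the half-joined node only, AFTER 'pvecm delnode <node>'
# has been run on a healthy cluster member.
reset_cluster_config() {
    run systemctl stop pve-cluster corosync
    run pmxcfs -l                     # start the cluster filesystem in local mode
    run rm -f /etc/pve/corosync.conf
    run rm -f /etc/corosync/authkey /etc/corosync/corosync.conf
    run killall pmxcfs
    run systemctl start pve-cluster
}
```

Calling `reset_cluster_config` prints the six commands for review; `DRY_RUN=0 reset_cluster_config` would execute them. Afterwards the node should accept a fresh `pvecm add` without the "already exists" errors.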
 
I deleted the node from the cluster and reinstalled the failed node. All working now, thank you. :)