I recently built 8 additional (licensed) nodes to add to our existing (licensed) cluster. When trying to add the new nodes, which were built using Proxmox 5, I received the following message:
Login succeeded.
Request addition of this node
Remote side is not able to use API for Cluster join! Pass the 'use_ssh' switch or update the remote side.
I then re-ran the join with the --use_ssh switch, which added the node to the cluster, but corosync does not start on the new node.
root@prox21:~# pvecm status
Cannot initialize CMAP service
root@prox21:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Tue 2018-07-24 10:23:22 PDT; 2min 40s ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 2871 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=killed, signal=TERM)
Main PID: 2871 (code=killed, signal=TERM)
CPU: 37ms
Jul 24 10:21:52 prox21 systemd[1]: Starting Corosync Cluster Engine...
Jul 24 10:21:52 prox21 corosync[2871]: [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide
Jul 24 10:21:52 prox21 corosync[2871]: notice [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to p
Jul 24 10:21:52 prox21 corosync[2871]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augea
Jul 24 10:21:52 prox21 corosync[2871]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas syste
Jul 24 10:23:22 prox21 systemd[1]: corosync.service: Start operation timed out. Terminating.
Jul 24 10:23:22 prox21 systemd[1]: Failed to start Corosync Cluster Engine.
Jul 24 10:23:22 prox21 systemd[1]: corosync.service: Unit entered failed state.
Jul 24 10:23:22 prox21 systemd[1]: corosync.service: Failed with result 'timeout'.
root@prox21:~# journalctl -xe
Jul 24 10:25:40 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:25:40 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:25:40 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:25:40 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
Jul 24 10:25:46 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:25:46 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:25:46 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:25:46 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
Jul 24 10:25:52 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:25:52 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:25:52 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:25:52 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
Jul 24 10:25:58 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:25:58 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:25:58 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:25:58 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
Jul 24 10:26:00 prox21 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: Unit pvesr.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit pvesr.service has begun starting up.
Jul 24 10:26:01 prox21 pvesr[3167]: error with cfs lock 'file-replication_cfg': no quorum!
Jul 24 10:26:01 prox21 systemd[1]: pvesr.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 24 10:26:01 prox21 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: Unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit pvesr.service has failed.
--
-- The result is failed.
Jul 24 10:26:01 prox21 systemd[1]: pvesr.service: Unit entered failed state.
Jul 24 10:26:01 prox21 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 24 10:26:01 prox21 cron[2492]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
Jul 24 10:26:04 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:26:04 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:26:04 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:26:04 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
Jul 24 10:26:10 prox21 pmxcfs[2889]: [quorum] crit: quorum_initialize failed: 2
Jul 24 10:26:10 prox21 pmxcfs[2889]: [confdb] crit: cmap_initialize failed: 2
Jul 24 10:26:10 prox21 pmxcfs[2889]: [dcdb] crit: cpg_initialize failed: 2
Jul 24 10:26:10 prox21 pmxcfs[2889]: [status] crit: cpg_initialize failed: 2
PVE Version of existing cluster:
pve-manager/4.4-13/7ea56165 (running kernel: 4.4.44-1-pve)
PVE Version of Node to add:
pve-manager/5.2-5/eb24855a (running kernel: 4.15.17-1-pve)
I have tried restarting corosync and pve-cluster. The two servers are on the same network. The new node does appear in the cluster, and SSH logs in correctly; it simply will not sync or communicate any further.
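
For reference, here is a minimal sketch of how the two version strings above could be compared in a script before attempting a join. The helper name pve_major is hypothetical (it is not part of any Proxmox tool), and the strings are simply the two pve-manager lines quoted above. Whether the 4.x/5.x mismatch is the actual cause of the corosync timeout is not established here; this only makes the difference explicit.

```shell
#!/bin/sh
# Hypothetical helper: print the major version from a pve-manager
# version string such as "pve-manager/4.4-13/7ea56165 (...)".
pve_major() {
    printf '%s\n' "$1" | sed -n 's|^pve-manager/\([0-9][0-9]*\)\..*|\1|p'
}

# Version strings copied from the post (existing cluster and new node).
cluster='pve-manager/4.4-13/7ea56165 (running kernel: 4.4.44-1-pve)'
new_node='pve-manager/5.2-5/eb24855a (running kernel: 4.15.17-1-pve)'

cluster_major=$(pve_major "$cluster")
node_major=$(pve_major "$new_node")

# Report a mismatch so it is visible before the join is attempted.
if [ "$cluster_major" != "$node_major" ]; then
    echo "major version mismatch: cluster $cluster_major, new node $node_major"
else
    echo "major versions match: $cluster_major"
fi
```

On a live system the two strings would come from the pve-manager line reported by each side rather than being hard-coded.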