[SOLVED] Node not connecting to cluster after cluster restart

plofkat

I had to shut down the entire cluster (7 nodes) for maintenance on the power grid. After starting the cluster back up, node 4 failed to join.
Syslog on the server is working; the relevant output is below.

I have moved all VMs from the server to another host for now.

Short of removing the host and reinstalling, is there any way to fix this?

Code:
Jul 03 16:23:58 vwk-prox04 cron[2417]: (CRON) INFO (Running @reboot jobs)
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [MAIN  ] Please migrate config file to nodelist.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Please migrate config file to nodelist.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] The network interface [10.10.1.104] is now up.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] The network interface [10.10.1.104] is now up.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cmap
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cfg
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cpg
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] Watchdog not enabled by configuration
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] resource load_15min missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] resource memory_used missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [WD    ] no resources configured.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [QUORUM] Using quorum provider corosync_votequorum
Jul 03 16:23:58 vwk-prox04 corosync[2420]: crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: error   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 03 16:23:58 vwk-prox04 corosync[2420]: error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cmap
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cfg
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cpg
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] Watchdog not enabled by configuration
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] resource load_15min missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] resource memory_used missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] no resources configured.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QUORUM] Using quorum provider corosync_votequorum
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
Jul 03 16:23:58 vwk-prox04 systemd[1]: Failed to start Corosync Cluster Engine.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Unit entered failed state.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Failed with result 'exit-code'.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE API Daemon...
Jul 03 16:23:58 vwk-prox04 pvestatd[2467]: starting server
Jul 03 16:23:58 vwk-prox04 pve-firewall[2472]: starting server
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started PVE Status Daemon.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started Proxmox VE firewall.
Jul 03 16:23:58 vwk-prox04 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Jul 03 16:23:58 vwk-prox04 kernel: ip_set: protocol 6
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: starting server
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: starting 3 worker(s)
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2494 started
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2495 started
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2496 started
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started PVE API Daemon.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE API Proxy Server...
Jul 03 16:23:59 vwk-prox04 pve-ha-crm[2516]: starting server
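
Most of the output above is routine startup noise; the lines that matter are the corosync entries logged at crit or error priority. A minimal Python sketch for pulling those out of a saved syslog excerpt, assuming the line layout shown above (the function name `corosync_failures` is made up for illustration, and the sample lines are copied from this log):

```python
import re

# Matches corosync syslog lines that carry an explicit crit/error priority, e.g.
# "... corosync[2420]: error   [MAIN  ] Corosync Cluster Engine exiting ..."
# The duplicate lines without a priority field are deliberately not matched.
PATTERN = re.compile(
    r"corosync\[\d+\]:\s+(?P<prio>crit|error)\s+"
    r"\[(?P<subsys>[A-Z]+)\s*\]\s+(?P<msg>.*)"
)


def corosync_failures(lines):
    """Return (priority, subsystem, message) for each crit/error corosync line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append(
                (match.group("prio"), match.group("subsys"), match.group("msg").strip())
            )
    return hits


if __name__ == "__main__":
    sample = [
        "Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [QUORUM] Using quorum provider corosync_votequorum",
        "Jul 03 16:23:58 vwk-prox04 corosync[2420]: crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.",
        "Jul 03 16:23:58 vwk-prox04 corosync[2420]: error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.",
    ]
    for prio, subsys, msg in corosync_failures(sample):
        print(f"{prio:5} {subsys:6} {msg}")
```

Feeding it the full excerpt should leave only the votequorum failure, the 'nodelist or quorum.expected_votes must be configured!' message, and the exit with status 20.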
 
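Solved: somehow the server's corosync IP entry was missing from the /etc/hosts file. Without it, corosync presumably could not match this node against the nodelist, which would explain the 'nodelist or quorum.expected_votes must be configured!' error in the log.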
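
For anyone hitting the same error, one way to spot a missing entry is to compare /etc/hosts on the affected node against the names and addresses you expect. A rough Python sketch; the `vwk-prox04` / `10.10.1.104` pair is an assumption taken from the log (use whatever names your corosync.conf nodelist actually references), the `vwk-prox03` line is a hypothetical sample entry, and both function names are made up here:

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to the set of IPs listed for it."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), set()).add(ip)
    return table


def missing_host_entries(hosts_text, expected):
    """Return the (hostname, ip) pairs from `expected` that the hosts text lacks."""
    table = parse_hosts(hosts_text)
    return [
        (name, ip)
        for name, ip in expected.items()
        if ip not in table.get(name.lower(), set())
    ]


if __name__ == "__main__":
    # Hypothetical hosts file contents with the node 4 entry missing.
    sample_hosts = (
        "127.0.0.1 localhost\n"
        "10.10.1.103 vwk-prox03  # hypothetical neighbour entry\n"
    )
    # Assumed name/address pairs; adjust to match your own nodelist.
    expected = {"vwk-prox03": "10.10.1.103", "vwk-prox04": "10.10.1.104"}
    for name, ip in missing_host_entries(sample_hosts, expected):
        print(f"missing: {ip} {name}")
```

To check a real node, pass the contents of its /etc/hosts (for example `open("/etc/hosts").read()`) instead of the sample string.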