[SOLVED] Node not connecting to cluster after cluster restart

plofkat

Active Member
Mar 20, 2013
Had to shut down the entire cluster for maintenance on the power grid. After starting the cluster back up (7 nodes), node 4 failed to join the cluster.
Syslog for the server is working.

I have moved all vms from the server to another host for now.

Short of removing the host and reinstalling, is there any way to fix this?

Code:
Jul 03 16:23:58 vwk-prox04 cron[2417]: (CRON) INFO (Running @reboot jobs)
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [MAIN  ] Please migrate config file to nodelist.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Please migrate config file to nodelist.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [TOTEM ] The network interface [10.10.1.104] is now up.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [TOTEM ] The network interface [10.10.1.104] is now up.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cmap
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cfg
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [QB    ] server name: cpg
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] Watchdog not enabled by configuration
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] resource load_15min missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: warning [WD    ] resource memory_used missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: info    [WD    ] no resources configured.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
Jul 03 16:23:58 vwk-prox04 corosync[2420]: notice  [QUORUM] Using quorum provider corosync_votequorum
Jul 03 16:23:58 vwk-prox04 corosync[2420]: crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 03 16:23:58 vwk-prox04 corosync[2420]: error   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 03 16:23:58 vwk-prox04 corosync[2420]: error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cmap
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cfg
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QB    ] server name: cpg
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] Watchdog not enabled by configuration
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] resource load_15min missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] resource memory_used missing a recovery key.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [WD    ] no resources configured.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QUORUM] Using quorum provider corosync_votequorum
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 03 16:23:58 vwk-prox04 corosync[2420]:  [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
Jul 03 16:23:58 vwk-prox04 systemd[1]: Failed to start Corosync Cluster Engine.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Unit entered failed state.
Jul 03 16:23:58 vwk-prox04 systemd[1]: corosync.service: Failed with result 'exit-code'.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE API Daemon...
Jul 03 16:23:58 vwk-prox04 pvestatd[2467]: starting server
Jul 03 16:23:58 vwk-prox04 pve-firewall[2472]: starting server
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started PVE Status Daemon.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started Proxmox VE firewall.
Jul 03 16:23:58 vwk-prox04 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Jul 03 16:23:58 vwk-prox04 kernel: ip_set: protocol 6
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: starting server
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: starting 3 worker(s)
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2494 started
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2495 started
Jul 03 16:23:58 vwk-prox04 pvedaemon[2491]: worker 2496 started
Jul 03 16:23:58 vwk-prox04 systemd[1]: Started PVE API Daemon.
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Jul 03 16:23:58 vwk-prox04 systemd[1]: Starting PVE API Proxy Server...
Jul 03 16:23:59 vwk-prox04 pve-ha-crm[2516]: starting server
 
Solved: somehow the server's corosync IP entry was missing from the /etc/hosts file.
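
For anyone hitting the same "nodelist or quorum.expected_votes must be configured!" error after a restart, here is a rough sketch of how one could check whether every cluster node name has an entry in /etc/hosts. It assumes the nodelist in corosync.conf refers to nodes by name rather than by IP. The function names and the sample host names below are only illustrative and are not taken from the actual configuration.

```python
def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-style text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Host names are case-insensitive; keep the first entry seen.
            table.setdefault(name.lower(), ip)
    return table


def missing_names(hosts_text, names):
    """Return the names that have no entry in the given hosts text."""
    table = parse_hosts(hosts_text)
    return [n for n in names if n.lower() not in table]


if __name__ == "__main__":
    # Hypothetical sample data: replace with the contents of /etc/hosts and
    # the node names used in your own corosync.conf nodelist.
    sample_hosts = (
        "127.0.0.1 localhost\n"
        "10.10.1.103 vwk-prox03   # another cluster node\n"
    )
    wanted = ["vwk-prox03", "vwk-prox04"]
    for name in missing_names(sample_hosts, wanted):
        print(f"no /etc/hosts entry for {name}")
```

To run it against a real node, read the file with `open("/etc/hosts").read()` and pass the names from your nodelist. Any name it reports is a candidate for the kind of missing entry described above.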
 
