Our 16-node cluster has stopped working. The web interface shows question marks on every node, and we cannot start a VM or do anything else:
root@CHI01:~# qm start 288
cluster not ready - no quorum?
root@CHI01:~# service corosync status
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Thu 2020-04-23 19:03:12 EDT; 55min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 39001 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 39001 (code=exited, status=0/SUCCESS)
Apr 23 19:03:11 CHI01 corosync[39001]: [QB ] withdrawing server sockets
Apr 23 19:03:11 CHI01 corosync[39001]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Apr 23 19:03:11 CHI01 corosync[39001]: [QB ] withdrawing server sockets
Apr 23 19:03:11 CHI01 corosync[39001]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Apr 23 19:03:11 CHI01 corosync[39001]: [SERV ] Service engine unloaded: corosync profile loading service
Apr 23 19:03:11 CHI01 corosync[39001]: [SERV ] Service engine unloaded: corosync resource monitoring service
Apr 23 19:03:11 CHI01 corosync[39001]: [SERV ] Service engine unloaded: corosync watchdog service
Apr 23 19:03:12 CHI01 corosync[39001]: [MAIN ] Corosync Cluster Engine exiting normally
Apr 23 19:03:12 CHI01 systemd[1]: corosync.service: Failed with result 'timeout'.
Apr 23 19:03:12 CHI01 systemd[1]: Failed to start Corosync Cluster Engine.
Any ideas? Anyone available for hire to help us get this back online?