Cluster Losing Quorum

mhammett

I have a cluster of five servers. Everything is fine for a while (I'm still trying to find a way to track how long), and then all of the servers lose quorum. I reboot the first server and everything is fine again.

The quorum state also looks inconsistent. The nodes don't agree on the membership, and some even think they're isolated. Rebooting that one server brings everything back.


I first thought it was NTP-related (and maybe it was at first), but I've since fixed the NTP server connectivity issue.
 
The server I rebooted.

Code:
root@DeKalb2:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jan 15 16:42:10 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000004
Ring ID:          1.426b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2 (local)
0x00000005  1 172.24.0.3
root@DeKalb2:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:38:50 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000004
Ring ID:          1.f37b
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      2
Quorum:           3 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000004  1 172.24.0.2 (local)
root@DeKalb2:~# reboot
login as: root
root@172.24.0.2's password:
Linux DeKalb2 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Jan 15 16:42:00 2021 from 172.20.0.8
root@DeKalb2:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:46:55 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000004
Ring ID:          1.f5bf
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2 (local)
0x00000005  1 172.24.0.3
root@DeKalb2:~# journalctl --since -1h -u systemd-timesyncd
-- Logs begin at Mon 2021-01-18 07:44:06 CST, end at Mon 2021-01-18 07:48:36 CST. --
Jan 18 07:44:08 DeKalb2 systemd[1]: Starting Network Time Synchronization...
Jan 18 07:44:08 DeKalb2 systemd[1]: Started Network Time Synchronization.
Jan 18 07:45:38 DeKalb2 systemd-timesyncd[1655]: Synchronized to time server for the first time 208.89.144.110:123 (ntp3.dna-communications.com).
root@DeKalb2:~# timedatectl timesync-status
       Server: 208.89.144.110 (ntp3.dna-communications.com)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: A3EDDA13
    Precision: 1us (-24)
Root distance: 41.541ms (max: 5s)
       Offset: +2.871ms
        Delay: 2.496ms
       Jitter: 10.955ms
 Packet count: 8
    Frequency: +4.395ppm
root@DeKalb2:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Code:
root@DeKalb3:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jan 15 17:04:38 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000005
Ring ID:          1.428b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2
0x00000005  1 172.24.0.3 (local)
root@DeKalb3:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:39:23 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000005
Ring ID:          1.e29f
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      2
Quorum:           3 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000005  1 172.24.0.3 (local)
root@DeKalb3:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:47:03 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000005
Ring ID:          1.f5bf
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2
0x00000005  1 172.24.0.3 (local)
root@DeKalb3:~# timedatectl timesync-status
       Server: 208.89.144.110 (ntp3.dna-communications.com)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: A3EDDA13
    Precision: 1us (-24)
Root distance: 36.063ms (max: 5s)
       Offset: +616us
        Delay: 1.157ms
       Jitter: 720us
 Packet count: 118
    Frequency: -0.316ppm
root@DeKalb3:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
A third server.


Code:
root@Sycamore1:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jan 15 17:04:39 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000001
Ring ID:          1.428b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101 (local)
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2
0x00000005  1 172.24.0.3
root@Sycamore1:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:39:03 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.f533
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      1
Quorum:           3 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101 (local)
root@Sycamore1:~# pvecm status
Cluster information
-------------------
Name:             DNA-DK-SYC
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 18 07:47:08 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000001
Ring ID:          1.f5bf
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001  1 172.24.0.101 (local)
0x00000002  1 172.24.0.102
0x00000003  1 172.24.0.103
0x00000004  1 172.24.0.2
0x00000005  1 172.24.0.3
root@Sycamore1:~# timedatectl timesync-status
       Server: 208.89.144.110 (ntp3.dna-communications.com)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: A3EDDA13
    Precision: 1us (-24)
Root distance: 24.032ms (max: 5s)
       Offset: -2.406ms
        Delay: 9.564ms
       Jitter: 983us
 Packet count: 119
root@Sycamore1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Hi, the current status probably won't tell you much. Try searching the logs instead.

You can start with the corosync logs, which will show whether any cluster communication link was lost. If so, there is probably a networking issue to investigate.
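
As a rough sketch of that log search (not a definitive recipe): the helper below keeps only the log lines that mention links, tokens, retransmits, membership changes, or quorum. The function name `filter_corosync_events` and the keyword list are assumptions made for illustration. Exact corosync message wording varies between versions, so widen or narrow the pattern to match what your logs actually contain.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints only those that
# mention one of the keywords below (case-insensitive). The keyword list is
# an assumption, not an official list of corosync messages.
filter_corosync_events() {
    grep -Ei 'link|token|retransmit|membership|quorum'
}

# Example use on one node, covering the window between the last good
# "pvecm status" and the first bad one posted above:
#   journalctl -u corosync --since "2021-01-15 16:42" --until "2021-01-18 07:40" \
#       | filter_corosync_events
```

Running the same window on every node and comparing timestamps should show which node dropped out first. `corosync-cfgtool -s` on each node prints the current link status, which is worth capturing while the cluster is in the broken state, before rebooting anything.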