We have a 4-node cluster: three nodes run actual VMs (pm3, pm4, pm5), and the fourth (pmtmp) is there only for quorum (it was originally a 2+1 cluster).
After an update, pm4 and pm5 lost connectivity to pm3 and pmtmp, and the cluster split. The nodes couldn't ping each other; it turned out to be a pve-firewall problem, and after stopping pve-firewall connectivity works again (tested with ping, omping and passwordless SSH).
The cluster, however, is still in a split state with "Activity blocked". If I'm reading tcpdump correctly, pm4 and pm5 make no attempt to contact pm3, and vice versa.
How can I make the cluster work again? Is there any way to force one of the hosts to try connecting to the others again?
Code:
root@pm3:~# pvecm status
Quorum information
------------------
Date: Mon May 28 13:15:37 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3/519028
Quorate: No
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 1.1.2.239
0x00000004 1 1.1.2.252 (local)
root@pm3:~# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
3 1 pmtmp
4 1 pm3 (local)
root@pm3:~# pveversion
pve-manager/5.1-43/bdb08029 (running kernel: 4.13.13-5-pve)
root@pm4:~# LC_ALL=C LANG=C pvecm status
Quorum information
------------------
Date: Mon May 28 14:17:18 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/6048
Quorate: No
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 1.1.2.228 (local)
0x00000002 1 1.1.2.240
root@pm4:~# LC_ALL=C LANG=C pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 pm4 (local)
2 1 pm5
root@pmtmp:~# LC_ALL=C LANG=C pvecm status
Quorum information
------------------
Date: Mon May 28 13:16:54 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000003
Ring ID: 3/519152
Quorate: No
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 1.1.2.239 (local)
0x00000004 1 1.1.2.252
root@pmtmp:~# LC_ALL=C LANG=C pvecm nodes
Membership information
----------------------
Nodeid Votes Name
3 1 pmtmp (local)
4 1 pm3
root@pmtmp:~# LC_ALL=C LANG=C pveversion
pve-manager/5.1-41/0b958203 (running kernel: 4.13.13-4-pve)
root@pm5:~# LC_ALL=C LANG=C pvecm status
Quorum information
------------------
Date: Mon May 28 14:17:40 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 1/6048
Quorate: No
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 1.1.2.228
0x00000002 1 1.1.2.240 (local)
root@pm5:~# LC_ALL=C LANG=C pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 pm4
2 1 pm5 (local)
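
For what it's worth, the numbers in the outputs above look consistent with plain majority arithmetic: with 4 expected votes the threshold is 3, and each half of the split (pm3 + pmtmp, pm4 + pm5) holds only 2. A minimal sketch of that calculation, assuming votequorum uses a simple majority of the expected votes; the function names are my own and not part of any PVE or corosync tool:

```shell
#!/bin/sh
# Majority threshold: strictly more than half of the expected votes.
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Reports whether a partition holding $2 votes out of $1 expected is quorate.
partition_state() {
    expected=$1
    votes=$2
    if [ "$votes" -ge "$(quorum_needed "$expected")" ]; then
        echo "quorate"
    else
        echo "blocked"
    fi
}

quorum_needed 4        # threshold for 4 expected votes
partition_state 4 2    # state of each 2-node half of the split
```

So neither half can become quorate on its own; a partition would need a third vote to reach the threshold.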