Unfortunately, my cluster has failed again, even with pve2-test1. This time the cluster split into two subsets.
First subset, two nodes.
root@hera:~# pvecm status
Quorum information
------------------
Date: Wed Aug 7 00:49:29 2019
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 1/604052
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.150
0x00000004 1 192.168.1.60 (local)
Second subset, three nodes.
root@zeus:~# pvecm status
Quorum information
------------------
Date: Wed Aug 7 00:49:34 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 2/603936
Quorate: Yes
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 3
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.1.100 (local)
0x00000005 1 192.168.1.70
0x00000006 1 192.168.1.80
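The quorum figures in both outputs are consistent with the usual majority rule: with 5 expected votes, a partition needs at least 3 votes to stay quorate. A minimal sketch of that arithmetic follows; the function names are hypothetical and only illustrate the calculation, they are not part of corosync or pvecm.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if a partition holds enough votes to reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from the two pvecm status outputs above.
EXPECTED_VOTES = 5
for subset, votes in (("hera subset", 2), ("zeus subset", 3)):
    state = "quorate" if is_quorate(votes, EXPECTED_VOTES) else "activity blocked"
    print(f"{subset}: {votes}/{EXPECTED_VOTES} votes, "
          f"threshold {quorum_threshold(EXPECTED_VOTES)}, {state}")
```

This matches what the nodes report: the two-node subset is blocked, while the three-node subset keeps running.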
Looking at the logs, the split occurred at around 20:03 on August 6th. The knet messages logged on leto at that time report link 0 to host 1 going down ("link down").
August 6th 2019, 20:03:42.878 leto corosync[1939599]: [KNET ] link: host: 1 link: 0 is down
August 6th 2019, 20:03:42.879 leto corosync[1939599]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
I can't see any other network errors in the logs around that time.
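To double-check for other link flaps, the exported log lines could be scanned for knet link state changes. The sketch below is only an illustration: the regular expression assumes messages shaped like the "host: N link: M is down" line quoted above (and a matching "is up" form), and `link_events` is a hypothetical helper, not an existing tool.

```python
import re

# Assumed message shape, based on the log line quoted above:
#   [KNET ] link: host: 1 link: 0 is down
LINK_EVENT = re.compile(
    r"\[KNET\s*\]\s+\w+: host: (\d+) link: (\d+) is (up|down)"
)


def link_events(lines):
    """Return (host_id, link_id, state) for every link state change found."""
    events = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            host_id, link_id, state = match.groups()
            events.append((int(host_id), int(link_id), state))
    return events


sample = [
    "[KNET ] link: host: 1 link: 0 is down",
    "[KNET ] host: host: 1 (passive) best link: 0 (pri: 1)",
]
print(link_events(sample))
```

Running this over the logs from every node, rather than just leto, would show whether the other members of the two-node subset lost their links at the same moment.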