Hello,
3 Node Test Cluster. Test Case: Cut off node10 (master) cluster network connection
ha-manager status on node08:
quorum OK
master node10 (old timestamp - dead?, Tue Jan 8 17:12:52 2019)
lrm node08 (active, Tue Jan 8 17:13:31 2019)
lrm node09 (idle, Tue Jan 8 17:13:34 2019)
lrm node10 (old timestamp - dead?, Tue Jan 8 17:12:59 2019)
service vm:100 (node08, started)
ha-manager status on node09:
quorum OK
master node10 (old timestamp - dead?, Tue Jan 8 17:12:52 2019)
lrm node08 (active, Tue Jan 8 17:13:41 2019)
lrm node09 (idle, Tue Jan 8 17:13:40 2019)
lrm node10 (old timestamp - dead?, Tue Jan 8 17:12:59 2019)
service vm:100 (node08, started)
ha-manager status on node10:
quorum No quorum on node 'node10'!
master node10 (old timestamp - dead?, Tue Jan 8 17:12:52 2019)
lrm node08 (old timestamp - dead?, Tue Jan 8 17:12:57 2019)
lrm node09 (old timestamp - dead?, Tue Jan 8 17:12:56 2019)
lrm node10 (old timestamp - dead?, Tue Jan 8 17:12:59 2019)
service vm:100 (node08, started)
=> expected behavior: Since node10 got cut off and lost quorum, it should fence itself
BUT: node08 rebooted itself. Uptime after a few moments:
---------------------
node08: 17:18:39 up 0 min, 0 users, load average: 1.77, 0.42, 0.14
node09: 17:18:39 up 24 min, 0 users, load average: 0.92, 0.91, 0.52
node10: 17:18:40 up 24 min, 0 users, load average: 0.26, 0.33, 0.34
Why did node08 reboot, and not node10, which was completely out of quorum?
# pvecm status
Quorum information
------------------
Date: Tue Jan 8 17:29:08 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/5620
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.10.0.8 (local)
0x00000002 1 10.10.0.9
0x00000003 1 10.10.0.10
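
As a sanity check on the vote numbers above, here is a minimal sketch of the majority rule as I understand it (quorum = floor(expected votes / 2) + 1). The function names are my own, purely for illustration, and are not part of corosync or pvecm. It only shows which side of the split should be quorate; it does not explain the reboot.

```python
# Minimal sketch of majority-based quorum, assuming one vote per node
# as shown in the "Membership information" table above.
# quorum_threshold and is_quorate are hypothetical helper names.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition holding partition_votes has quorum."""
    return partition_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # "Expected votes: 3" from pvecm status
    # Votes on each side after node10 loses its cluster network connection.
    partitions = {"node08+node09": 2, "node10": 1}
    print("threshold:", quorum_threshold(expected))
    for name, votes in partitions.items():
        print(name, "quorate" if is_quorate(votes, expected) else "NOT quorate")
```

By this rule node08 and node09 together keep quorum and node10 alone does not, which matches the "quorum" lines in the ha-manager status output.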