I started this discussion here: http://forum.proxmox.com/threads/10629-New-Kernel-and-bug-fixes?p=60597#post60597 and moved it here since it's no longer on topic there.
Problem symptoms:
Two nodes dropped out of the cluster for no apparent reason.
bcvm1 was then the only node left in the cluster.
Investigation summary:
According to various logs, they dropped out at separate times.
Maybe they dropped after briefly losing their network connection, but looking in Nagios, there was no network interruption at that time (19:11 first retransmit, 19:20 first eviction, 19:33 second eviction).
Assumptions/conclusions:
When a single node is disconnected, it loses quorum (whether or not the other nodes do).
When a single node loses quorum, cman stops on that node.
Because cman stops, the node will never rejoin on its own.
Therefore:
If another node drops out, the remaining nodes can lose quorum simply because there are too few votes.
In my case, this meant quorum was lost on the still-connected node (1 node + qdisk = 2 votes out of 4 = inquorate).
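The vote arithmetic can be checked directly on a node. A minimal sketch, assuming the stock cman_tool shipped with Proxmox 2.x and the default of one vote per node plus one for the qdisk:
Code:
# Show vote totals and the quorum threshold
cman_tool status | grep -Ei 'votes|quorum'
# With 3 nodes + qdisk at 1 vote each:
#   Expected votes: 4
#   Total votes:    4
#   Quorum:         3
# After two nodes drop: 1 node + qdisk = 2 votes < 3  => inquorate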
SOLUTION (bad solution):
Rebooting bcvm3 restored quorum between the qdisk, bcvm1, and bcvm3.
Then bcvm2 could rejoin simply by restarting cman.
After that, everything looked fine on the command line, but the GUI still showed the VMs offline and the VM hosts red...
I logged in to the GUI on a different node: same problem.
So I restarted pve-cluster on the red node, clicked through all the nodes in the GUI, and everything was green again.
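For reference, the recovery boiled down to something like this (a sketch of the commands involved, not an exact transcript):
Code:
# On the node where cman had stopped (bcvm2), once quorum exists again:
/etc/init.d/cman start

# On any node whose GUI still shows red, restart the cluster filesystem:
/etc/init.d/pve-cluster restart

# Verify that all members show Online again:
clustat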
Prevention for next time:
Unknown so far... I changed the number of votes on the qdisk to 5, so hopefully the last node alive will keep quorum and I won't have to reboot anything, only restart cman.
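The change itself is just the votes attribute on the quorumd element. A minimal sketch of the relevant cluster.conf fragment (the label matches my logs; the heuristic interval/tko values are illustrative, and on Proxmox you edit /etc/pve/cluster.conf and bump config_version):
Code:
<quorumd votes="5" label="proxmox1_qdisk">
  <!-- 3 nodes at 1 vote each + qdisk at 5 = 8 total votes, quorum = 5,
       so one surviving node + qdisk = 6 votes stays quorate -->
  <heuristic program="ip addr | grep vmbr0 | grep -q UP" score="1" interval="2" tko="3"/>
</quorumd>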
I am also thinking about a watchdog script that restarts cman and pve-cluster; see the sketch below. (pve-cluster seems to need a restart for the web GUI to work again, though that is not related to quorum.)
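Something like this untested sketch is what I have in mind, run from cron every few minutes (the log tag is my own invention):
Code:
#!/bin/sh
# Watchdog sketch: if cman has died, try to bring it back, then restart
# pve-cluster so the web GUI recovers too. Untested -- adjust before use.
if ! cman_tool status >/dev/null 2>&1; then
    logger -t cluster-watchdog "cman down, attempting restart"
    /etc/init.d/cman start && /etc/init.d/pve-cluster restart
fi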
Here are large parts of some logs, captured after the problem had already occurred:
Code:
=================================
bcvm1
=================================
root@bcvm1:/etc/pve# clustat
Cluster Status for bcproxmox1 @ Thu Aug 30 10:19:07 2012
Member Status: Inquorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 bcvm2                                 1 Offline
 bcvm3                                 2 Offline
 bcvm1                                 3 Online, Local
 /dev/loop1                            0 Online, Quorum Disk
root@bcvm1:/etc/pve# cat /var/log/cluster/qdiskd.log
[...]
Aug 23 18:05:44 qdiskd Quorum Daemon Initializing
Aug 23 18:05:48 qdiskd Heuristic: 'ip addr | grep vmbr0 | grep -q UP' UP
Aug 23 18:05:59 qdiskd Node 1 is the master
Aug 23 18:06:11 qdiskd Initial score 2/3
Aug 23 18:06:11 qdiskd Initialization complete
Aug 23 18:06:11 qdiskd Score sufficient for master operation (2/3; required=2); upgrading
Aug 24 13:08:24 qdiskd qdiskd: write (system call) has hung for 15 seconds
Aug 24 13:08:24 qdiskd In 15 more seconds, we will be evicted
Aug 24 13:09:13 qdiskd qdisk cycle took more than 3 seconds to complete (63.980000)
Aug 24 13:09:13 qdiskd qdiskd on node 1 reports hung write()
Aug 24 13:09:13 qdiskd qdiskd on node 1 reports hung write()
Aug 24 13:09:13 qdiskd qdiskd on node 1 reports hung write()
Aug 24 13:40:13 qdiskd qdisk cycle took more than 3 seconds to complete (5.300000)
Aug 29 19:20:08 qdiskd Node 2 evicted
Aug 29 19:33:37 qdiskd Assuming master role
Aug 29 19:33:40 qdiskd Writing eviction notice for node 1
Aug 29 19:33:43 qdiskd Node 1 evicted
[...]
# gunzip -c /var/log/cluster/corosync.log.1.gz | less
[...]
(first retransmit appears 19:11:12)
Aug 29 19:11:12 corosync [TOTEM ] Retransmit List: 17a7ad 17a7ae 17a7af 17a7b0 17a7b1 17a7b2 17a7b3 17a7b4 17a7b5 17a7b6 17a7b7 17a7b8 17a7b9 17a7ba 17a7bb 17a7bc 17a7bd 17a7be 17a7bf 17a7c0
Aug 29 19:11:12 corosync [TOTEM ] Retransmit List: 17a7ad 17a7ae 17a7af 17a7b0 17a7b1 17a7b2 17a7b3 17a7b4 17a7b5 17a7b6 17a7b7 17a7b8 17a7b9 17a7ba 17a7bb 17a7bc 17a7bd 17a7be 17a7bf 17a7c0
[...]
Aug 29 19:20:26 corosync [TOTEM ] A processor failed, forming new configuration.
Aug 29 19:20:28 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:20:28 corosync [CLM ] New Configuration:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.20)
Aug 29 19:20:28 corosync [CLM ] Members Left:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.58)
Aug 29 19:20:28 corosync [CLM ] Members Joined:
Aug 29 19:20:28 corosync [QUORUM] Members[2]: 1 3
Aug 29 19:20:28 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:20:28 corosync [CLM ] New Configuration:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.20)
Aug 29 19:20:28 corosync [CLM ] Members Left:
Aug 29 19:20:28 corosync [CLM ] Members Joined:
Aug 29 19:20:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 29 19:20:28 corosync [QUORUM] Members[2]: 1 3
Aug 29 19:20:28 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.0.20) ; members(old:3 left:1)
Aug 29 19:20:28 corosync [MAIN ] Completed service synchronization, ready to provide service.
[...]
Aug 29 19:33:46 corosync [TOTEM ] A processor failed, forming new configuration.
Aug 29 19:34:42 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:34:42 corosync [CLM ] New Configuration:
Aug 29 19:34:42 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:34:42 corosync [CLM ] Members Left:
Aug 29 19:34:42 corosync [CLM ] r(0) ip(10.3.0.20)
Aug 29 19:34:42 corosync [CLM ] Members Joined:
Aug 29 19:34:42 corosync [CMAN ] quorum lost, blocking activity
Aug 29 19:34:42 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 29 19:34:42 corosync [QUORUM] Members[1]: 3
Aug 29 19:34:42 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:34:42 corosync [CLM ] New Configuration:
Aug 29 19:34:42 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:34:42 corosync [CLM ] Members Left:
Aug 29 19:34:42 corosync [CLM ] Members Joined:
Aug 29 19:34:42 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 29 19:34:42 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.0.19) ; members(old:2 left:1)
Aug 29 19:34:42 corosync [MAIN ] Completed service synchronization, ready to provide service.
[...]
=================================
bcvm2
=================================
root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused
root@bcvm2:~# /etc/init.d/cman status
Found stale pid file
root@bcvm2:~# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Starting qdiskd... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
root@bcvm2:~# less /var/log/cluster/qdiskd.log
root@bcvm2:~# clustat
Cluster Status for bcproxmox1 @ Thu Aug 30 10:27:40 2012
Member Status: Inquorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 bcvm2                                 1 Online, Local
 bcvm3                                 2 Offline
 bcvm1                                 3 Offline
 /dev/loop1                            0 Offline, Quorum Disk
root@bcvm2:~# cat /var/log/cluster/qdiskd.log
[...]
Aug 16 13:34:31 qdiskd Unable to match label 'proxmox1_qdisk' to any device
Aug 16 13:39:09 qdiskd Unable to match label 'proxmox1_qdisk' to any device
Aug 16 13:40:11 qdiskd Unable to match label '/mnt/pve/bcnas1san/proxmox1_qdisk' to any device
Aug 16 13:40:54 qdiskd Unable to match label '/mnt/pve/bcnas1san/proxmox1_qdisk' to any device
Aug 16 13:41:45 qdiskd Warning: /mnt/pve/bcnas1san/proxmox1_qdisk is not a block device
Aug 16 13:41:45 qdiskd qdisk_open: ioctl(BLKSSZGET)
Aug 16 13:41:45 qdiskd Specified partition /mnt/pve/bcnas1san/proxmox1_qdisk does not have a qdisk label
Aug 16 13:45:14 qdiskd Specified partition /dev/loop1 does not have a qdisk label
Aug 16 13:46:32 qdiskd Quorum Daemon Initializing
Aug 16 13:46:36 qdiskd Heuristic: 'ip addr | grep vmbr0 | grep -q UP' UP
Aug 16 13:46:59 qdiskd Initial score 2/3
Aug 16 13:46:59 qdiskd Initialization complete
Aug 16 13:46:59 qdiskd Score sufficient for master operation (2/3; required=2); upgrading
Aug 16 13:47:17 qdiskd Assuming master role
Aug 16 14:02:35 qdiskd Node 2 shutdown
Aug 20 11:04:19 qdiskd Node 2 shutdown
Aug 23 13:49:28 qdiskd Node 2 shutdown
Aug 23 15:07:50 qdiskd Node 2 shutdown
Aug 24 13:08:25 qdiskd qdiskd: write (system call) has hung for 15 seconds
Aug 24 13:08:25 qdiskd In 15 more seconds, we will be evicted
Aug 24 13:09:01 qdiskd qdisk cycle took more than 3 seconds to complete (51.240000)
Aug 24 13:09:01 qdiskd qdiskd on node 3 reports hung write()
Aug 24 13:09:01 qdiskd qdiskd on node 3 reports hung write()
Aug 24 13:09:01 qdiskd qdiskd on node 3 reports hung write()
Aug 24 13:09:09 qdiskd qdiskd on node 3 reports hung write()
Aug 24 13:40:13 qdiskd qdisk cycle took more than 3 seconds to complete (6.410000)
Aug 29 19:20:05 qdiskd Writing eviction notice for node 2
Aug 29 19:20:08 qdiskd Node 2 evicted
Aug 29 19:32:54 qdiskd cman_dispatch: Host is down
Aug 29 19:32:54 qdiskd Halting qdisk operations
Aug 30 10:17:58 qdiskd Quorum Daemon Initializing
Aug 30 10:18:02 qdiskd Heuristic: 'ip addr | grep vmbr0 | grep -q UP' UP
Aug 30 10:18:10 qdiskd Node 3 is the master
Aug 30 10:18:25 qdiskd Initial score 2/3
Aug 30 10:18:25 qdiskd Initialization complete
Aug 30 10:18:25 qdiskd Score sufficient for master operation (2/3; required=2); upgrading
[...]
# gunzip -c /var/log/cluster/corosync.log.1.gz | less
[...]
Aug 29 19:20:26 corosync [TOTEM ] A processor failed, forming new configuration.
Aug 29 19:20:28 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:20:28 corosync [CLM ] New Configuration:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.20)
Aug 29 19:20:28 corosync [CLM ] Members Left:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.58)
Aug 29 19:20:28 corosync [CLM ] Members Joined:
Aug 29 19:20:28 corosync [QUORUM] Members[2]: 1 3
Aug 29 19:20:28 corosync [CLM ] CLM CONFIGURATION CHANGE
Aug 29 19:20:28 corosync [CLM ] New Configuration:
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.19)
Aug 29 19:20:28 corosync [CLM ] r(0) ip(10.3.0.20)
Aug 29 19:20:28 corosync [CLM ] Members Left:
Aug 29 19:20:28 corosync [CLM ] Members Joined:
Aug 29 19:20:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 29 19:20:28 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.0.20) ; members(old:3 left:1)
Aug 29 19:20:28 corosync [MAIN ] Completed service synchronization, ready to provide service.
[...]
=================================
bcvm3
=================================
root@bcvm3:~# clustat
Could not connect to CMAN: Connection refused
root@bcvm3:~# cat /var/log/cluster/qdiskd.log
Aug 27 09:37:21 qdiskd qdisk cycle took more than 3 seconds to complete (8.450000)
Aug 29 19:19:34 qdiskd cman_dispatch: Host is down
Aug 29 19:19:34 qdiskd Halting qdisk operations
# gunzip -c /var/log/cluster/corosync.log.1.gz
Aug 29 19:19:32 corosync [TOTEM ] FAILED TO RECEIVE