Quorum disk: status offline. How to resolve?

mir

Hi all,

I am having a weird problem with my quorum disk.

clustat
Cluster Status for midgaard @ Sun Feb 24 11:49:05 2013
Member Status: Quorate


Member Name ID Status
------ ---- ---- ------
esx1 1 Online, Local
esx2 2 Online
/dev/block/8:17 0 Offline, Quorum Disk


/etc/init.d/cman restart
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping qdiskd... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown:[ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Starting qdiskd... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]

The above seems OK, but the syslog continuously shows this:
Feb 24 11:47:29 esx1 pmxcfs[2281]: [status] crit: cpg_send_message failed: 9
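
For completeness, cluster membership can be cross-checked with the usual cman/PVE tools (assuming a stock PVE/cman setup):

cman_tool status   # overall cluster state, expected votes, quorum
cman_tool nodes    # per-node membership as cman sees it
pvecm status       # Proxmox's own view of the cluster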

Pinging node1 -> node2
asmping -I vmbr30 224.0.2.1 esx2 -c 2
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.16.3.9 from 172.16.3.8
unicast from 172.16.3.9, seq=1 dist=0 time=0.188 ms
unicast from 172.16.3.9, seq=2 dist=0 time=0.197 ms
multicast from 172.16.3.9, seq=2 dist=0 time=0.237 ms


--- 172.16.3.9 statistics ---
2 packets transmitted, time 2001 ms
unicast:
2 packets received, 0% packet loss
rtt min/avg/max/std-dev = 0.188/0.192/0.197/0.014 ms
multicast:
1 packets received, 0% packet loss since first mc packet (seq 2) recvd
rtt min/avg/max/std-dev = 0.237/0.237/0.237/0.000 ms



Pinging node2 -> node1
asmping -I vmbr30 224.0.2.1 esx1 -c 2
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.16.3.8 from 172.16.3.9
unicast from 172.16.3.8, seq=1 dist=0 time=0.326 ms
unicast from 172.16.3.8, seq=2 dist=0 time=0.302 ms
multicast from 172.16.3.8, seq=2 dist=0 time=0.251 ms


--- 172.16.3.8 statistics ---
2 packets transmitted, time 2000 ms
unicast:
2 packets received, 0% packet loss
rtt min/avg/max/std-dev = 0.302/0.314/0.326/0.012 ms
multicast:
1 packets received, 0% packet loss since first mc packet (seq 2) recvd
rtt min/avg/max/std-dev = 0.251/0.251/0.251/0.000 ms
 
The iSCSI disk seems offline?

You also need to restart 'pve-cluster' after cman restart (to get rid of the syslog warning).
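
So, assuming the stock sysvinit scripts on PVE 2.x/3.x, the full sequence would be something like:

# restart the cluster manager first
/etc/init.d/cman restart
# then restart pmxcfs so it reconnects to corosync and the
# cpg_send_message warnings stop
/etc/init.d/pve-cluster restart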
 
Hi all,

I have solved the problem. The cause was a failing heuristic check.
<heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>

One of the nodes has an unused onboard eth0 because I added a better NIC as eth2 yesterday. Since all nodes had an eth0 before yesterday, the heuristic was checking eth0. Changing it to check eth1 solved the problem :)
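
For reference, the heuristic sits inside the <quorumd> section of cluster.conf. A minimal sketch of the corrected block (the label and the other quorumd attributes shown here are placeholders, not my actual values):

<quorumd interval="1" tko="10" votes="1" label="pve_qdisk">
    <heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>
</quorumd>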

QED. When messing with the network, double-check all network configuration and any dependent configuration a second time.
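
A quick sanity check across the nodes (hypothetical one-liner; adjust the node list to your cluster):

# list every interface and its state on each node
for n in esx1 esx2; do echo "== $n =="; ssh $n "ip -o link show"; done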
 
