Quorum disk: status offline. How to resolve?

mir

Hi all,

I am having a weird problem with my quorum disk.

clustat
Cluster Status for midgaard @ Sun Feb 24 11:49:05 2013
Member Status: Quorate


Member Name ID Status
------ ---- ---- ------
esx1 1 Online, Local
esx2 2 Online
/dev/block/8:17 0 Offline, Quorum Disk


/etc/init.d/cman restart
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping qdiskd... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown:[ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Starting qdiskd... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]

The above seems OK, but the syslog continuously shows this:
Feb 24 11:47:29 esx1 pmxcfs[2281]: [status] crit: cpg_send_message failed: 9
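
For completeness, cluster membership can be cross-checked with the usual cman/PVE tools (assuming a stock PVE/cman setup):

cman_tool status   # overall cluster state, expected votes, quorum
cman_tool nodes    # per-node membership as cman sees it
pvecm status       # Proxmox's own view of the cluster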

Pinging node1 -> node2
asmping -I vmbr30 224.0.2.1 esx2 -c 2
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.16.3.9 from 172.16.3.8
unicast from 172.16.3.9, seq=1 dist=0 time=0.188 ms
unicast from 172.16.3.9, seq=2 dist=0 time=0.197 ms
multicast from 172.16.3.9, seq=2 dist=0 time=0.237 ms


--- 172.16.3.9 statistics ---
2 packets transmitted, time 2001 ms
unicast:
2 packets received, 0% packet loss
rtt min/avg/max/std-dev = 0.188/0.192/0.197/0.014 ms
multicast:
1 packets received, 0% packet loss since first mc packet (seq 2) recvd
rtt min/avg/max/std-dev = 0.237/0.237/0.237/0.000 ms



Pinging node2 -> node1
asmping -I vmbr30 224.0.2.1 esx1 -c 2
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.16.3.8 from 172.16.3.9
unicast from 172.16.3.8, seq=1 dist=0 time=0.326 ms
unicast from 172.16.3.8, seq=2 dist=0 time=0.302 ms
multicast from 172.16.3.8, seq=2 dist=0 time=0.251 ms


--- 172.16.3.8 statistics ---
2 packets transmitted, time 2000 ms
unicast:
2 packets received, 0% packet loss
rtt min/avg/max/std-dev = 0.302/0.314/0.326/0.012 ms
multicast:
1 packets received, 0% packet loss since first mc packet (seq 2) recvd
rtt min/avg/max/std-dev = 0.251/0.251/0.251/0.000 ms
 
The iSCSI disk seems offline?

You also need to restart 'pve-cluster' after cman restart (to get rid of the syslog warning).
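
So, assuming the stock sysvinit scripts on PVE 2.x/3.x, the full sequence would be something like:

# restart the cluster manager first
/etc/init.d/cman restart
# then restart pmxcfs so it reconnects to corosync and the
# cpg_send_message warnings stop
/etc/init.d/pve-cluster restart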
 
Hi all,

I have solved the problem. The cause was a failing heuristic check.
<heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>

One of the nodes has an unused onboard eth0 because I added a better NIC as eth2 yesterday. Since all nodes had an eth0 before yesterday, the heuristic was checking eth0. Changing it to check eth1 solved the problem :)
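
For reference, the heuristic sits inside the <quorumd> section of cluster.conf. A minimal sketch of the corrected block (the label and the other quorumd attributes shown here are placeholders, not my actual values):

<quorumd interval="1" tko="10" votes="1" label="pve_qdisk">
    <heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>
</quorumd>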

QED. When messing with the network, double-check all network configuration and any dependent configuration a second time.
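
A quick sanity check across the nodes (hypothetical one-liner; adjust the node list to your cluster):

# list every interface and its state on each node
for n in esx1 esx2; do echo "== $n =="; ssh $n "ip -o link show"; done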
 
