Quorum Disk down causes odd GUI behaviors

adamb

I have found in all my clusters (2.x and 3.x) that when the quorum disk goes down for an extended period of time, the Proxmox GUI ends up in a strange state. The little monitor icon next to each VM turns black and shows only the VMID, with no description next to it. If I get on any of the clusters and look at clustat, everything is OK: all the VMs are running and everything is fine. If I click on one of the VMs in the GUI, it does show a "Status" of "running", I can open a console, and everything is 100% other than the VMs lacking a description and the little monitor being black.

I also noticed that the GUI reports both nodes as "red", while clustat reports the cluster as quorate and everything seems fine. Once I bring the quorum disk back online, everything comes back to life in the GUI. This is happening across all 10-15 in-house clusters. I haven't had a chance to test some of the ones out in the field.
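For reference, the CLI checks I do while the GUI is in that state boil down to roughly this (a sketch; clustat and pvecm are the standard cman/Proxmox tools, and 100 is just an example VMID):

Code:
# cluster view from the CLI while the GUI is acting up
clustat            # cman/rgmanager view: members, quorum state, pvevm services
pvecm status       # Proxmox view of membership and votes

# any individual VM still reports as running (100 is an example VMID)
qm status 100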
 
Instead, use three nodes (quorum disk is clumsy, and make problems all the time).

That is in no way an excuse or a proper response for a paying customer.

Red Hat pushes quorum disks and they never take the stance of "quorum disk is clumsy". In fact, we planned on going full Red Hat before we went with Proxmox, and the Red Hat engineers had nothing negative to say about the use of quorum disks. You are the only one I have ever seen make this statement.

I would love to know how the quorum disk being offline for an extended period of time, and the GUI reporting VMs as not running, is in any way the quorum disk's fault. All CLI commands were working and reporting the cluster correctly. This tells me it has nothing to do with "quorum disk is clumsy and make problems all the time" and more to do with a bug in the Proxmox GUI.

Don't get me wrong, we have had our fair share of issues with quorum disks, but overall they are solid and have provided a good solution. We have been running clusters with quorum disks for almost 3 years now with no real indication that they are "clumsy".

Sometimes I wonder if you think about what you post, as we are a reseller and a good customer. If we gave responses like this to our customers, we would no longer be in business.
 
If your quorum disk is down you obviously do not have quorum.

Without quorum nothing works properly because no single node "knows" what the state of the cluster is.
So when quorum is lost, no nodes make any changes.

There is nothing wrong with the GUI or Proxmox, the problem is you lost quorum.

I do not think his response was rude; short and to the point, sure, but not rude.

If your complaint is that when the quorum disk goes down you lose quorum and things stop working, then it seems perfectly logical to eliminate the quorum disk as a SPOF (single point of failure).
With three nodes, one can go down and all is well since you still have quorum, thus no SPOF.
If one broken thing can take down my cluster, I certainly would not consider that a "solid... good solution".
 

The cluster was quorate as both nodes could still see each other and make a proper decision. The cluster was functioning just as it should, but the GUI was not. The only thing that stopped working was the GUI. No VMs stopped, nothing was fenced, everything was running 100% as expected. We didn't see a minute of downtime. I can reproduce it across 15+ of our in-house clusters on a regular basis.

Maybe I am just terrible at getting the issue across, as neither of you truly understood what I am trying to say. I do feel he provided a response before really understanding the issue, which is exactly why you don't respond like that to your clients.
 
If you have a cluster of 2 nodes and a quorum disk and the quorum disk goes down, you still have quorum (two nodes). If a down quorum disk meant loss of quorum regardless of running nodes, there would be no purpose for a quorum disk.

I have a setup of exactly two nodes and a quorum disk (using a QNAP for this) and I have never experienced a failing GUI the times the QNAP has been down for service. One thing I could imagine causing such behavior is if the quorum disk were on the same network as the nodes and rgmanager congested the network with failing attempts to establish a connection with the quorum disk. In my setup the quorum disk is placed on a different network than the one used for cluster communication.
 

Same here, the quorum disk is on a completely separate network. Cluster communication is done over a dedicated 10Gb backend. We have rebooted/taken the quorum disk machine down a number of times, but this is the first time it was down for an extended period and we actually needed to get on the GUI. At first we thought something was seriously wrong, that is until I ssh'd into the hosts and saw that everything was actually OK.

At least someone understands what I am saying.
 
That makes no sense; if you are using a quorum disk and it's offline, then it is impossible to have quorum. Well, I suppose you could set expected votes to 1, but that is not the same as having quorum.

When you lose quorum you are right: nothing will get fenced, no VMs will stop and things will keep running. Without quorum the cluster cannot make a decision on what to do, so it does nothing. This is the desired behavior.

You can reproduce this easily?
Try starting, creating or live migrating a VM using the Proxmox CLI tools while the quorum disk is down. You will see that without quorum the CLI fails too.
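For example, something along these lines (a sketch; 100 and othernode are placeholder VMID/node names):

Code:
qm start 100                       # start a stopped VM
qm migrate 100 othernode -online   # live migrate a running VM to another node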
 

The quorum disk can be taken offline at any time because there are still two nodes up. The only time I would run into an issue is if one of the nodes failed while the quorum disk was down. Then the cluster would not have quorum.

The cluster has quorum the entire time. I am looking right at "clustat", which has nothing to do with the Proxmox CLI or Proxmox itself. If it is reporting that the cluster has quorum, it indeed does.

https://access.redhat.com/documenta.../Cluster_Administration/s1-admin-display.html
 
Are you speaking from experience?

1) With two nodes and a quorum disk, expected votes is 3 (the nodes and the quorum disk have one vote each)
2) so quorum is established with two votes
3) two votes can be formed by either one node plus the quorum disk, or by two nodes

QED: quorum can be established by two nodes and a missing quorum disk.
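Or, to put numbers on it (a minimal sketch, assuming cman's usual majority rule of floor(expected_votes/2) + 1):

Code:
# quorum math for 2 nodes + 1 quorum disk, 1 vote each
expected_votes=3
quorum=$(( expected_votes / 2 + 1 ))   # = 2
echo "votes needed for quorum: $quorum"
# 2 nodes up, qdisk down -> 2 votes -> quorate
# 1 node  up, qdisk up   -> 2 votes -> quorate
# 1 node  up, qdisk down -> 1 vote  -> not quorate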
 
First, I really think qdisk introduces much complexity, which is error prone, and most users do not understand exactly how it works. That is why we recommend using at least 3 nodes.

The GUI behavior you describe indicates that there is a problem with 'pvestatd'. That daemon queries VM and disk status and sends that information to all cluster nodes. Usually it gets blocked when one of your storages is offline and the daemon queries the status of that storage. You can test that using

# pvesm status

Does the above command block when the problem occurs?
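If it does block, a quick way to see whether pvestatd itself is the one hanging (a sketch; the log path and ps options assume a stock Debian-based PVE install):

Code:
# does the status query return promptly, or hang on an offline storage?
time timeout 30 pvesm status

# is pvestatd sitting in uninterruptible sleep (state D)?
ps -o pid,stat,wchan,cmd -C pvestatd

# recent pvestatd log lines
grep pvestatd /var/log/daemon.log | tail -n 20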
 

Just to give you guys an idea, here is what the GUI looks like when the quorum disk is down.

[GUI screenshots omitted: the VMs show a black monitor icon with only the VMID, and both nodes are red]
Doesn't look like the command gets blocked.

Code:
root@lanprox1:~# pvesm status
  /dev/sdd1: read failed after 0 of 4096 at 12451840: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 12562432: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
storage 'backup' is not online
backup    nfs 0               0               0               0 100.00%
drbd0     lvm 1      1169879040               0       488304640 0.50%
drbd1     lvm 1      1169879040               0       561700864 0.50%
local     dir 1       387808108       261304000       126504108 67.88%

Code:
root@lanprox1:~# clustat
Cluster Status for lanprox @ Wed Dec  3 07:00:01 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 lanprox1                                                            1 Online, Local, rgmanager
 lanprox2                                                            2 Online, rgmanager
 /dev/block/8:49                                                     0 Offline, Quorum Disk

 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 pvevm:100                                                        lanprox1                                                         started
 pvevm:101                                                        lanprox2                                                         started
 pvevm:102                                                        lanprox2                                                         started
 pvevm:103                                                        lanprox1                                                         started
 pvevm:104                                                        lanprox2                                                         started
 pvevm:105                                                        lanprox2                                                         started
 pvevm:106                                                        lanprox1                                                         started
 pvevm:107                                                        lanprox1                                                         started

We do present an NFS share from the same box as the quorum disk for a few odd backup jobs. Doesn't look like the "pvesm status" command is getting blocked though.
 
Would it be best if I file a bug report for this issue? I definitely want a resolution, as I don't want any of my support people and technicians thinking that a cluster is down when it's not.
 
You have not mentioned anything about your cluster setup, so just to rule out fencing-related problems: do you have fencing configured? If yes, does it work as it should?
Please paste cluster.conf.
 

Yep, fencing works 100%. This happens on clusters with simple IPMI fencing and on clusters using dual fencing (IPMI and APC PDU).

This config is just from one of my clusters using IPMI.

Code:
root@lanprox1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="26" name="lanprox">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="lanprox_qdisk" master_wins="1" tko="10"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" cipher="3" ipaddr="10.80.12.178" lanplus="1" login="USERID" name="ipmi1" passwd="PASSW0RD" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" cipher="3" ipaddr="10.80.12.179" lanplus="1" login="USERID" name="ipmi2" passwd="PASSW0RD" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="lanprox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="lanprox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="105"/>
    <pvevm autostart="1" vmid="106"/>
    <pvevm autostart="1" vmid="107"/>
  </rm>
</cluster>

Here is a config using the dual fence. They are pretty much identical other than the 2nd fence device.

Code:
root@cloudprox1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="22" name="cloudprox">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="cluster_qdisk" master_wins="1" status_file="/var/log/cluster/status" tko="10"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ilo4" cipher="3" ipaddr="10.80.8.102" lanplus="1" login="fence" name="ipmi1" passwd="7fprjMLc" power_wait="5"/>
    <fencedevice agent="fence_ilo4" cipher="3" ipaddr="10.80.8.103" lanplus="1" login="fence" name="ipmi2" passwd="7fprjMLc" power_wait="5"/>
    <fencedevice agent="fence_apc" ipaddr="10.80.8.109" login="device" name="apc1" passwd_script="/usr/local/bin/ccsnu"/>
    <fencedevice agent="fence_apc" ipaddr="10.80.8.104" login="device" name="apc2" passwd_script="/usr/local/bin/ccsnu"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="cloudprox1" nodeid="1" votes="1">
      <fence>
        <method name="apc">
          <device action="off" name="apc1" port="1" secure="on"/>
          <device action="off" name="apc2" port="1" secure="on"/>
          <device action="on" name="apc1" port="1" secure="on"/>
          <device action="on" name="apc2" port="1" secure="on"/>
        </method>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cloudprox2" nodeid="2" votes="1">
      <fence>
        <method name="apc">
          <device action="off" name="apc1" port="2" secure="on"/>
          <device action="off" name="apc2" port="2" secure="on"/>
          <device action="on" name="apc1" port="2" secure="on"/>
          <device action="on" name="apc2" port="2" secure="on"/>
        </method>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="105"/>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="106"/>
    <pvevm autostart="1" vmid="107"/>
  </rm>
</cluster>
 
You could try adding 'status_file="/var/log/cluster/status"' to the troublesome cluster to have it log each time it updates quorum status. Maybe this will give a hint as to what goes wrong.

Also, you should study this carefully and maybe try changing the timing values:
Code:
master_wins="0"
If set to 1 (on), only the qdiskd master will advertise its votes to CMAN. In a network partition, only the qdisk master will provide votes to CMAN. Consequently, that node will automatically "win" in a fence race. This option requires careful tuning of the CMAN timeout, the qdiskd timeout, and CMAN's quorum_dev_poll value. As a rule of thumb, CMAN's quorum_dev_poll value should be equal to Totem's token timeout and qdiskd's timeout (interval*tko) should be less than half of Totem's token timeout. See section 3.3.1 for more information.
This option only takes effect if there are no heuristics configured, and it is valid only for a 2-node cluster. This option is automatically disabled if heuristics are defined or the cluster has more than 2 nodes configured.
In a two-node cluster with no heuristics and no defined vote count (see above), this mode is turned on by default. If enabled in this way at startup and a node is later added to the cluster configuration or the vote count is set to a value other than 1, this mode will be disabled.

Code:
3.3.1. Quorum Disk Timings

Qdiskd should not be used in environments requiring failure detection times of less than approximately 10 seconds.
Qdiskd will attempt to automatically configure timings based on the totem timeout and the TKO. If configuring manually, Totem's token timeout must be set to a value at least 1 interval greater than the following function:
interval * (tko + master_wait + upgrade_wait)
So, if you have an interval of 2, a tko of 7, a master_wait of 2 and an upgrade_wait of 2, the token timeout should be at least 24 seconds (24000 msec).
It is recommended to have at least 3 intervals to reduce the risk of quorum loss during heavy I/O load. As a rule of thumb, using a totem timeout more than 2x of qdiskd's timeout will result in good behavior.
An improper timing configuration will cause CMAN to give up on qdiskd, causing a temporary loss of quorum during master transition.
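Plugging the values from the cluster.conf posted earlier into those rules, purely as illustrative arithmetic (a sketch, not a diagnosis):

Code:
# values taken from the lanprox cluster.conf above
interval=3          # quorumd interval, seconds
tko=10              # quorumd tko
token_ms=54000      # totem token timeout, milliseconds

qdisk_timeout_ms=$(( interval * tko * 1000 ))   # 30000 ms
echo "qdiskd timeout: ${qdisk_timeout_ms} ms, totem token: ${token_ms} ms"
# master_wins rule of thumb: interval*tko should be < token/2   -> 30000 vs 27000
# general rule of thumb:     token should be > 2 * interval*tko -> 54000 vs 60000

By those rules of thumb the posted timings sit slightly outside the recommended margins, though since the cluster itself stays quorate this may well be unrelated to the GUI symptom.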
 

There is no troublesome cluster. This happens across all 15 of my in-house clusters and, so far, 2 of the clusters I have tested out in the field. Proxmox versions range from 2.3 to 3.3 and they all have the issue. I don't understand the point of messing with anything in my config. The cluster is running 100% per the logs and clustat, but the Proxmox GUI is showing the VMs as down.

I've said it a few times now, but the cluster is acting 100% as expected. If you take a look at my post above with the screenshots, you will see exactly what the GUI looks like when my quorum disk goes offline. You will also see "clustat" reporting the cluster as having no issues. There are no fence actions, which tells me everything is OK; the VMs stay running and all is well. Once I bring my quorum disk back online it's like a switch and the GUI starts working again. Even once the GUI starts working again, nothing changes: the VMs stay running and no fence actions take place. Power down the quorum device and it happens again. It also happens on clusters with and without NFS shares running on the device providing the quorum.

I do however appreciate your time and input!
 
Hi,

I don't think that your problem is related to quorum or cluster state.

The problem is that pvestatd is hanging somewhere.
pvestatd reads stats from the host, then the VMs, then the storages, sequentially.
There are some timeouts implemented for storages (ping check, NFS stats checks, ...) to try to detect an offline storage and bypass it.

Sometimes the timeout doesn't work and pvestatd hangs on this storage check.
That's why no more info is displayed in the GUI.


So the question is: do you use the box where the quorum disk lives for anything else (VM storage, backup storage, ...)?

also check

cat /var/log/daemon.log|grep pvestatd

maybe you'll have some logs
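If it is the NFS share on the quorum-disk box that blocks, a few things worth trying (a sketch: the storage name 'backup' and the /mnt/pve mount path are taken from the pvesm output above, and the disable option / init script names assume a stock PVE 3.x install):

Code:
# does the NFS mount point itself hang while that box is down?
timeout 5 ls /mnt/pve/backup || echo "mount point not responding"

# temporarily mark the storage disabled so pvestatd skips it
pvesm set backup --disable 1

# restart pvestatd if it is already stuck
service pvestatd restart

# re-enable once the box is back online
pvesm set backup --disable 0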
 
