Do I need to recover from "A,NV,NMW" QDevice status?

Giovanni

The "NV" flag in my cluster status does not look normal. According to https://pve.proxmox.com/pve-docs/pvecm.1.html, "NV" means:
V / NV: If the QDevice will cast a vote for the node. In a split-brain situation, where the corosync connection between the nodes is down, but they both can still communicate with the external corosync-qnetd daemon, only one node will get the vote.

Code:
# pvecm status
Cluster information
-------------------
Name:             HOMELAB
Config Version:   16
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Jun 23 16:45:09 2024
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.d74b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   9
Highest expected: 9
Total votes:      8
Quorum:           5
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.10.9 (local)
0x00000002          1   A,NV,NMW 192.168.10.7
0x00000003          1    A,V,NMW 192.168.10.8
0x00000005          1    A,V,NMW 192.168.10.12
0x00000000          4            Qdevice
 
Hello,

Can you ping the QDevice from the node with the NV flag? Could you please post your Corosync config? It is at /etc/pve/corosync.conf.

Please set the QDevice to cast only a single vote instead of four. Because it holds so many votes, the chances of the entire cluster losing quorum when the QDevice goes down are higher. At the moment the cluster only has 8 votes, 4 of them from the QDevice, so if the QDevice goes down the remaining 4 votes fall below the quorum of 5 and the cluster would be out of quorum.
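
The arithmetic can be checked with a small shell sketch. It assumes the usual majority rule for votequorum (quorum = floor(expected_votes / 2) + 1), which matches the status output above (9 expected votes, quorum 5). The function names are made up for illustration and are not part of pvecm or corosync.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Majority threshold: floor(expected_votes / 2) + 1
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# $1 = expected votes, $2 = votes currently present
quorum_state() {
    if [ "$2" -ge "$(quorum_threshold "$1")" ]; then
        echo "quorate"
    else
        echo "not quorate"
    fi
}

quorum_threshold 9   # prints 5
quorum_state 9 8     # now: 4 visible nodes + 4 QDevice votes -> quorate
quorum_state 9 4     # QDevice down, 4 node votes left -> not quorate
quorum_state 6 4     # assumed single QDevice vote (5 + 1 expected), QDevice down -> quorate
```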
 
I don't know how I ended up with the Qdevice having four votes. Where do I change that?

The QDevice is indeed not pingable from the `NV` node; it responds to pings from all other nodes. I will check my firewall (the QDevice is an Azure VM in the cloud).

corosync.conf
Code:
root@pvecm-unraid:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: MS1
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.10.11
  }
  node {
    name: kahuna
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.10.13
  }
  node {
    name: nuc
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.9
  }
  node {
    name: pvecm-dsm
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.8
  }
  node {
    name: pvecm-unraid
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.7
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: lms
      host: 10.189.97.45
      tls: on
    }
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: HOMELAB
  config_version: 18
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
Hello,

I suspect the QDevice has 4 votes because of the line algorithm: lms; the default value is ffsplit. From man 8 corosync-qdevice:

Code:
MODEL NET ALGORITHMS
       Algorithms are used to change behavior of how corosync-qnetd provides votes to a given node/partition. Currently there are two algorithms supported.

       ffsplit
              This one makes sense only for clusters with an even number of nodes. It provides exactly one vote to the partition with the highest number of active nodes. If there are two exactly similar partitions, it provides its vote to the partition with higher score. The score is computed
              as (number_of_connected_nodes + number_of_connected_nodes_with_passed_heuristics - number_of_connected_nodes_with_failed_heuristics) If the scores are equal, the vote is provided to partition with the most clients connected to the qnetd server. If this number is also equal, then
              the tie_breaker is used. It is able to transition its vote if the currently active partition becomes partitioned and a non-active partition still has at least 50% of the active nodes. Because of this, a vote is not provided if the qnetd connection is not active.

              To use this algorithm it's required to set the number of votes per node to 1 (default) and the qdevice number of votes has to be also 1. This is achieved by setting quorum.device.votes key in corosync.conf file to 1.

       lms    Last-man-standing. If the node is the only one left in the cluster that can see the qnetd server then we return a vote.

              If more than one node can see the qnetd server but some nodes can't see each other then the cluster is divided up into 'partitions' based on their ring_id and this algorithm returns a vote to the partition with highest heuristics score (computed the same way as for the ffsplit
              algorithm), or if there is more than 1 partition with equal scores, the largest active partition or, if there is more than 1 equal partition, the partition that contains the tie_breaker node (lowest, highest, etc). For LMS to work, the number of qdevice votes has to be set to
              default (so just delete quorum.device.votes key from corosync.conf).
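
As for where the vote count is changed: per the man page excerpt, it is the quorum.device.votes key in corosync.conf. A minimal sketch of the quorum section with a single QDevice vote might look like the fragment below. It assumes you switch to ffsplit, which the excerpt says is meant for clusters with an even number of nodes, so it may not fit a five-node cluster as-is. The host and tls values are copied from the posted config.

```
quorum {
  device {
    model: net
    # assumption: one vote, as ffsplit requires
    votes: 1
    net {
      algorithm: ffsplit
      host: 10.189.97.45
      tls: on
    }
  }
  provider: corosync_votequorum
}
```

On Proxmox VE the file to edit is /etc/pve/corosync.conf, and config_version in the totem section should be incremented with every change so the new configuration is picked up.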

If you want to test whether all nodes are reachable on all Corosync links you can use:


Code:
corosync-cfgtool -n

and verify that all links are both enabled and connected.
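
Since ping can be filtered by a cloud firewall even when the service itself is reachable, a TCP check against the qnetd port from the NV node may be more telling. The sketch below assumes qnetd listens on TCP port 5403 (the corosync-qnetd default) and that bash and coreutils timeout are available; check_tcp is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if a TCP connection to host:port can be
# opened within the timeout (in seconds), non-zero otherwise.
check_tcp() {
    local host=$1 port=$2 secs=${3:-3}
    # Inside the inner bash, $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# 10.189.97.45 is the QDevice host from the posted corosync.conf;
# 5403 is an assumption (the qnetd default port).
if check_tcp 10.189.97.45 5403; then
    echo "qnetd port reachable"
else
    echo "qnetd port NOT reachable"
fi
```

If the port is reachable but the flag stays at NV, running corosync-qdevice-tool -s on the affected node should show how that node sees its QDevice connection.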
 
