Cluster: adding a node to replace the qdevice

150d

Mar 23, 2022
Hi,

I think I may have made a mistake:

I'm running a cluster with two nodes and a qdevice on an additional Raspberry Pi. I have now added a third node to it.

Joining the third node succeeded, but "pvecm status" gives this output:

Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.6.1.253
0x00000002          1    A,V,NMW 10.6.1.251 (local)
0x00000003          1         NR 10.6.1.250
0x00000000          1            Qdevice

The third node does not show the same flags as the first two. Reading up on the cluster docs, I found that you are supposed to remove the qdevice before adding a node that brings the cluster to an odd node count, which I hadn't done.
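To illustrate why the order matters, here is a minimal Python sketch of votequorum's basic majority rule (quorum = expected votes // 2 + 1). It assumes one vote per node and one vote for the qdevice, as in the status output above, and it ignores options such as two_node, wait_for_all and last_man_standing. The function names are made up for this example.

```python
def votequorum_quorum(expected_votes: int) -> int:
    """Basic votequorum rule: a strict majority of the expected votes."""
    # Simplification: ignores options such as two_node, wait_for_all and
    # last_man_standing, which change how quorum is calculated.
    return expected_votes // 2 + 1


def describe(nodes: int, qdevice_votes: int) -> str:
    """Expected votes and quorum, assuming each node holds exactly one vote."""
    expected = nodes + qdevice_votes
    quorum = votequorum_quorum(expected)
    return f"{nodes} nodes, qdevice votes={qdevice_votes}: expected={expected}, quorum={quorum}"


if __name__ == "__main__":
    # Original setup, third node added with qdevice still voting, qdevice removed.
    for nodes, qdevice_votes in [(2, 1), (3, 1), (3, 0)]:
        print(describe(nodes, qdevice_votes))
```

With three nodes plus a one-vote qdevice the total becomes even (4 votes, quorum 3), so a 2/2 split leaves neither half quorate. Without the qdevice, 3 expected votes give a quorum of 2.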

So, on one of the old nodes I ran the command "pvecm qdevice remove". After this, I get this cluster status:

Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW 10.6.1.253
0x00000002          1  NA,NV,NMW 10.6.1.251 (local)
0x00000003          1         NR 10.6.1.250
0x00000000          0            Qdevice (votes 0)

It appears the qdevice no longer has a vote, even though it has not gone away completely.

At this point I decided to stop before I dig myself an even deeper hole.


1.) What do the status flags with "pvecm status" (NA, NV, NMW, NR) actually mean? Is it a problem that the new node has different flags than the other two? Might it just need a little time to sync up with the cluster?

2.) Was it a serious mistake to not remove the qdevice before joining the third node? Is there something I need to do to recover?
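Regarding question 1, here is a minimal Python sketch that decodes the Qdevice column of the membership table. As far as I can tell, "pvecm status" shows corosync-quorumtool's output, where the flags come in pairs: A/NA (qdevice alive or not), V/NV (qdevice casting its vote or not), MW/NMW (master_wins set or not), plus NR (qdevice not registered for that node). These meanings are my reading of corosync-quorumtool and should be checked against its documentation. The parser and its names are hypothetical, and it only handles the column layout shown above.

```python
# Qdevice column flags (meanings assumed from corosync-quorumtool's output).
FLAG_MEANINGS = {
    "A": "QDevice alive",
    "NA": "QDevice not alive",
    "V": "QDevice casting its vote",
    "NV": "QDevice not casting its vote",
    "MW": "master_wins set",
    "NMW": "master_wins not set",
    "NR": "QDevice not registered",
}

# First "pvecm status" membership table from this thread.
SAMPLE = """\
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.6.1.253
0x00000002          1    A,V,NMW 10.6.1.251 (local)
0x00000003          1         NR 10.6.1.250
0x00000000          1            Qdevice
"""


def decode_membership(text: str) -> dict[str, list[str]]:
    """Map each node address to the decoded meanings of its Qdevice flags."""
    has_flags_column = False
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if parts[:1] == ["Nodeid"]:
            # The flags column only appears while a qdevice is configured.
            has_flags_column = "Qdevice" in parts
            continue
        if not has_flags_column or len(parts) < 4 or not parts[0].startswith("0x"):
            continue
        if parts[0] == "0x00000000":
            continue  # summary row for the qdevice itself, no per-node flags
        # Row layout: nodeid, votes, flags, address, optional "(local)".
        flags = parts[2].split(",")
        result[parts[3]] = [FLAG_MEANINGS.get(flag, flag) for flag in flags]
    return result


if __name__ == "__main__":
    for address, meanings in decode_membership(SAMPLE).items():
        print(f"{address}: {', '.join(meanings)}")
```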


Any help would be much appreciated!


Regards
Update:

I decided to simply delete the third node, reinstall it and join it to the cluster again. But after doing that (with the qdevice removed this time), the new node still shows up as NR ("not registered"):

Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW 10.6.1.253
0x00000002          1  NA,NV,NMW 10.6.1.251 (local)
0x00000003          1         NR 10.6.1.250
0x00000000          0            Qdevice (votes 0)

Now I'm at a loss what to do next. :-(


Update #2:

I have repeated the process several times: deleting the node, reinstalling Proxmox, removing the host directory, powering off at exactly the right moment. No success; the third node keeps showing as NR.
Sure:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.6.1.253
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.6.1.251
  }
  node {
    name: node3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.6.1.250
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ClusterName
  config_version: 13
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

It's the same over all three nodes.
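To double-check a config like the one above, here is a minimal Python sketch that pulls the node entries out of corosync.conf, sums their quorum_votes (treating a missing value as 1, which is an assumption here) and reports whether a quorum "device { }" section, i.e. a qdevice, is still configured. The regex-based parsing is a simplification rather than a real corosync config parser, and the function names are hypothetical.

```python
import re

# Trimmed copy of the nodelist and quorum sections posted above.
SAMPLE_CONF = """
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.6.1.253
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.6.1.251
  }
  node {
    name: node3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.6.1.250
  }
}

quorum {
  provider: corosync_votequorum
}
"""

# A "node { ... }" block; node blocks contain no nested braces.
NODE_BLOCK = re.compile(r"\bnode\s*\{(.*?)\}", re.DOTALL)
# A "key: value" line inside a block.
KEY_VALUE = re.compile(r"^[ \t]*(\w+):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def parse_nodes(conf_text: str) -> dict[str, int]:
    """Return {node name: quorum_votes} for every node block."""
    nodes = {}
    for body in NODE_BLOCK.findall(conf_text):
        fields = dict(KEY_VALUE.findall(body))
        # Assumption: a node without quorum_votes counts as 1 vote.
        nodes[fields["name"]] = int(fields.get("quorum_votes", "1"))
    return nodes


def has_quorum_device(conf_text: str) -> bool:
    """True if a quorum 'device { ... }' section (qdevice) is still present."""
    return re.search(r"^[ \t]*device\s*\{", conf_text, re.MULTILINE) is not None


if __name__ == "__main__":
    nodes = parse_nodes(SAMPLE_CONF)
    print("nodes:", nodes)
    print("expected votes:", sum(nodes.values()))
    print("qdevice configured:", has_quorum_device(SAMPLE_CONF))
```

For the posted config this yields three single-vote nodes (3 expected votes) and no device section, which is consistent with the qdevice having been removed from the configuration.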
Try "systemctl restart corosync" on nodes 1 and 2.

I can't believe it: half a day's work, and in the end that was all it took. After the restart it now says:

Code:
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.6.1.253
0x00000002          1 10.6.1.251 (local)
0x00000003          1 10.6.1.250

The "Qdevice" flag is gone; it now just says "Quorate". The Qdevice flags column in the membership list is gone as well (which I believe is what it should look like).

Thank you!!