How to reinstall a node in a two-node cluster with a quorum device

auranext

Hi,

Based on the documentation, I understand that I need to remove the qdevice before adding a node to the cluster.
I don't know how to do that safely.
Alternatively, I could restore node files such as /var/lib/pve-cluster/config.db, but I don't know exactly which files I need to import or what the exact procedure is.

thank you

maxime
 
If I understand you correctly, you currently have a 2-node cluster + qdevice?

  • In that case, first remove the qdevice: pvecm qdevice remove
  • Then check pvecm status to confirm that at most 2 votes are expected
  • Move all guests from the node that is to be reinstalled
  • Remove the one node following the guide https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
  • Make sure that there are no leftovers (/etc/pve/nodes/<old node>, in /etc/pve/priv/authorized_keys, ...)
  • reinstall the node
  • add it to the cluster
  • install qdevice packages on the reinstalled node
  • do a pvecm qdevice setup again
That way everything should work fine. Should you encounter issues later, for example SSH key errors during a live migration, run pvecm updatecerts on all nodes.
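The steps above can be sketched as a dry-run script. This is only an illustration of the order of the commands, not a tested procedure: OLD_NODE, CLUSTER_IP and QDEVICE_IP are hypothetical placeholders, guest migration is left as a manual step, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch of the reinstall procedure; prints commands instead of running them.
# OLD_NODE, CLUSTER_IP and QDEVICE_IP are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
OLD_NODE="${OLD_NODE:-pve2}"
CLUSTER_IP="${CLUSTER_IP:-192.0.2.1}"
QDEVICE_IP="${QDEVICE_IP:-192.0.2.10}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps on the node that stays in the cluster.
remaining_node_steps() {
    run pvecm qdevice remove
    run pvecm status
    # Move all guests off OLD_NODE and power it off before the next step.
    run pvecm delnode "$OLD_NODE"
    # Look for leftovers of the old node.
    run ls /etc/pve/nodes
}

# Steps on the freshly reinstalled node.
reinstalled_node_steps() {
    run pvecm add "$CLUSTER_IP"
    run apt install corosync-qdevice
    run pvecm qdevice setup "$QDEVICE_IP"
}

remaining_node_steps
reinstalled_node_steps
```

Reading the printed commands first and then running them by hand is probably safer than setting DRY_RUN=0, since each step should be checked with pvecm status before continuing.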

And of course, having backups of your guests, just in case, is always a good idea :)
 
Thank you for this procedure.
What worries me is that if I remove the quorum device and a node, the remaining node will be isolated and may shut down because corosync does not have enough votes.
In my experience with a two-node cluster on PVE 5.3 without a qdevice, I had to set the corosync expected votes to "1" before invoking delnode.
Is that no longer necessary with PVE 6.2?
 
Once the node is alone, expected votes should be 1 since it is the only node in the cluster. If you use HA, it is a good idea to stop the HA services until everything is back to where you want it:
Code:
systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm
 
Hm, sorry, I'm back (one more time!).
I'm testing the removal of one node and I need a little clarification at this step:
  • first remove the qdevice: pvecm qdevice remove
  • Does the command need to be executed on all cluster nodes, or just on the node that is to be reinstalled?
 
No, it should be enough to run it on one node. Compare the output of pvecm status before and after. The qdevice will still be listed, but it will not have a vote, and the overall number of expected votes should also be lower.
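To make the before/after comparison less error-prone, the "Expected votes" line can be extracted from the status output. A minimal sketch, assuming the output has the "Expected votes: N" line shown later in this thread; the helper name expected_votes is made up for illustration.

```shell
#!/bin/sh
# Print the "Expected votes" value from pvecm status output read on stdin.
expected_votes() {
    awk -F': *' '/^Expected votes:/ { print $2; exit }'
}

# Demonstration on a sample taken from the status output in this thread.
printf 'Expected votes: 2\nHighest expected: 2\nTotal votes: 1\n' | expected_votes   # prints 2

# On a real node this would be used roughly like:
#   before=$(pvecm status | expected_votes)
#   pvecm qdevice remove
#   after=$(pvecm status | expected_votes)
#   echo "expected votes: $before -> $after"
```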
 
Hi,
@aaron I've tested the procedure and there is an inconsistency.
  • In that case, first remove the qdevice: pvecm qdevice remove
  • Then check the pvecm status confirming that only 2 votes are expected at max
  • Move all guests from the node that is to be reinstalled
  • Remove the one node following the guide https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
  • Make sure that there are no leftovers (/etc/pve/nodes/<old node>, in /etc/pve/priv/authorized_keys, ...)
  • reinstall the node
  • add it to the cluster
  • install qdevice packages on the reinstalled node
  • do a pvecm qdevice setup again
I ran into a problem at the step "Remove the one node following the guide https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node".
The procedure says to power off the node and then execute pvecm delnode <node> on the other node.
But quorum is not established, because the remaining node is alone while expected votes is still 2.
So can I run the delnode command while both nodes are powered on?
 
Oh true.
In that case, the best approach is probably to set the expected votes to 1 manually. In a larger cluster, this is usually not a problem. I will see how we can mention that in the docs.

Code:
pvecm expected 1
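Whether the override is needed can be read from the "Quorate" line of pvecm status. The following is a small sketch, not an official tool: quorate_state and delnode_hint are hypothetical helper names, and they only parse status text passed on stdin.

```shell
#!/bin/sh
# Print "yes" or "no" based on the Quorate line of pvecm status output on stdin.
quorate_state() {
    awk -F': *' '/^Quorate:/ { s = ($2 ~ /^Yes/) ? "yes" : "no"; print s; exit }'
}

# Print a hint on whether expected votes should be lowered before delnode.
delnode_hint() {
    if [ "$(quorate_state)" = "yes" ]; then
        echo "quorate: 'pvecm delnode' should be accepted"
    else
        echo "not quorate: run 'pvecm expected 1' before 'pvecm delnode'"
    fi
}

# Demonstration on a sample resembling the single remaining node in this thread.
printf 'Quorate: No\nExpected votes: 2\nTotal votes: 1\n' | delnode_hint

# On a real node: pvecm status | delnode_hint
```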
 
Hi,
I have checked the delnode procedure and here are the results:

root@FLEXCLIPVE03:~# pvecm delnode FLEXITXPVE03
Could not kill node (error = CS_ERR_NOT_EXIST)
Killing node 2
command 'corosync-cfgtool -k 2' failed: exit code 1


It seems that the FLEXITXPVE03 node has already been removed in some other way...
How can I be totally sure that the node has been removed?
And how can I validate the procedure?


DETAILED STEP BY STEP:

root@FLEXCLIPVE03:~# pvecm status
Cluster information
-------------------
Name: cluster3
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Nov 23 21:15:42 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.108
Quorate: No

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags: Qdevice

Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 NA,NV,NMW 192.168.103.1 (local)
0x00000000 0 Qdevice (votes 0)




root@FLEXCLIPVE03:~# pvecm expected 1
root@FLEXCLIPVE03:~# echo $?
0
root@FLEXCLIPVE03:~# pvecm status
Cluster information
-------------------
Name: cluster3
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Nov 23 21:21:31 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.108
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate Qdevice

Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 NA,NV,NMW 192.168.103.1 (local)
0x00000000 0 Qdevice (votes 0)

root@FLEXCLIPVE03:~# pvecm delnode FLEXITXPVE03
Could not kill node (error = CS_ERR_NOT_EXIST)
Killing node 2
command 'corosync-cfgtool -k 2' failed: exit code 1
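One possible way to check the result: the Proxmox admin guide notes that the CS_ERR_NOT_EXIST message can appear when corosync tries to kill a node that is already offline, and that it does not by itself mean the removal failed. Whether the node is really gone can then be checked in the nodelist of /etc/pve/corosync.conf (and with pvecm nodes). The sketch below uses a hypothetical helper, node_in_conf, and a made-up sample file with placeholder node names; it is not the configuration from this cluster.

```shell
#!/bin/sh
# Return 0 if a node named $1 is listed in the corosync config file $2.
# Assumes the node name contains no regex metacharacters.
node_in_conf() {
    grep -Eq "^[[:space:]]*name:[[:space:]]*${1}[[:space:]]*\$" "$2"
}

# Demonstration on a made-up sample nodelist (placeholder names pve1/pve2).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
  }
}
EOF

for n in pve1 pve2; do
    if node_in_conf "$n" "$sample"; then
        echo "$n: still listed"
    else
        echo "$n: not listed"
    fi
done
rm -f "$sample"

# On a real node: node_in_conf <old node> /etc/pve/corosync.conf
```

If the old node is no longer listed there, the remaining leftovers to look for are the ones mentioned earlier in the thread, such as /etc/pve/nodes/<old node>.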
 