Hi, all
I've read many posts here and elsewhere, as well as the relevant wiki sections, but I still have doubts about the exact procedure for removing nodes from a PVE cluster (2.x and 3.x).
The "Remove_a_cluster_node" section (http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Remove_a_cluster_node) says, as of today:
Code:
Move all virtual machines out of the node, just use the Central Web-based Management to migrate or delete all VM´s. Make sure you have no local backups you want to keep, or save them accordingly.
Log in to one remaining node via ssh. Issue a pvecm nodes command to identify the nodeID:
hp1# pvecm nodes
Node Sts Inc Joined Name
1 M 156 2011-09-05 10:39:09 hp1
2 M 156 2011-09-05 10:39:09 hp2
3 M 168 2011-09-05 11:24:12 hp4
4 M 160 2011-09-05 10:40:27 hp3
Issue the delete command (here deleting node hp2):
hp1# pvecm delnode hp2
If the operation succeeds no output is returned, just check the node list again with 'pvecm nodes' (or just 'pvecm n').
ATTENTION: you need to power off the removed node, and make sure that it will not power on again.
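The check the wiki suggests after `pvecm delnode` (re-run `pvecm nodes` and look at the list) could be scripted roughly as follows. This is only a sketch: it assumes the column layout of the sample output quoted above (node ID first, node name last), and the helper names `parse_pvecm_nodes` and `node_still_listed` are made up for illustration. Other PVE versions may print a different layout.

```python
# Sample text copied from the wiki excerpt above; on a real node you would
# capture the output of `pvecm nodes` instead.
SAMPLE = """\
Node Sts Inc Joined Name
1 M 156 2011-09-05 10:39:09 hp1
2 M 156 2011-09-05 10:39:09 hp2
3 M 168 2011-09-05 11:24:12 hp4
4 M 160 2011-09-05 10:40:27 hp3
"""


def parse_pvecm_nodes(text):
    """Map node name -> node ID, assuming ID is the first column and name the last."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that does not start with a numeric node ID.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        nodes[fields[-1]] = int(fields[0])
    return nodes


def node_still_listed(text, name):
    """True if `name` still appears in the node list (hypothetical helper)."""
    return name in parse_pvecm_nodes(text)


if __name__ == "__main__":
    print(parse_pvecm_nodes(SAMPLE))
    print(node_still_listed(SAMPLE, "hp2"))
```

Note that this only tells you what the remaining nodes report; it says nothing about what the removed node itself still believes about the cluster.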
That wiki procedure is simply not clear enough to me...
E.g., there could be two cases when removing a member node:
- the node is still up (repurpose the node? test a new version on production hardware?)
- the node is down (the node is broken?)
In the example above, the node seems to be up when the admin "deletes" it from the cluster (from a remaining node).
Then the procedure warns that "the removed node" should be powered off, and to "make sure that it will not power on again."
I read this as "power off the node AFTER you have removed it"...
1) Is the procedure identical whether the node is up or down?
2) Is it possible/better/mandatory to shut down the node to be removed BEFORE "deleting" it from the cluster?
3) What happens if (perhaps by mistake) you power up the removed node, or keep it powered on (leaving it as it was after removal)? Is it recoverable? Is it a disaster? What to do?
4) How do you properly manage node removal/replacement/rejoin on two-node clusters (e.g. quorum issues)?
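To make the quorum concern in question 4 concrete, here is a small arithmetic sketch. It assumes one vote per node and a plain majority rule (more than half of the expected votes); the function names are hypothetical and this is not how the cluster stack is actually implemented.

```python
def quorum(expected_votes):
    """Votes needed for a strict majority, assuming one vote per node."""
    return expected_votes // 2 + 1


def is_quorate(online_votes, expected_votes):
    """True if the online nodes hold a majority of the expected votes."""
    return online_votes >= quorum(expected_votes)


if __name__ == "__main__":
    # Four-node cluster with one node powered off: 3 of 4 votes remain.
    print(is_quorate(3, 4))
    # Two-node cluster with one node powered off: 1 of 2 votes remains.
    print(is_quorate(1, 2))
```

Under these assumptions a four-node cluster stays quorate with one node down, while a two-node cluster does not, which is why removing or replacing a node in a two-node setup needs extra care.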
Reading various posts, it seems that "re-adding nodes is not supported" and that "reinstalling the node from scratch" is the supported way (see this and this).
But the wiki says nothing (!) about this. There is only (just below the section about removing) the section "Re-installing_a_cluster_node" http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Re-installing_a_cluster_node which is not at all about reinstalling from scratch! And it seems complicated and risky, not just unsupported; it was written by a user, "Nigel Kukard", in mid-2012: http://pve.proxmox.com/mediawiki/index.php?title=Proxmox_VE_2.0_Cluster&diff=4412&oldid=4151
Many posts point out that when you "delete" a node, its "definition" stays in the cluster "db", which therefore also "remembers" "deleted" nodes, and that in some cases you need to use a -force option even to join a new node with the previous (deleted) node's IP...
Many posts ask for better and more complete cluster documentation and instructions, and I agree (and need those)...
Can we build a better "cluster info" section, so that it is crystal clear to everyone what can be done, what must not be done, and what the risks are in possible but dangerous situations?
Once I understand the topic well, I am willing to update the wiki, if needed.
A contribution from the Proxmox team here would be ideal but, based on the posts I've read, I somewhat feel they will not be so clear about these cluster topics...
Their responses often explain cluster troubles with: "it's a difficult matter", "it's not supported", "it's all about quorum"... All true, but to me it's more like "it's not well documented and explained".
cheers
Marco