Removal of failed node

breakaway9000

A few months ago one of my nodes (3 node cluster) failed. I replaced it with a new machine at around the same time, then added another node. (So now we have 1 dead node, 4 live nodes).

I went through this documentation to remove the node: https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

Here is the output of pvecm status before removing the failed node using pvecm delnode:

Code:
# pvecm status
Quorum information
------------------
Date:             Sun Mar  4 10:25:30 2018
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000002
Ring ID:          2/2816
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 172.17.1.51 (local)
0x00000003          1 172.17.1.52
0x00000004          1 172.17.1.53
0x00000005          1 172.17.1.54

I typed

Code:
# pvecm delnode host1

Where "host1" was the name of the node, as shown in the Web GUI. The command completed successfully and gave the output

Code:
Killing node 1

Here is the output of pvecm status after removing the node using pvecm delnode:

Code:
# pvecm status
Quorum information
------------------
Date:             Sun Mar  4 10:47:19 2018
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000002
Ring ID:          2/2816
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 172.17.1.51 (local)
0x00000003          1 172.17.1.52
0x00000004          1 172.17.1.53
0x00000005          1 172.17.1.54

As you can see, the node is now gone from the cluster (the number of expected votes has dropped from 5 to 4), but it is still showing up in the web interface. How do I get rid of it completely?
 
Ok... I have resolved this. Turns out you also have to go to

Code:
/etc/pve/nodes

and remove the folder named after your node to fully remove it from the web GUI.
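For anyone who would rather keep the old node's files around than delete them outright, here is a minimal sketch of that step. The function name archive_dead_node, the PVE_NODES_DIR variable and the archive location are hypothetical names made up for illustration; only /etc/pve/nodes comes from the thread. It copies the folder somewhere outside /etc/pve first and removes the original only if the copy succeeded.

```shell
#!/bin/sh
# Sketch only: archive a removed node's folder instead of deleting it.
# PVE_NODES_DIR and archive_dead_node are hypothetical names, not Proxmox tools.
PVE_NODES_DIR="${PVE_NODES_DIR:-/etc/pve/nodes}"

archive_dead_node() {
    node="$1"   # name of the dead node, as shown in the web GUI
    dest="$2"   # directory outside /etc/pve that receives the copy

    if [ -z "$node" ] || [ -z "$dest" ]; then
        echo "usage: archive_dead_node <node> <archive-dir>" >&2
        return 2
    fi
    if [ ! -d "$PVE_NODES_DIR/$node" ]; then
        echo "no such node folder: $PVE_NODES_DIR/$node" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # Copy first; remove the original only when the copy succeeded.
    cp -r "$PVE_NODES_DIR/$node" "$dest/" && rm -r "$PVE_NODES_DIR/$node"
}
```

Run as root on any remaining cluster member after pvecm delnode has finished, for example: archive_dead_node host1 /root/removed-nodes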

Anyone else think this should really be in the documentation? Or did I do something wrong here and mess this up?
 
I think the command should also remove the related folder to clean things up in webgui.
 
Did you reload the GUI (Ctrl+F5) after deleting the node?
 
I can concur that it is still necessary to manually remove the config files after removing the node, even in Proxmox 7. Is there a reason this isn't in the wiki?
 
If I remove a failed node, are the containers and VMs taken care of automatically? Is there some other process I need to do to let Proxmox know that they are not around anymore?

How about backups? What are the implications for existing backups of VMs/LXCs after I re-create the node (with new HW) and recreate VMs/LXCs that "just so happen" to have the same IDs? What if (for example) I had a VM with ID 139, and later on an LXC is created with the same number? Will the expiration of old backups continue correctly (i.e. will a mix of VM and LXC backups with the same ID follow the same expiration rule), or do I have to remove the old backups manually?

Thanks!

George
 
This thread contains a basic question: why is the removed node still present in /etc/pve/nodes?
Or: why isn't that folder purged automatically when we remove the failed node?

The files in that folder represent all the information about this node, including the config files for the LXCs and VMs that were still running on this host when it died.

If your host died and you have instances that do not auto-migrate, you may want to move these files to another node. If you've been using shared storage, the instances will automatically boot on the new node if autostart is set.

For example, if you have a dead node pve02, you may find a file /etc/pve/nodes/pve02/lxc/107.conf
Move it to a node pve03 like this:
Bash:
mv /etc/pve/nodes/pve02/lxc/107.conf /etc/pve/nodes/pve03/lxc/

Similarly, you'll find the VM configs in /etc/pve/nodes/pve02/qemu-server/
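If there are many guests, the same move can be looped over both folders. This is a rough sketch under the same assumptions as the example above (dead node pve02, live node pve03); move_guest_configs and PVE_NODES_DIR are hypothetical names, not part of Proxmox. It only moves the config files. Whether a guest can actually start on the target still depends on its disks being reachable from there.

```shell
#!/bin/sh
# Sketch only: move every LXC and VM config from a dead node to a live one.
# PVE_NODES_DIR and move_guest_configs are hypothetical names for illustration.
PVE_NODES_DIR="${PVE_NODES_DIR:-/etc/pve/nodes}"

move_guest_configs() {
    dead="$1"     # e.g. pve02
    target="$2"   # e.g. pve03

    for kind in lxc qemu-server; do
        src="$PVE_NODES_DIR/$dead/$kind"
        dst="$PVE_NODES_DIR/$target/$kind"

        [ -d "$src" ] || continue
        if [ ! -d "$dst" ]; then
            echo "target folder missing, skipping $kind: $dst" >&2
            continue
        fi

        for conf in "$src"/*.conf; do
            [ -f "$conf" ] || continue   # the glob matched nothing
            echo "moving $conf -> $dst/"
            mv "$conf" "$dst/"
        done
    done
}
```

Usage would be: move_guest_configs pve02 pve03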

Those .conf files are plain text, so you can read them to find out which file belongs to which instance. Or, as a quick one-liner that covers both LXCs and VMs:
Bash:
find /etc/pve/nodes/pve02 -type f -name '*.conf' -print0 | xargs -0 grep -H 'name'
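The grep above also matches any other line that happens to contain "name". A slightly stricter sketch prints one line per guest with its type, ID and name, reading only the first hostname: line (LXC) or name: line (VM) of each file. list_guests and PVE_NODES_DIR are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch only: list type, ID and name of every guest config left on a node.
# PVE_NODES_DIR and list_guests are hypothetical names for illustration.
PVE_NODES_DIR="${PVE_NODES_DIR:-/etc/pve/nodes}"

list_guests() {
    node="$1"   # e.g. pve02

    for kind in lxc qemu-server; do
        for conf in "$PVE_NODES_DIR/$node/$kind"/*.conf; do
            [ -f "$conf" ] || continue   # the glob matched nothing
            vmid=$(basename "$conf" .conf)
            # First "hostname:" (LXC) or "name:" (VM) line; strip the key.
            name=$(awk '/^(hostname|name):/ { sub(/^[a-z]+:[ \t]*/, ""); print; exit }' "$conf")
            printf '%s\t%s\t%s\n' "$kind" "$vmid" "${name:-?}"
        done
    done
}
```

Usage would be: list_guests pve02. A guest whose config has no name line shows up with "?" in the last column.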

The node's folder holds some other data as well, so it may be wise either to keep it until you're confident it's no longer needed, or to move it elsewhere until then to get the dead node out of the GUI. I prefer the former, because the dead node in the GUI reminds me that cleanup is still pending.

I must say I'm glad it's not removed automatically. But I agree: a note about this in the manual would be nice.
 
