PVE 7.1-10 Removing Failed Node 1 from Cluster

BeDazzler

Hi All,

I'm in the process of upgrading from 7.1-10 to 7.3-3 and the upgrade failed on 1 of 5 nodes.

I have not progressed with any further upgrades and ended up shutting down the failed node.

The other nodes have all voted and are online and working fine.

The node which failed is node 1 (i.e., the first node installed in the cluster).

Is there anything special I need to do to remove it from the cluster so it can be rebuilt and added back?

I am aware I need to remove keys, etc., in the /etc/pve folders, which I will do once the server is removed.
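For the key cleanup, here is a minimal Python sketch that drops a host's entries from an OpenSSH known_hosts file. It assumes the cluster-wide file is /etc/pve/priv/known_hosts (check on your own nodes) and that entries use plain, unhashed host names or IPs. Hashed `|1|...` entries and `[host]:port` forms are not matched; `ssh-keygen -R <host> -f <file>` does the same job and also handles hashed entries. The helper names are hypothetical.

```python
from pathlib import Path


def drop_host_entries(text: str, hosts: set[str]) -> str:
    """Return known_hosts content without lines whose host field names any of `hosts`.

    Simplification: only plain (unhashed) comma-separated host fields are matched;
    comments, blank lines and marker lines (e.g. @cert-authority) are kept as-is.
    """
    kept = []
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith(("#", "@")):
            if hosts & set(fields[0].split(",")):
                continue  # this entry belongs to the removed node
        kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


def clean_known_hosts(path: Path, hosts: set[str]) -> None:
    # Hypothetical helper: rewrites the file in place, so keep a backup first.
    path.write_text(drop_host_entries(path.read_text(), hosts))
```

On a remaining node this might be called as `clean_known_hosts(Path("/etc/pve/priv/known_hosts"), {"node1_name", "<node1 IP>"})`, passing both the old name and the old address, since entries can be keyed by either.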

Can I just run pvecm delnode node1_name on one of the other nodes?

Do I need to be concerned that the host being removed and rebuilt has node ID 1?

Many thanks

BeDazzler.
 
Can I just use pvecm delnode node1_name on one of the other nodes?
If I understood your situation correctly, this should be the only necessary step, as described in [1]. After running the command, make sure the removed node never boots again with its current configuration; reinstall it first.

Do I need to be concerned about the host being removed / rebuilt having the node ID of 1?
No. Since there is no such thing as a 'master node', this should not be an issue.

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
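The removal step from [1] can be sketched as a small Python dry-run script that only prints the commands unless told otherwise. It assumes it runs on one of the remaining nodes and that the failed node is already powered off; `node1_name` is the placeholder from the question, and the helper names are hypothetical.

```python
import shlex
import subprocess


def remove_node_commands(node_name: str) -> list[list[str]]:
    """Command sequence for removing a powered-off node from the cluster (sketch)."""
    if not node_name or "/" in node_name or node_name.startswith("-"):
        raise ValueError(f"suspicious node name: {node_name!r}")
    return [
        ["pvecm", "nodes"],               # confirm the exact name of the node first
        ["pvecm", "delnode", node_name],  # remove it from the cluster configuration
        ["pvecm", "status"],              # check the remaining nodes are still quorate
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing command


if __name__ == "__main__":
    run(remove_node_commands("node1_name"))  # dry run: prints the commands only
```

Keeping the dry run as the default means the exact node name can be checked against the `pvecm nodes` output before anything is changed.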
 
Thank you for the reply.

Working with corosync always stresses me out, since it's easy to break and there is always drama whenever it stops working properly.

I tried for hours to recover the failed node, mainly as a learning exercise. However, even though the corosync configs were all correct and every node could see the others, it simply refused to work properly.

The problem was that the affected node was communicating over a different interface (InfiniBand) whilst the other nodes were talking over Ethernet. Although they could all see each other, they would not agree on quorum.

The issue was caused by the upgrade from PVE 7.1-10 to 7.3-3: the first upgraded node did not bring up its InfiniBand interface on boot.
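A mismatch like this can be spotted by checking which `ring0_addr` entries in /etc/pve/corosync.conf fall outside the subnet the rest of the cluster uses. Below is a minimal Python sketch, assuming a simple nodelist whose `ring0_addr` values are IP literals (not hostnames). The sample addresses and subnets are made up, and the parser is a simplification, not a full corosync.conf reader.

```python
import ipaddress
import re

# Hypothetical sample: pve1 on an InfiniBand subnet, pve2 on the Ethernet subnet.
SAMPLE_CONF = """\
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.12
  }
}
"""


def parse_nodelist(conf_text: str) -> dict[str, str]:
    """Map node name -> ring0_addr (simplified: flat `node { ... }` blocks only)."""
    nodes = {}
    for block in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        fields = dict(re.findall(r"^\s*(\w+)\s*:\s*(\S+)", block, re.MULTILINE))
        if "name" in fields and "ring0_addr" in fields:
            nodes[fields["name"]] = fields["ring0_addr"]
    return nodes


def nodes_outside(nodes: dict[str, str], cidr: str) -> list[str]:
    """Nodes whose ring0_addr is not inside the expected cluster subnet."""
    net = ipaddress.ip_network(cidr)
    return sorted(n for n, addr in nodes.items() if ipaddress.ip_address(addr) not in net)


if __name__ == "__main__":
    # On a real node you would read /etc/pve/corosync.conf instead of the sample.
    print(nodes_outside(parse_nodelist(SAMPLE_CONF), "192.168.1.0/24"))
```

With the sample above this prints `['pve1']`, the node whose cluster link sits on the other network.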

I did resolve that by loading the modules with modprobe at startup; however, it still didn't work, and I didn't want to risk breaking a production environment.
(The affected node showed the other nodes as red crosses, the other nodes showed the affected node as a red cross with question marks on all its resources, and then the GUI would become unavailable.)
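Loading a module at every boot is usually done by listing it, one name per line, in a file under /etc/modules-load.d/ (or in /etc/modules on Debian). Here is a small idempotent Python sketch of that. The module name `ib_ipoib` (IP over InfiniBand) and the file name `infiniband.conf` are assumptions, since the post doesn't say which modules were needed; the helper name is hypothetical.

```python
from pathlib import Path


def ensure_module_at_boot(module: str, conf: Path) -> bool:
    """Append `module` to a modules-load.d style file unless already listed.

    Returns True if the file was changed. Lines whose first non-blank
    character is '#' or ';' are treated as comments.
    """
    lines = conf.read_text().splitlines() if conf.exists() else []
    listed = {line.strip() for line in lines if not line.lstrip().startswith(("#", ";"))}
    if module in listed:
        return False  # already configured, leave the file untouched
    lines.append(module)
    conf.write_text("\n".join(lines) + "\n")
    return True
```

Run as root on the affected node, a call might look like `ensure_module_at_boot("ib_ipoib", Path("/etc/modules-load.d/infiniband.conf"))`; because it is idempotent, re-running it after an upgrade does no harm.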

Manually editing the corosync configuration didn't work, then locking prevented further changes; it was becoming a nightmare.
Each time I shut down the affected node, the others would come back to life and present a GUI.

I got as far as preparing to issue reset-failed, but all the VMs on the affected PVE host were already running on other nodes, so rather than risk another issue I shut down the affected node instead.

Then I was able to delete it from the cluster.

The other 4 nodes recognized that the 5th node had been removed, and stopped displaying it in the GUI once I removed the folder /etc/pve/nodes/nodename.
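Before deleting /etc/pve/nodes/<name> by hand, it can help to list which per-node directories no longer match a cluster member. This is a minimal Python sketch; the member names are assumed to come from `pvecm nodes`, and it only reports, since a node's directory may still hold guest configs (for example under qemu-server/) that should be moved first. The function name is hypothetical.

```python
from pathlib import Path


def stale_node_dirs(members: set[str], nodes_dir: Path = Path("/etc/pve/nodes")) -> list[str]:
    """Names of per-node directories that do not belong to any current member."""
    return sorted(p.name for p in nodes_dir.iterdir() if p.is_dir() and p.name not in members)
```

Reporting rather than deleting keeps the destructive step a deliberate, manual one.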

To make sure the node cannot reappear, I have shut down its interfaces on the network switch.

The other 4 nodes communicate over their Ethernet interfaces, so they should be fine when I retry upgrading them after rebuilding the 5th PVE host.

Tomorrow I will rebuild the affected node with the same name on PVE 7.3-3 and add it back to the cluster using its Ethernet interface IP address.
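Rejoining over Ethernet means pinning the rebuilt node's cluster link to its Ethernet address when running `pvecm add` on that node. Here is a Python sketch that only builds and prints the command; both addresses are placeholders from the documentation range, not values from this thread, and the function name is hypothetical.

```python
import ipaddress
import shlex


def join_command(existing_node_ip: str, local_ethernet_ip: str) -> list[str]:
    """Build `pvecm add` for the rebuilt node, pinning link0 to its Ethernet IP."""
    # Fail early on typos: in this sketch both arguments must be IP literals.
    ipaddress.ip_address(existing_node_ip)
    ipaddress.ip_address(local_ethernet_ip)
    return ["pvecm", "add", existing_node_ip, "--link0", local_ethernet_ip]


if __name__ == "__main__":
    print(shlex.join(join_command("192.0.2.12", "192.0.2.11")))
```

Passing `--link0` explicitly avoids relying on whichever interface happens to come up first after the reinstall.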

Thanks again for the reply.
 
Just following up on this: today I reinstalled PVE, then patched it and applied custom configs before rejoining the cluster.

Everything worked perfectly fine, no issues.