Re-joining a previously removed Proxmox node is overly complex due to pmxcfs design limitations

Pissed_off

New Member
Oct 23, 2025
Hi all,

I'm evaluating Proxmox for use in a production environment.


I’ve run into a recurring problem that highlights a design limitation in Proxmox VE’s cluster system (pmxcfs).


When a node is removed from a cluster and later needs to re-join, the process consistently fails with errors like:

* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
* corosync is already running, is this node already in a cluster?!
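
For context, this is roughly what a rejoin attempt looks like (the IP is a placeholder, and the exact output wording may differ between versions):

    root@node2:~# pvecm add 192.168.1.10
    detected the following error(s):
    * this host already contains virtual guests
    Check if node may join a cluster failed!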


Even after stopping all cluster services (pve-cluster, corosync, pmxcfs), unmounting /etc/pve, and cleaning /var/lib/pve-cluster, Proxmox refuses to allow the node to re-join.
It appears that pmxcfs and its database (config.db) keep stale cluster metadata, which prevents a clean re-association with the rest of the cluster.
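
To be concrete, the low-level surgery I mean is roughly the "separate a node without reinstalling" steps from the admin guide, all typed out by hand every time (a sketch; run on the node being reset, at your own risk):

    # stop the cluster stack
    systemctl stop pve-cluster corosync
    # restart pmxcfs in local mode so /etc/pve is writable without quorum
    pmxcfs -l
    # wipe the stale corosync state
    rm /etc/pve/corosync.conf
    rm -r /etc/corosync/*
    # stop the local-mode instance and bring the stack back up
    killall pmxcfs
    systemctl start pve-cluster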


This forces administrators to manually remove low-level files, restart daemons, and rebuild the cluster filesystem — something that should be a supported and automated process.


Environment:
  • Proxmox VE 9.x (same issue existed on 8.x)
  • Two-node cluster + QDevice
  • Both nodes otherwise healthy and reachable

Expected behavior:
There should be a supported, safe command such as:


pvecm rejoin <leader-ip> --force

to reset a node’s cluster state and re-sync it with the leader, without the administrator having to touch internal pmxcfs or corosync files by hand.


This would allow admins to re-add nodes without losing VM configurations or reinstalling the entire host. It's a feature a mature hypervisor cluster system should really provide; in fact, its absence might be a reason to migrate to Nutanix or VMware instead.


Suggestion:
Introduce a “stateless rejoin” or “force-rejoin” mechanism in pvecm that clears local cluster metadata but preserves /etc/pve/nodes/<hostname>/qemu-server/.
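
A wrapper along these lines could approximate it even today (a sketch only; the backup path and leader IP are placeholders, and whether your pvecm version accepts --force on add should be checked first):

    # preserve this node's guest configs before touching cluster state
    cp -a /etc/pve/nodes/$(hostname)/qemu-server /root/qemu-server.bak
    # ...clear the local cluster state as in the manual steps above...
    # then re-join the existing cluster
    pvecm add <leader-ip> --force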


Thanks for considering this — it’s one of the few parts of Proxmox that still feels fragile compared to how rock-solid the rest of the platform is.
 
What are you trying to do when removing it from the cluster? What's the point of that?
 
Any scenario is possible, such as accidental or malicious removal of the node. What would the solution be in that case?
 
In that case you would just copy over the node information from one of the other cluster members and drop it on the rebuilt host. There's more to it than just that, but the other hosts keep a record of the configs for all other members.
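
For the record, a rough sketch of that recovery path (the node name is a placeholder; this assumes the rebuilt host kept the same hostname):

    # on a healthy member: pmxcfs keeps a copy of every node's guest configs
    ls /etc/pve/nodes/<rebuilt-node>/qemu-server/
    # copy them over to the rebuilt host, e.g. via ssh
    scp /etc/pve/nodes/<rebuilt-node>/qemu-server/*.conf root@<rebuilt-node>:/root/recovered-configs/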