Re-joining a previously removed Proxmox node is overly complex due to pmxcfs design limitations

Pissed_off · 2025-10-23T16:27:01+0200

Hi all,

I'm evaluating Proxmox for use in a production environment.

I’ve run into a recurring problem that highlights a design limitation in Proxmox VE’s cluster system (pmxcfs).

When a node is removed from a cluster and later needs to re-join, the process consistently fails with errors like:

* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
* corosync is already running, is this node already in a cluster?!
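
The messages above appear to come from the pre-join checks of `pvecm add`, and each seems to map to a specific piece of leftover state. A minimal sketch for listing that state before another join attempt follows. The function name `join_blockers` and the optional root-prefix argument are hypothetical, and the mapping from message to file is an assumption based on the wording of the errors, not on the Proxmox source.

```sh
#!/bin/sh
# Hypothetical helper: list leftover state that appears to block a cluster join.
# Optional argument: a root prefix, so a copied tree can be inspected instead
# of the live system. With no argument the live paths are checked.
join_blockers() {
    root="${1:-}"
    found=0

    # "authentication key '/etc/corosync/authkey' already exists"
    if [ -e "$root/etc/corosync/authkey" ]; then
        echo "authkey present: $root/etc/corosync/authkey"
        found=1
    fi

    # "cluster config '/etc/pve/corosync.conf' already exists"
    if [ -e "$root/etc/pve/corosync.conf" ]; then
        echo "cluster config present: $root/etc/pve/corosync.conf"
        found=1
    fi

    # "this host already contains virtual guests"
    # Assumption: any guest config under nodes/*/qemu-server or nodes/*/lxc counts.
    for conf in "$root"/etc/pve/nodes/*/qemu-server/*.conf \
                "$root"/etc/pve/nodes/*/lxc/*.conf; do
        if [ -e "$conf" ]; then
            echo "guest config present: $conf"
            found=1
        fi
    done

    # "corosync is already running ..." (only meaningful on the live system)
    if [ -z "$root" ] && command -v pgrep >/dev/null 2>&1 \
        && pgrep -x corosync >/dev/null 2>&1; then
        echo "corosync daemon is running"
        found=1
    fi

    # Exit status 0 means nothing was found, 1 means at least one blocker.
    return "$found"
}
```

Calling `join_blockers` with no argument on the node itself prints one line per finding and returns a non-zero status if anything is left over.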

Even after stopping all cluster services (pve-cluster, corosync, pmxcfs), unmounting /etc/pve, and cleaning /var/lib/pve-cluster, Proxmox refuses to allow the node to re-join.
It appears that pmxcfs and its database (config.db) keep stale cluster metadata which prevents a clean re-association with the leader node.

This forces administrators to manually remove low-level files, restart daemons, and rebuild the cluster filesystem — something that should be a supported and automated process.
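
For reference, the manual steps referred to above can be sketched roughly as follows. The sequence is modelled on the "separate a node without reinstalling" procedure in the Proxmox VE admin guide as best recalled here, so check it against the guide for your version before relying on it. The `run` wrapper, the `DRY_RUN` variable, and the function name `reset_cluster_state` are hypothetical. The sketch only prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 only on a node whose cluster state you really intend to wipe.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the manual reset of the local cluster state.
reset_cluster_state() {
    # Stop the cluster filesystem service and corosync.
    run systemctl stop pve-cluster corosync
    # Start pmxcfs in local mode so /etc/pve is writable without quorum.
    run pmxcfs -l
    # Remove the cluster configuration from the local pmxcfs database.
    run rm -f /etc/pve/corosync.conf
    # Remove the corosync files, including authkey (GNU find assumed).
    run find /etc/corosync -mindepth 1 -delete
    # Stop the local-mode pmxcfs and bring the regular service back.
    run killall pmxcfs
    run systemctl start pve-cluster
}
```

Running `reset_cluster_state` as-is prints the plan for review. Note that this does not remove guest configurations, so the "already contains virtual guests" check can still trigger afterwards.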

Environment:

* Proxmox VE 9.x (same issue existed on 8.x)
* Two-node cluster + QDevice
* Both nodes otherwise healthy and reachable

Expected behavior:
There should be a supported, safe command such as:

pvecm rejoin <leader-ip> --force

to reset a node’s cluster state and re-sync it with the leader without touching internal pmxcfs or corosync files.

This would allow admins to re-add nodes without losing VM configurations or reinstalling the entire host. This is a feature that a mature hypervisor cluster system should really provide. In fact, the lack of it might be a reason to switch to Nutanix or VMware.

Suggestion:
Introduce a “stateless rejoin” or “force-rejoin” mechanism in pvecm that clears local cluster metadata but preserves /etc/pve/nodes/<hostname>/qemu-server/.
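
Until something like this exists, the "preserve the guest configs" part can be approximated by copying them out of /etc/pve before the local cluster state is reset. A minimal sketch, assuming the configs are plain `*.conf` files under the node's `qemu-server` directory; the function name `backup_guest_configs` and the destination path in the usage note are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: copy VM config files out of a node directory.
#   $1  node directory, e.g. /etc/pve/nodes/<hostname>
#   $2  destination directory outside /etc/pve
backup_guest_configs() {
    src="$1"
    dest="$2"
    count=0

    mkdir -p "$dest/qemu-server"
    for conf in "$src"/qemu-server/*.conf; do
        # Skip the unexpanded pattern when there are no config files.
        [ -e "$conf" ] || continue
        cp -p "$conf" "$dest/qemu-server/"
        count=$((count + 1))
    done

    echo "saved $count config file(s) to $dest/qemu-server"
    return 0
}
```

A call could look like `backup_guest_configs "/etc/pve/nodes/$(hostname)" /root/pve-config-backup`. Only the config files are copied; disk images are not touched. Whether the files can simply be copied back after a re-join depends on the VMIDs still being unused in the cluster.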

Thanks for considering this — it’s one of the few parts of Proxmox that still feels fragile compared to how rock-solid the rest of the platform is.

dietmar · 2025-10-23T19:32:22+0200

Sorry, but a re-join operation makes zero sense to me. Do not remove the node in the first place.

PwrBank · 2025-10-23T19:43:52+0200

What are you trying to do when removing it from the cluster? What's the point of that?
