Hi all,
I'm evaluating Proxmox for use in a production environment.
I’ve run into a recurring problem that highlights a design limitation in Proxmox VE’s cluster system (pmxcfs).
When a node is removed from a cluster and later needs to re-join, the process consistently fails with errors like:
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
* corosync is already running, is this node already in a cluster?!
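To see which of these checks would trip before attempting the join, a small pre-flight script can help. The following is only a minimal sketch and not part of Proxmox: `find_join_blockers`, its `root` parameter and the short labels it returns are names I made up for illustration. The file paths come from the error messages above, and the guest check assumes that guest configs are `*.conf` files under `/etc/pve/nodes/<hostname>/qemu-server/` (VMs) or `lxc/` (containers).

```python
import subprocess
from pathlib import Path


def corosync_running() -> bool:
    """Return True if systemd reports the corosync unit as active."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", "corosync"], check=False
        )
    except FileNotFoundError:  # no systemctl available on this machine
        return False
    return result.returncode == 0


def find_join_blockers(root: str = "/", check_service: bool = True) -> list[str]:
    """List leftovers that correspond to the four join errors above.

    `root` is a hypothetical parameter so the check can be pointed at a
    copy of the filesystem instead of the live node.
    """
    base = Path(root)
    blockers = []
    if (base / "etc/corosync/authkey").exists():
        blockers.append("authkey exists")
    if (base / "etc/pve/corosync.conf").exists():
        blockers.append("corosync.conf exists")
    # Assumption: any *.conf under qemu-server/ or lxc/ counts as a guest.
    nodes = base / "etc/pve/nodes"
    if nodes.is_dir() and (
        any(nodes.glob("*/qemu-server/*.conf")) or any(nodes.glob("*/lxc/*.conf"))
    ):
        blockers.append("guests present")
    if check_service and corosync_running():
        blockers.append("corosync running")
    return blockers
```

Running `find_join_blockers()` as root on the affected node would list whichever of the four conditions still hold. It only reads state and never changes anything.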
Even after stopping all cluster services (pve-cluster, corosync, pmxcfs), unmounting /etc/pve, and cleaning /var/lib/pve-cluster, Proxmox refuses to allow the node to re-join.
It appears that pmxcfs and its database (config.db) keep stale cluster metadata, which prevents a clean re-association with the leader node.
This forces administrators to manually remove low-level files, restart daemons, and rebuild the cluster filesystem, something that should be a supported and automated process.
Environment:
- Proxmox VE 9.x (same issue existed on 8.x)
- Two-node cluster + QDevice
- Both nodes otherwise healthy and reachable
Expected behavior:
There should be a supported, safe command such as:
pvecm rejoin <leader-ip> --force
to reset a node’s cluster state and re-sync it with the leader without touching internal pmxcfs or corosync files.
This would allow admins to re-add nodes without losing VM configurations or reinstalling the entire host. It is a feature that a mature hypervisor cluster system should really provide. In fact, its absence might be a reason to switch to Nutanix or VMware.
Suggestion:
Introduce a “stateless rejoin” or “force-rejoin” mechanism in pvecm that clears local cluster metadata but preserves /etc/pve/nodes/<hostname>/qemu-server/.
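One way to picture the proposed behaviour is as a split of the node's state into what gets cleared and what gets kept. The sketch below is purely illustrative: `plan_force_rejoin` does not exist in pvecm, the list of cluster-state paths is taken from the errors and directories mentioned in this post (with config.db assumed to live in /var/lib/pve-cluster), and the function only returns a plan without touching any files.

```python
from pathlib import PurePosixPath

# Local cluster metadata named in this post (errors and cleanup attempts).
# Assumption: config.db sits directly in /var/lib/pve-cluster.
CLUSTER_STATE = (
    "/etc/corosync/authkey",
    "/etc/pve/corosync.conf",
    "/var/lib/pve-cluster/config.db",
)


def plan_force_rejoin(hostname: str, paths: list[str]) -> dict[str, list[str]]:
    """Sort paths into clear / preserve / untouched for a hypothetical rejoin."""
    keep_dir = PurePosixPath("/etc/pve/nodes") / hostname / "qemu-server"
    plan = {"clear": [], "preserve": [], "untouched": []}
    for raw in paths:
        path = PurePosixPath(raw)
        if path == keep_dir or keep_dir in path.parents:
            # VM configs of this node must survive the rejoin.
            plan["preserve"].append(raw)
        elif raw in CLUSTER_STATE:
            plan["clear"].append(raw)
        else:
            plan["untouched"].append(raw)
    return plan


if __name__ == "__main__":
    example = [
        "/etc/corosync/authkey",
        "/etc/pve/nodes/pve1/qemu-server/100.conf",
        "/etc/hosts",
    ]
    print(plan_force_rejoin("pve1", example))
```

Since /etc/pve is served by pmxcfs out of config.db, a real implementation would presumably have to copy the preserved configs out before clearing the database and write them back after the join. That ordering is exactly the part that is awkward to do by hand today.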
Thanks for considering this. It’s one of the few parts of Proxmox that still feels fragile compared to how rock-solid the rest of the platform is.