Reinstalling cluster node

gkovacs

Ok so I have 2 nodes, proxmox and proxmox2.
There was a working cluster, but I decided to reinstall proxmox2, so:

1. I migrated all guests from proxmox2 to proxmox

2. I deleted proxmox2 with "pvecm delnode proxmox2" on proxmox, but it did not disappear from the node list:
Code:
root@proxmox:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2013-04-25 20:01:15  proxmox
   2   X     20                        proxmox2

root@proxmox:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: CLUSTER
Cluster Id: 30647
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: proxmox
Node ID: 1
Multicast addresses: X.X.X.X
Node addresses: X.X.X.X

3. I reinstalled proxmox2

4. When I tried to join the new proxmox2 to the cluster, it failed with the following error:
Code:
root@proxmox2:~# pvecm add proxmox
root@proxmox's password:
unable to copy ssh ID

Now the web interface is not working and I can no longer save backup jobs on proxmox.

Then I tried to restart the services (pvestatd, pvedaemon, cman, pve-cluster) on proxmox:

Code:
root@proxmox:~# service pve-cluster start
Starting pve cluster filesystem : pve-clusterfuse: failed to access mountpoint /etc/pve: Transport endpoint is not connected
[main] crit: fuse_mount error: Transport endpoint is not connected
[main] notice: exit proxmox configuration filesystem (-1)
 (warning).

Now I can't even log in to the proxmox web interface, and /etc/pve seems to have disappeared entirely.

This is a production server running many guests, so I can't reinstall it.
How can I restore a working cluster, or at least get the containers off it?
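
For readers hitting the same state: the `pvecm status` output in step 2 already shows `Quorum: 2 Activity blocked` with only one vote present, which means the surviving node has lost quorum. Without quorum the cluster filesystem under /etc/pve becomes read-only, which would be consistent with backup jobs no longer saving. A minimal sketch for spotting this from the status text follows. The helper name `quorum_state` is hypothetical, and the field names are taken from the output above.

```shell
#!/bin/sh
# Sketch: read `pvecm status` text on stdin and report whether quorum is met.
# Field names ("Total votes", "Quorum") come from the output quoted in this
# thread; quorum_state is a hypothetical helper, not part of Proxmox.
quorum_state() {
    awk -F': *' '
        $1 == "Total votes" { total = $2 + 0 }
        $1 == "Quorum" {
            split($2, q, " ")            # first word is the vote count needed
            need = q[1] + 0
            blocked = ($2 ~ /Activity blocked/)
        }
        END {
            if (blocked || total < need)
                printf "NO QUORUM (votes %d, needed %d)\n", total, need
            else
                printf "quorate (votes %d, needed %d)\n", total, need
        }'
}

# Usage on a live node (assumes pvecm is on PATH):
#   pvecm status | quorum_state
```

Parsing the text rather than relying on an exit code keeps the check usable on saved output, such as the paste above.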
 
Hi,
I had a similar issue. It seems to occur when you reinstall a node with the same hostname or IP address after taking it offline and removing it from the cluster. The following command fixed it for me. Run it on the node you are trying to add, in your case proxmox2:
root@proxmox:~# pvecm add <master-ip> --force

Hope this helps.
 
Thanks for the tip.

Turns out pmxcfs was stopped on our proxmox node. That is why login to the web interface did not work: the entire /etc/pve folder had disappeared, and even the command-line tools (vzdump and vzctl) did nothing.
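
The post does not say how pmxcfs was brought back. The "Transport endpoint is not connected" error from the first post is what a FUSE mountpoint typically returns when the filesystem process has died but the mount entry is still registered, so one plausible first check is whether /etc/pve is in that stale state. A sketch, assuming a Linux /proc/mounts layout; `pve_mount_state` is a hypothetical helper, not a Proxmox tool:

```shell
#!/bin/sh
# Sketch: classify the state of the /etc/pve FUSE mount.
# Both arguments are optional and default to the live system paths.
pve_mount_state() {
    mounts_file=${1:-/proc/mounts}
    mount_dir=${2:-/etc/pve}

    # Is the directory listed as a mountpoint at all?
    if ! awk -v d="$mount_dir" '$2 == d { found = 1 } END { exit !found }' "$mounts_file"; then
        echo "not mounted"
        return 0
    fi

    # Listed as mounted: a stale FUSE mount fails even a plain listing.
    if ls "$mount_dir" >/dev/null 2>&1; then
        echo "mounted"
    else
        echo "stale"
    fi
}

# Usage on the affected node:
#   pve_mount_state
```

If it prints `stale`, a lazy unmount (`umount -l /etc/pve`) followed by `service pve-cluster start` is the usual way to clear a dead FUSE mount. Whether that was the actual fix here is an assumption.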

After fixing that, we also had to manually force the reinstalled proxmox2 back into the cluster by removing and recreating the ssh key (we did not know about the --force parameter at the time), which also took some investigating.

In my opinion the cluster implementation of PVE needs to be much more robust. A lot of things can break it at the moment, and it can only be fixed if you know exactly what you are doing.

Also there should be a documented way of completely removing a cluster configuration from a node and returning it to a clean (post-installation) state. The dev's suggestion of reinstallation is not acceptable to many of us running production servers, since it's entirely possible that you still have running containers on a broken cluster node - like it happened to us today.
 
You wrote: "The dev's suggestion of reinstallation is not acceptable to many of us running production servers, since it's entirely possible that you still have running containers on a broken cluster node - like it happened to us today."

If there are VMs on a node, it is always possible to fix the cluster. There is no need to remove or reinstall anything.
 
You wrote: "If there are VMs on a node, it is always possible to fix the cluster. There is no need to remove or reinstall anything."

Well, it was you who advocated a reinstall in another thread about our recent cluster nightmare:
http://forum.proxmox.com/threads/13626-cluster-creation-FAILED?p=73421#post73421

Clearly, your cluster software could use a bit more documentation, or at the very least some dire warnings about what not to do:

- Never delete a cluster node from another node while it is still online (it might not get deleted; what does status X mean? Is a reboot needed after node deletion?)
- If you can't delete a cluster node by name, don't try to delete it by node ID (I'm not sure what that does, but apparently nothing good)
- Never restart the cluster services after failing to delete a cluster node (pmxcfs might not restart, so your working node's configuration will disappear and you will panic as even login won't work)
- Never try to add a reinstalled cluster node with the same IP and hostname; it won't work (possibly it would with the -force parameter, but again that is not documented in the wiki)
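
On the "status X" question in the first warning: as far as I know, `pvecm nodes` on this release relays the `cman_tool nodes` table, where the Sts column uses M for a current cluster member and X for a node that is not a member (dead). A small sketch for reading that column before running `pvecm delnode`; the helper name `node_status` is hypothetical, and the column layout is assumed to match the output in post #1:

```shell
#!/bin/sh
# Sketch: print the Sts flag for one node from `pvecm nodes` text on stdin.
# Assumes the layout shown in post #1: node ID first, Sts second, name last.
# Returns non-zero if the node name is not listed.
node_status() {
    awk -v n="$1" '
        $1 ~ /^[0-9]+$/ && $NF == n { print $2; found = 1 }
        END { exit !found }'
}

# Usage on a live node (assumes pvecm is on PATH):
#   pvecm nodes | node_status proxmox2
# An "M" here means the node is still an active member, which is the
# situation the first warning above says to avoid before deleting it.
```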

Let me repeat: I think (at this stage of robustness of the cluster software) there should be a documented way of completely removing the cluster configuration from a node, and returning it to a post-installation state, so you could create a new cluster easily and save or backup your VM / CT data. You could do this easily with 1.9, so when your cluster got messed up, you would just start from scratch.
 
Again, this entire thread was about the situation when you DO HAVE VMs on a node and the cluster stops working.

You wrote: "I migrated all guests from proxmox2 to proxmox".
So I assume there are no VMs on that node.
 
You wrote: "So I assume there are no VMs on that node."

Well, it was the other server (proxmox) that was running all the VMs when the cluster collapsed. Please read what happened (post #1) and also what, in our experience, can seriously mess up a cluster (post #5).

Based on what happened to us I still think that step-by-step instructions to remove a cluster and reset a node to a post-installation state are needed...
 
