Can't get in or out of the cluster

Proximate

Member
Feb 13, 2022
I have a 7.1.7 cluster where I needed to remove a node and did it from the command line.
After that, I could no longer log into the node's GUI, only via the command line.
I tried local authentication and it would not let me in, but after a while it suddenly did.
From another node, looking at the joined nodes, I see all but this node.
Looking from the GUI of this node, I see the other nodes but this node is not part of the cluster.

Now I'm stuck: the only cluster option this node offers is 'Join Information', not 'Join', and I have read that you cannot (or at least should not) join a node to a cluster from the command line.

From one of the other nodes, I don't see the removed one:
Code:
pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pro02
         2          1 pro03
         4          1 pro01
         5          1 pro06 (local)
         6          1 pro05

From the node that was removed, it seems to still think it's part of the cluster.

Code:
pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         3          1 pro04 (local)

From the GUI of any of the joined nodes, I see all of the joined nodes plus the removed one in the Datacenter list, but under the Cluster nodes I do not see the removed one.
On the lone node's GUI, I can still see the other nodes, and this node sees itself in the nodes list.

And of course, I cannot remove it/myself from the cluster:
Code:
pvecm delnode pro04
Cannot delete myself from cluster!

It is somehow stuck.
I cannot even change the status of any VM on it while connected directly to its GUI.

Code:
unable to open file '/etc/pve/nodes/pro04/qemu-server/152.conf.tmp.2545' - Permission denied (500)
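A likely cause worth checking (this is an assumption, not something confirmed in the thread): /etc/pve is the pmxcfs cluster filesystem, and it goes read-only when the node has no quorum, which surfaces exactly as "Permission denied (500)". A hedged diagnostic sketch:

```shell
# Hedged sketch: /etc/pve (pmxcfs) mounts read-only without quorum, which
# shows up in the GUI as "Permission denied (500)".
# Guarded so this is a harmless no-op on a machine without Proxmox installed.
if command -v pvecm >/dev/null 2>&1; then
    pvecm status        # inspect the "Quorate:" line in the output
    # Last resort on a node being pulled out of a cluster (use with care):
    # lower the vote threshold so /etc/pve becomes writable again:
    # pvecm expected 1
fi
field_to_check="Quorate"   # the field to look for in the pvecm output above
```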


What in the world do I do here?
 
Hi,

I was in a panic of having to move vms due to failing hardware and didn't follow the correct method.
Some of the nodes are no longer here as the servers died.
There is only one node that is important, the rest I don't care about.
The cluster seems to be OK, but I need to re-add this node since it has three VMs on it.
The VMs won't come up, so I cannot back them up in order to rebuild the node.
 
If your broken node is still up, you can use "vzdump [vm-id]" to back up the VMs on the broken system (see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_examples_11).

You can restore it on another server in the cluster. For a container the command is:
Code:
pct restore 555 [/file/location/vm.tar]
Note that pct restore is for containers; for a QEMU VM (which is what the error above points at) the counterpart is qmrestore.
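Since the stuck guest here is a QEMU VM (it has a qemu-server config), here is a minimal end-to-end sketch; the VM ID (555) and the default dump directory are assumptions, adjust them to your setup:

```shell
# Sketch only: VMID and DUMPDIR are assumptions, adjust to your setup.
VMID=555
DUMPDIR=/var/lib/vz/dump

# Guarded so this is a harmless no-op on a machine without Proxmox tools.
if command -v vzdump >/dev/null 2>&1; then
    # Back up the VM on the broken node ("stop" mode for a consistent disk).
    vzdump "$VMID" --dumpdir "$DUMPDIR" --mode stop
fi

# Copy the resulting archive to a healthy node, then restore it there, e.g.:
#   qmrestore /var/lib/vz/dump/vzdump-qemu-555-<timestamp>.vma "$VMID"
```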

The other thing to do is to edit the corosync config and remove the broken node by hand.
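Hand-editing means deleting the dead node's node { ... } stanza from /etc/pve/corosync.conf on a quorate node; pmxcfs then syncs the change cluster-wide. A sketch of what the stanza to delete looks like (the name and nodeid are taken from this thread; the address is a placeholder):

```
node {
  name: pro04
  nodeid: 3
  quorum_votes: 1
  ring0_addr: [pro04-ip]
}
```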
 
Simple solution indeed. Thank you very much. Dumped, moved, rebuilt the host, then re-imported after joining the cluster.
The only problem is that the old node still shows up in the cluster view even though it is no longer in the cluster.
I've not been able to remove it from any cluster node, so what can be done? I'd like to keep the same name to keep track of everything.
 
I hope I understand it correctly:
You want to remove the broken node from the cluster. It currently looks something like this, with pve-cl1 being the broken node.

[Screenshot: Datacenter view with the broken node pve-cl1 still listed alongside pve-cl2 and pve-cl3]

Then the only thing you need to do is go to the command line on one of the working nodes in the cluster and remove the node's folder:
Bash:
root@pve-cl2:~# cd /etc/pve/nodes
root@pve-cl2:/etc/pve/nodes# ls
pve-cl1  pve-cl2  pve-cl3
root@pve-cl2:/etc/pve/nodes# rm -r pve-cl1

It should then be removed from the web GUI as well (if you are not patient, a refresh of the web page should show that it is removed).
 
Thank you, I never came across these commands while searching.
I dumped the VMs and rebuilt, but the VMs remain in the list and will not start.