[SOLVED] Accidentally ran `rm -rf /etc/pve/nodes `

pyhmn

New Member
Oct 20, 2023
As the title says, I was trying to fix an issue where nodes were still appearing in the GUI after I ran the `pvecm delnode` command. I accidentally typed a space instead of a forward slash between "nodes" and the name of the node.

As you can guess, my nodes folder no longer exists.

Help, please!
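Why one missing slash removed the whole folder: the shell splits a command line on unquoted whitespace, so `rm` receives two separate path arguments instead of one. This minimal Python sketch uses the standard-library `shlex.split`, which follows POSIX shell-like splitting rules, to show the difference without executing anything. The helper name `rm_targets` and the node name `oldnode` are hypothetical placeholders; the thread does not give the real node name.

```python
import shlex


def rm_targets(command: str) -> list[str]:
    """Return the path arguments an rm command line would receive.

    shlex.split only tokenizes the string the way a POSIX shell would;
    nothing is executed. Option flags such as -rf are filtered out.
    """
    args = shlex.split(command)
    return [arg for arg in args[1:] if not arg.startswith("-")]


# "oldnode" is a hypothetical node name used only for illustration.
intended = "rm -rf /etc/pve/nodes/oldnode"
typed = "rm -rf /etc/pve/nodes oldnode"

print(rm_targets(intended))  # ['/etc/pve/nodes/oldnode']
print(rm_targets(typed))     # ['/etc/pve/nodes', 'oldnode']
```

With the typo, `rm -rf` deletes the entire `/etc/pve/nodes` tree and then tries a relative path named after the node, which `-f` silently ignores if it does not exist.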
 
I found some other related forum posts. Looking into those solutions now.

Still happy to hear from anyone with advice though.
 
Is the cluster still running?
Are VMs still running?

If so, you could be lucky and save something.
Hopefully you have backups of all VMs.
 
I think the VMs are still running, but they are no longer visible in the GUI at all. I'm not sure of the cluster status. I'm getting SSL connection errors in the GUI when trying to connect to other nodes in the cluster. I was already dealing with some SSH key-based access issues between cluster nodes before this happened, but these SSL connection errors look different. I can't access any info in the GUI for the other cluster nodes, only for the one I'm currently logged into.

I only have 3-4 VMs running, and none of them have backups yet. Two of them are semi-production, so those would be painful to rebuild. Losing the other two wouldn't matter.

I had just added these four nodes (pve01, pve02, pve03, pve04) to an existing cluster and got Ceph working. I migrated the VMs to the new nodes on the Ceph pool and started removing the old nodes. Some of the old nodes remained visible in the GUI, so I was trying to remove them from the /etc/pve/nodes folder.
 

Attachments: vm-not-visible.png, ssl-connection-error.png

It is to be expected that the GUI does not work. If you look in the /etc/pve folder, do you still see VM configuration files in the qemu-server folder under local (/etc/pve/local/qemu-server)? If so, back them up before you do anything else. The VM disks live in the Ceph pool, which is not affected. You will have to rebuild the cluster, so back up the VM configurations first, as they may be lost in the process.
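A minimal Python sketch of that backup step, assuming the configs are still readable under `/etc/pve/local/qemu-server` (the per-node location for QEMU VM configs) and that the destination is somewhere outside `/etc/pve`, such as a local disk or USB drive. The function name `backup_vm_configs` is illustrative and not part of any Proxmox tool.

```python
import shutil
from pathlib import Path


def backup_vm_configs(src: Path, dest: Path) -> list[Path]:
    """Copy every VM config (*.conf) in src into dest and return the copied paths.

    Only file contents are copied (shutil.copyfile), which is enough for
    plain-text VM configs and avoids depending on file metadata.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    for conf in sorted(src.glob("*.conf")):
        target = dest / conf.name
        shutil.copyfile(conf, target)
        copied.append(target)
    return copied
```

On the affected node this would be called as `backup_vm_configs(Path("/etc/pve/local/qemu-server"), Path("/root/vm-config-backup"))`, where the destination path is only an example. A plain `cp` does the same job; the point is simply to get the `*.conf` files off the cluster file system before touching the cluster.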
 
I love this post. This is fun stuff. Let's delete the node configs and see if we can recover. And in fact, a recovery (mostly) did happen. Hmm.
 
I'm finally able to work on this again today.

I'm down to SSH and local console access only at this point. The local GUI for each node won't even load now.

The local and qemu-server folders are just symlinks into the nodes folder, which is gone now.

I haven't done anything with the old node hardware beyond powering it off and disconnecting the network cables.

Could there still be files on one of the deleted nodes' servers, e.g. in /etc/pve/nodes, that I could copy back?
 
PROTIP:

1) Have backups

2) Use Midnight Commander for safe(r) deleting; rm and dd are well-known system killers (see 1)
Thanks, I'll check out Midnight Commander.

Yeah, backups were next on the list. I had PBS backups set up from the old nodes, but that broke while I was adding these new Ceph nodes, and I just hadn't gotten it working again before this happened. I got caught moving too quickly and skipping steps.
 
pyhmn said: Could there be files on one of the deleted nodes' servers, e.g. /etc/pve/nodes, that I could potentially use to copy the files from?
Unfortunately not. /etc/pve is a cluster file system (pmxcfs) whose contents are stored in a database, and everything written to or deleted from it is replicated to all cluster nodes immediately.
When nodes are removed, their part of the cluster configuration is removed as well.

If you have a switched-off server that was still a member of the cluster, then some data is left on it.
 
Ok, so I think it's fixed!

I had one old node server that had been powered off before I ran the delnode command on the remaining nodes, so it still had the full /etc/pve/nodes folder intact. While keeping it off the network (so it wouldn't try to reconnect to a cluster that already expected it to be deleted), I copied the /etc/pve/nodes folder to a local USB drive, removed the folders for all the deleted nodes, and put the nodes folder back in place on one of the clustered nodes. It synced automatically to the other nodes, and the GUI came back up immediately for all clustered nodes!

I was very fortunate to have that /etc/pve/nodes folder on the old node.
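The copy-back step described in this post can be sketched in Python as follows. This is a minimal illustration under assumptions, not a tested Proxmox procedure: `backup_nodes` is the copy of `/etc/pve/nodes` taken from the offline node (e.g. on the USB drive), `target_nodes` is `/etc/pve/nodes` on a healthy cluster member, and `removed` holds the names of the nodes already deleted with `pvecm delnode`. The function name and parameters are hypothetical. Keep in mind that `/etc/pve` is only writable on a node that has quorum.

```python
import shutil
from pathlib import Path


def restore_nodes(backup_nodes: Path, target_nodes: Path, removed: set[str]) -> list[str]:
    """Copy node folders from backup_nodes into target_nodes, skipping removed nodes.

    Only file contents are copied (shutil.copyfile); ownership and permission
    bits are not carried over. Returns the names of the restored node folders.
    """
    restored: list[str] = []
    for node_dir in sorted(p for p in backup_nodes.iterdir() if p.is_dir()):
        if node_dir.name in removed:
            continue  # this node was already deleted from the cluster; leave it out
        (target_nodes / node_dir.name).mkdir(parents=True, exist_ok=True)
        for src in node_dir.rglob("*"):
            dst = target_nodes / src.relative_to(backup_nodes)
            if src.is_dir():
                dst.mkdir(parents=True, exist_ok=True)
            elif src.is_file():
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(src, dst)
        restored.append(node_dir.name)
    return restored
```

For example, `restore_nodes(Path("/mnt/usb/nodes"), Path("/etc/pve/nodes"), {"oldnode1", "oldnode2"})`, where the USB mount point and the old node names are placeholders. Once the files land in `/etc/pve` on one quorate node, pmxcfs replicates them to the others, which matches what happened here.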
 