[SOLVED] Accidentally ran `rm -rf /etc/pve/nodes `

pyhmn · Aug 23, 2024

As the title says, I was trying to resolve an issue with nodes still appearing in the GUI after running the delnode command. I accidentally put a space instead of a forward slash between "nodes" and the name of the node.
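
For anyone retracing the mistake: the shell splits arguments on spaces, so a stray space turns one path into two arguments, and the first one is the whole nodes directory. A minimal sketch that only prints what `rm` would have received; `oldnode` is a hypothetical node name and `show_args` is a helper made up for illustration:

```shell
# Print each argument a command would receive, one per line.
# 'oldnode' is a hypothetical node name used for illustration only.
show_args() {
    printf '%s\n' "$@"
}

# Intended: a single argument pointing at one node's directory.
show_args /etc/pve/nodes/oldnode

# Typo: the space yields two arguments; the first is the entire nodes directory.
show_args /etc/pve/nodes oldnode
```

Swapping `rm -rf` for a harmless printing command like this before running the real thing is one cheap way to see exactly which paths are about to be targeted.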

As you can guess, my nodes folder no longer exists.

Help, please!

pyhmn · Aug 23, 2024

I found some other related forum posts. Looking into those solutions now.

Still happy to hear from anyone with advice though.

Falk R. · Aug 23, 2024

Is the cluster still running?
Are VMs still running?

If so, you could be lucky and save something.
Hopefully you have backups of all VMs.
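
A rough way to answer both questions from SSH, as a sketch only. It assumes each guest runs as a `kvm` process whose command line contains `-id <vmid>`, which is how Proxmox VE normally starts VMs; `extract_vmids` is a hypothetical helper name:

```shell
# Extract VMIDs from process command lines read on stdin.
# Assumption: each guest runs as a kvm process whose arguments include "-id <vmid>".
extract_vmids() {
    sed -n 's/.*kvm -id \([0-9][0-9]*\).*/\1/p'
}

# Usage on a node (not run here):
#   ps -eo args | extract_vmids    # VMIDs of guests that still have a running process
#   pvecm status                   # cluster membership and quorum
```

Looking at the process list avoids relying on the GUI or on the configuration files that were deleted.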

pyhmn · Aug 24, 2024

Falk R. said:
Is the cluster still running?
Are VMs still running?

If so, you could be lucky and save something.
Hopefully you have backups of all VMs.

I think the VMs are still running but are no longer visible in the GUI at all. I'm not sure of the cluster status. I'm getting SSL connection errors in the GUI when trying to connect to other nodes in the cluster. I was already dealing with some SSH key-based access problems between cluster nodes before this happened, but these SSL connection errors look different. I can't seem to access any info in the GUI for cluster nodes other than the one I'm currently logged into.

I only have 3-4 VMs running. None of them have backups yet. 2 of them are kind of Production, so those will be sad to rebuild. The other 2 don't matter if I lose them.

I had just added these 4 'pve01, 02, 03, 04' nodes to an existing cluster and got ceph working. Migrated VMs to the new nodes on the ceph pool and started removing the old nodes. Some of the old nodes remained visible in the GUI so I was trying to remove them from the /etc/pve/nodes folder.

Falk R. · Aug 24, 2024

It is to be expected that the GUI does not work. If you look in the /etc/pve folder, do you still see VM configuration files in the qemu-server folder under local? If so, back them up before you do anything else. The VM disks are in the Ceph pool, which is not affected. You will have to rebuild the cluster, so back up the VM configurations first, as these may be lost in the process.
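
One way to take that backup, as a minimal sketch rather than an official procedure. `backup_dir` is a hypothetical helper and the destination path in the usage comment is only an example:

```shell
# Archive a directory into a timestamped tarball under a destination directory.
# Both paths are supplied by the caller; the names in the usage note are illustrative.
backup_dir() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the parent of $src.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Usage on a node (not run here); /root/pve-config-backup is a hypothetical destination:
#   backup_dir /etc/pve /root/pve-config-backup
```

Writing the archive outside /etc/pve matters here, since anything stored inside that folder would be replicated and lost along with the rest.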

Kingneutron · Aug 24, 2024

PROTIP:

1) Have backups

2) Use Midnight Commander for safe(r) deleting - rm and dd are well-known system killers - see (1)

tcabernoch · Aug 24, 2024

I love this post. This is fun stuff. Let's delete the node configs and see if we can recover. And in fact, a recovery (mostly) did happen. Hmm.

pyhmn · Aug 27, 2024

I'm finally able to work on this again today.

I'm down to ssh and local console only at this point. The local GUI for each node won't even load now.

The local and qemu-server folders are just symlinks into the nodes folder that's gone now.
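
For reference, a small sketch of how dangling symlinks like these can be listed. `dangling_links` is a hypothetical helper; on a Proxmox node the directory to check would be /etc/pve:

```shell
# List symlinks directly inside a directory whose targets no longer exist.
dangling_links() {
    # -type l selects symlinks; "test -e" follows the link and fails if the target is missing.
    find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print
}

# Usage on a node (not run here):
#   dangling_links /etc/pve
```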

I haven't done anything with the old node hardware beyond powering it off and disconnecting the network cables.

Could there be files on one of the deleted nodes' servers, e.g. in /etc/pve/nodes, that I could copy from?

pyhmn · Aug 27, 2024

Kingneutron said:
PROTIP:

1) Have backups

2) Use Midnight Commander for safe(r) deleting - rm and dd are well-known system killers - see (1)

Thanks, I'll check out Midnight Commander.

Yeah, backups were next on the list. I had PBS backups connected from the old nodes, but that broke in the process of adding these new Ceph nodes and I just hadn't gotten it working again before this happened. I got caught moving too quickly and skipping steps.

Falk R. · Aug 27, 2024

pyhmn said:
I'm finally able to work on this again today.

I'm down to ssh and local console only at this point. The local GUI for each node won't even load now.

The local and qemu-server folders are just symlinks into the nodes folder that's gone now.

I haven't done anything with the old node hardware beyond powering it off and disconnecting the network cables.

Could there be files on one of the deleted nodes' servers, e.g. in /etc/pve/nodes, that I could copy from?

Unfortunately not. The folder /etc/pve is a cluster file system that is stored in a database. Everything that is written or deleted there is replicated immediately.
If nodes are removed, this cluster configuration is also removed.

If you have a switched-off server that was still a member of the cluster, then some data is left there.

pyhmn · Aug 27, 2024

Ok, so I think it's fixed!

I had a node server that had been powered off before running the delnode command on the other remaining nodes. It still had the full /etc/pve/nodes folder intact. While it remained off the network (to avoid letting it attempt to reconnect to the cluster that already expects it to be deleted), I copied the /etc/pve/nodes folder to a local USB drive, removed the folders for all the deleted nodes, and put the nodes folder back in place on one of the clustered nodes. It automatically synced with the other nodes and the GUI came back up immediately for all clustered nodes!
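
Roughly, the steps above can be sketched like this. It is a reconstruction under assumptions, not the exact commands I ran: `prune_nodes` is a hypothetical helper, /mnt/usb is an example mount point, and the node names to keep are the pve01-04 names mentioned earlier in the thread:

```shell
# Remove every node directory from a saved copy of the nodes folder except those listed.
# Usage: prune_nodes <copy_of_nodes_dir> <node_to_keep>...
prune_nodes() {
    copy=$1
    shift
    for dir in "$copy"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        keep=no
        for wanted in "$@"; do
            if [ "$name" = "$wanted" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            rm -rf -- "$dir"
        fi
    done
}

# Usage (not run here; /mnt/usb is a hypothetical mount point):
#   cp -r /etc/pve/nodes /mnt/usb/nodes                  # on the powered-off node, kept off the network
#   prune_nodes /mnt/usb/nodes pve01 pve02 pve03 pve04   # drop the folders of the deleted nodes
#   cp -r /mnt/usb/nodes /etc/pve/                       # on one node of the live cluster
```

The pruning runs only against the copy on the USB drive, so a mistake at that step cannot touch the live cluster file system.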

I was very fortunate to have that /etc/pve/nodes folder on the old node.
