Accidentally deleted /etc/pve/nodes on the main node of a cluster: how to recover?

feradz
May 21, 2024
Hi all,

I have a cluster of 3 nodes: node1, node2, node3
I accidentally ran `rm -rf /etc/pve/nodes` on the primary node.

After that I can no longer log in to node1 through the web console.

I can ssh to the three nodes.

None of the CTs are visible right now.

I don't have a backup of /etc/pve/nodes.

How can I recover the cluster node?
 
@feradz Do you have anything useful in the /var/lib/pve-cluster/backup directory on any of your nodes?

For example, this is on one of my nodes and seems to have been set up/created automatically:

Bash:
# ls -la /var/lib/pve-cluster/backup/
total 36
drwxr-xr-x 2 root root     3 May 16 13:07 .
drwxr-xr-x 3 root root     7 May 21 23:18 ..
-rw-r--r-- 1 root root 14473 May 16 13:07 config-1715864850.sql.gz
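
The number in the file name looks like a Unix timestamp (1715864850 is 2024-05-16 13:07:30 UTC, which lines up with the modification time in the listing), so it can help to check whether a dump predates an accidental deletion. A minimal sketch, assuming the `config-<epoch>.sql.gz` naming shown above; `backup_epoch` is a hypothetical helper name, and `date -d` is the GNU form found on Debian-based systems:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the epoch embedded in a config-<epoch>.sql.gz name.
backup_epoch() {
    local name
    name=$(basename "$1")
    name=${name#config-}     # strip the leading "config-"
    name=${name%.sql.gz}     # strip the trailing ".sql.gz"
    printf '%s\n' "$name"
}

# Example (GNU date): show when each dump was taken.
# for f in /var/lib/pve-cluster/backup/config-*.sql.gz; do
#     printf '%s  %s\n' "$(date -u -d "@$(backup_epoch "$f")" '+%F %T UTC')" "$f"
# done
```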

That file looks like it's a compressed dump of the SQLite commands needed to recreate the cluster configuration:

Bash:
# zmore /var/lib/pve-cluster/backup/config-1715864850.sql.gz
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE tree (  inode INTEGER PRIMARY KEY NOT NULL,  parent INTEGER NOT NULL CHECK(typeof(parent)=='integer'),  version INTEGER NOT NULL CHECK(typeof(version)=='integer'),  writer INTEGER NOT NULL CHECK(typeof(writer)=='integer'),  mtime INTEGER NOT NULL CHECK(typeof(mtime)=='integer'),  type INTEGER NOT NULL CHECK(typeof(type)=='integer'),  name TEXT NOT NULL,  data BLOB);
INSERT INTO tree VALUES(0,0,1392,0,1715864846,8,'__version__',NULL);
INSERT INTO tree VALUES(2,0,3,0,1715862102,8,'datacenter.cfg',X'6b6579626f6172643a20656e2d75730a');
(etc)

So if you have a file in the backup directory, there's a chance it might be from before you nuked the /etc/pve/nodes directory. If that's the case, then it shouldn't be tooooo hard to recover.
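
To see what such a dump actually contains: the last value of each `INSERT INTO tree` row in the dump above is the file's content as a hex blob. A minimal sketch that decodes one of those blobs in plain bash; `decode_hex_blob` is a hypothetical helper name, and the input is the hex string between `X'` and `'` from the `datacenter.cfg` row shown earlier:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a hex string (the part between X' and ') into raw bytes.
decode_hex_blob() {
    local hex=$1 i
    for (( i = 0; i < ${#hex}; i += 2 )); do
        # Emit one byte per pair of hex digits via printf's \xHH escape.
        printf "\\x${hex:i:2}"
    done
}

decode_hex_blob 6b6579626f6172643a20656e2d75730a
# prints: keyboard: en-us
```

This only inspects the dump; it does not restore anything.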
 
Thanks for the feedback.
The cluster configuration was a total mess, so I reinstalled Proxmox from scratch and manually reconfigured the LXCs. It turned out to be easy, but before I understood that, I set up a test environment, broke it in different ways, and tried restoring it until I worked out how to recover the existing LXC subvolumes.

After this experience I have learnt that this SQLite-backed file system can create a big mess. I came across many users who ran into this problem after an update/upgrade of Proxmox.

I have also learnt that it is necessary to set up backups in order to recover VMs/LXCs easily.

I want to thank the professional support from Proxmox, who guided me on how to proceed.

Ironically, I created this mess accidentally while trying to add a backup server to make proper backups :)
 
Hi all, listen carefully.

If this happens to you too, DO NOT restart any node!

Follow these steps to recover your cluster (copy and paste):
  1. cp /var/lib/pve-cluster/config.db /var/lib/clusterconfig.db
  2. systemctl stop pve-cluster.service && systemctl stop corosync.service
  3. cp /var/lib/clusterconfig.db /var/lib/pve-cluster/config.db
  4. systemctl start pve-cluster.service && systemctl start corosync.service
This helped me, I hope it will help you too. :)
 
This looks wrong. First you copy out config.db (the SQLite database backing the /etc/pve filesystem) AFTER you have already deleted the files from it and while everything is still running; then you stop the services and ... copy it back, then start them again?

I can only imagine it works because your delete had not yet been checkpointed from the write-ahead log into the main database file. Very hacky.
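
To make the write-ahead-log point concrete: in WAL mode SQLite keeps recent changes in a `-wal` file next to the database until they are checkpointed into the main file. A minimal sketch that only reports whether such a sidecar file exists and is non-empty; `wal_pending` is a hypothetical helper name, the path is the one from the steps quoted in this thread, and a non-empty `-wal` file is a hint, not proof, that something is still uncheckpointed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <db>-wal exists and is non-empty,
# i.e. there MAY be changes not yet checkpointed into the main database file.
wal_pending() {
    [ -s "$1-wal" ]
}

db=/var/lib/pve-cluster/config.db
if wal_pending "$db"; then
    echo "$db-wal is non-empty: recent changes may not be checkpointed yet"
else
    echo "no pending WAL data found next to $db"
fi
```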

Actually, you should keep proper backups, e.g.:
https://forum.proxmox.com/threads/backup-cluster-config-pmxcfs-etc-pve.154569/
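
As a rough illustration of what a cold copy could look like (this is not the method from the linked thread): a minimal sketch that copies the database file plus any `-wal`/`-shm` sidecar files into a separate directory. It assumes pve-cluster has been stopped on the node first so the files are not changing; `backup_config_db` and the destination directory are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy <db> plus any -wal/-shm sidecars into <dest_dir>,
# tagging the copies with the current epoch. Run it only while the service
# that owns the database is stopped, so the files are not changing.
backup_config_db() {
    local db=$1 dest=$2 stamp f
    [ -f "$db" ] || { echo "no such database: $db" >&2; return 1; }
    stamp=$(date +%s)
    mkdir -p "$dest" || return 1
    for f in "$db" "$db-wal" "$db-shm"; do
        [ -e "$f" ] || continue          # sidecar files are optional
        cp -p "$f" "$dest/$(basename "$f").$stamp" || return 1
    done
    printf '%s\n' "$stamp"               # report the tag used for this copy
}

# Example (as root, with pve-cluster stopped):
# backup_config_db /var/lib/pve-cluster/config.db /root/pmxcfs-backup
```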
 
Regarding the config-*.sql.gz file in /var/lib/pve-cluster/backup/ mentioned earlier in the thread: this is normally created before joining a cluster. Unfortunately, it's not an automated or regular backup.
 
Regarding the four recovery steps posted above: thank you very much, it works!