Recovery after accidentally deleting /etc on one of the HNs in a cluster

pero

Apr 2, 2012
Hungary
Hi,


I have a 2-node cluster with Proxmox VE 2.1 running 10+ CTs.
Let's call these 2 nodes HN1 and HN2.


I did a very, very bad thing: on HN1 I wanted to delete the "etc" directory inside the /root folder, so from that directory I meant to run "rm etc -r". But I accidentally ran "rm /etc -r".
So on HN1 I deleted the whole /etc directory. It did not affect the host server or the CTs on it; they kept working correctly. But because the config files were missing, everything was out of my control.
I could not log into the Proxmox web interface, and I could not make any backups, because vzdump no longer knew about the storages or the IDs of the running CTs, etc.


So it was like the moment after an airplane's engines stop but before it crashes into the ground...


A full Proxmox reinstall followed by a restore from CT backups that were at least 12 hours old was not an option for me, because almost all the CTs are in production and I did not want to stop them.


So I decided to try to restore/recreate the /etc directory.


In the end I managed it, and now I think everything is working correctly. One of the coming nights I want to reboot both nodes, but I'm scared that I may still be missing something and my nodes won't start.


That's why I'm writing down exactly what I did, and I'm asking you to help me make sure that everything will be fine after the reboot :)


So I did the following:


1.
I copied the whole /etc directory from HN2 to HN1. Then I replaced all occurrences of HN2's IP addresses and domain names with HN1's. The following files were affected:
/etc/hosts
/etc/network/interfaces
/etc/hostname
/etc/postfix/main.cf
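The substitution in step 1 could also be scripted instead of done by hand. Below is a minimal Python sketch, not what was actually run here; the IP addresses and hostnames in it are hypothetical placeholders. It does a single pass with token-boundary checks, so that for example 10.0.0.2 is not rewritten inside 10.0.0.20 and "hn2" is not rewritten inside "hn20".

```python
import re
from pathlib import Path


def build_pattern(replacements):
    """Compile one regex that matches any of the old values as a whole token."""
    # Longest keys first, so "hn2.example.com" is tried before "hn2"
    keys = sorted(replacements, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # The lookarounds stop "10.0.0.2" from matching inside "10.0.0.20" or "110.0.0.2"
    return re.compile(rf"(?<![\w.-])(?:{alternation})(?![\w-])")


def rewrite_text(text, replacements):
    """Replace every old value with its new value in a single pass."""
    pattern = build_pattern(replacements)
    return pattern.sub(lambda m: replacements[m.group(0)], text)


def rewrite_files(root, rel_paths, replacements):
    """Rewrite the given files under root in place; return the ones that changed."""
    changed = []
    for rel in rel_paths:
        path = Path(root) / rel
        old = path.read_text()
        new = rewrite_text(old, replacements)
        if new != old:
            path.write_text(new)
            changed.append(rel)
    return changed


# Hypothetical values: HN2's identity on the left, HN1's on the right
replacements = {
    "10.0.0.2": "10.0.0.1",
    "hn2.example.com": "hn1.example.com",
    "hn2": "hn1",
}
```

Calling rewrite_files("/etc", ["hosts", "network/interfaces", "hostname", "postfix/main.cf"], replacements) would apply it to the four files listed above. Because it edits in place, keep a copy of the originals and review the result before relying on it.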




2.
Unfortunately the files in /etc/pve (except the hidden files) were missing on both nodes, because this directory is replicated across all nodes in the cluster.
So I installed Proxmox on a third machine to see what is in that directory, and copied those files from the test machine to /etc/pve on HN1 (they also appeared on HN2 automatically).
After this I was able to log in to the Proxmox web interface, but it was completely empty: there were no storages and no CTs in the list.
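To see which files a damaged /etc/pve is missing compared with a fresh install (the third machine in step 2), a plain directory comparison is enough. The Python sketch below is only an illustration: the two roots are whatever paths you pass in, hidden entries are skipped because those survived on both nodes in this case, and os.walk does not descend into symlinked directories by default, so those are not compared.

```python
import os


def list_files(root, skip_hidden=True):
    """Return the paths of regular files under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_hidden:
            # Prune hidden directories in place so os.walk does not enter them
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_files(reference_root, target_root):
    """Files present under reference_root but absent under target_root, sorted."""
    return sorted(list_files(reference_root) - list_files(target_root))
```

Note that node-specific subdirectories carry the node name, so files under the test machine's own node directory will always show up as "missing" and have to be mapped to HN1/HN2 by hand.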




3.
So I created a test CT and a storage on the test machine to see what the CTID.conf and storage.cfg files look like. Based on these sample files I managed to recreate these config files on HN1 (they also appeared on HN2 automatically).
After this, everything appeared in the web interface.
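Step 3 can be partly automated by treating the test CT's config as a template. This is a minimal Python sketch, assuming the sample CTID.conf consists of OpenVZ-style KEY="value" lines plus comments; the keys and values in the test are hypothetical examples, and the real values for each production CT (IP, hostname, limits and so on) still have to come from your own records.

```python
import re

# Matches OpenVZ-style lines such as HOSTNAME="ct101.example.com"
ASSIGNMENT = re.compile(r'^([A-Z_][A-Z0-9_]*)="(.*)"\s*$')


def template_ct_conf(sample_text, overrides):
    """Copy a sample CTID.conf, replacing the values of the keys in overrides.

    Comments and keys not mentioned in overrides are kept unchanged; keys that
    the sample does not contain are appended at the end. Values are written
    as-is, so they must not contain double quotes.
    """
    remaining = dict(overrides)
    out = []
    for line in sample_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            out.append(f'{key}="{remaining.pop(key)}"')
        else:
            out.append(line)
    # Anything the sample did not define goes at the end of the file
    for key, value in remaining.items():
        out.append(f'{key}="{value}"')
    return "\n".join(out) + "\n"
```

Keeping unknown lines untouched means settings you did not think about stay at the sample's values, so review every generated file against what the running CT actually uses.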




4.
But unfortunately there were problems with the cluster; it only partially worked (maybe because of the changed self-signed certificates in /etc/pve?).
Because my test machine was a single node, I had no clue what else was missing or wrong. So I decided to remove and then recreate the whole cluster. It worked! The cluster seems to work perfectly now.
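One way to test the guess about changed self-signed certificates is to compare certificate fingerprints, for example between a copy of a PEM file from before the incident (if one exists anywhere) and the current one. The Python sketch below computes the SHA-256 fingerprint of each certificate in a PEM string, which is the hash of the base64-decoded DER bytes; which files to compare is an assumption left to you. It only tells you whether two certificates are identical; it does not validate them or check who signed them.

```python
import base64
import hashlib
import re

# One PEM certificate block; the base64 body may span several lines
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text):
    """Return the SHA-256 fingerprint (hash of the DER bytes) of each certificate."""
    prints = []
    for body in PEM_CERT.findall(pem_text):
        der = base64.b64decode("".join(body.split()))
        digest = hashlib.sha256(der).hexdigest().upper()
        # Colon-separated pairs of uppercase hex digits
        prints.append(":".join(digest[i:i + 2] for i in range(0, len(digest), 2)))
    return prints


def same_certificates(pem_a, pem_b):
    """True if both PEM strings contain exactly the same certificates, in order."""
    return pem_fingerprints(pem_a) == pem_fingerprints(pem_b)
```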




5.
The network config was completely different on HN2 and HN1; for example, HN1 had more vmbr bridges defined. So I recreated HN1's network configuration with the Proxmox web interface. It wrote the correct config to /etc/network/interfaces.new, which will be used after the reboot.
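Before the reboot, it may be worth checking exactly what /etc/network/interfaces.new will change. A small Python sketch (the file contents in the test are hypothetical) that lists the iface stanzas in each version, reports the interfaces only the pending file defines, and produces a unified diff:

```python
import difflib
import re

# Matches stanza headers such as "iface vmbr0 inet static"
IFACE = re.compile(r"^\s*iface\s+(\S+)\s+inet6?\s+\S+", re.MULTILINE)


def iface_names(text):
    """Return the sorted, de-duplicated interface names defined by iface lines."""
    return sorted(set(IFACE.findall(text)))


def new_interfaces(current, pending):
    """Interfaces that only the pending config defines (e.g. extra vmbr bridges)."""
    return sorted(set(iface_names(pending)) - set(iface_names(current)))


def preview_change(current, pending):
    """Unified diff between the active config and the pending .new file."""
    return "".join(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            pending.splitlines(keepends=True),
            fromfile="interfaces",
            tofile="interfaces.new",
        )
    )
```

Reading both files with open(...).read() and printing preview_change(...) gives a quick review of the pending network config.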






So that's all I did.
If anybody thinks something is still missing from the config files, please let me know.


Thank you very much!


And the most important thing: I love Proxmox VE :)


pero
 
Last night I rebooted both nodes.
There was only one problem: 2 of the 4 network cards had been renamed (eth0 -> eth4, eth1 -> eth5).
I was able to solve this issue quickly.
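The post does not say what caused the renaming or how it was fixed. One plausible cause, which is only an assumption here, is that the /etc copied from HN2 included udev's persistent network naming rules (on Debian-based systems of that era typically /etc/udev/rules.d/70-persistent-net.rules) holding HN2's MAC addresses, so udev handed out new names to HN1's cards. A Python sketch that lists rule entries whose MAC address does not belong to any interface on the local machine:

```python
import os
import re

# Matches e.g. ATTR{address}=="00:11:22:33:44:55", ... NAME="eth0"
RULE = re.compile(r'ATTR\{address\}=="([0-9a-fA-F:]+)".*?NAME="([^"]+)"')


def rule_names(rules_text):
    """Map MAC address -> interface name for each non-comment rule line."""
    mapping = {}
    for line in rules_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = RULE.search(line)
        if match:
            mapping[match.group(1).lower()] = match.group(2)
    return mapping


def local_macs(sys_net="/sys/class/net"):
    """Read the MAC address of every interface the running kernel knows about."""
    macs = []
    for iface in os.listdir(sys_net):
        try:
            with open(os.path.join(sys_net, iface, "address")) as f:
                macs.append(f.read().strip())
        except OSError:
            continue
    return macs


def stale_rules(rules_text, macs):
    """Rules whose MAC address is not present on this machine."""
    present = {mac.lower() for mac in macs}
    return {mac: name for mac, name in rule_names(rules_text).items()
            if mac not in present}
```

On a node, stale_rules(open(path).read(), local_macs()) would show which names are reserved for hardware that is not in that machine.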
 
