Insanely great recovery of Proxmox node

mlanner · Mar 14, 2012

I had a "close call" today in my small, but important to me, testing Proxmox cluster and just wanted to post the outcome as a testament to Proxmox, Linux and a bunch of the open source technologies behind the scenes. So, here we go:

Earlier this evening I had a sudden power outage. Since I have limited battery power to run my servers and because I didn't want to risk getting data corrupted by a sudden power loss in my VMs, I quickly SSHd to my two Proxmox virtual hosts and issued a few qm shutdown commands to shut down my VMs. (Thanks to Dietmar for the clarification of qm commands and what they do a few weeks back.) I then proceeded to cleanly shut down my virtual hosts.
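For anyone who would rather script that shutdown step than type the commands by hand, here is a rough Python sketch. It assumes `qm list` prints a header row followed by one row per VM with the columns VMID, NAME and STATUS in that order, and that VM names contain no spaces. The function names `running_vmids` and `shutdown_all` are made up for this illustration, so treat it as a starting point rather than a tested tool.

```python
import subprocess


def running_vmids(qm_list_output):
    """Return the VMIDs of all VMs whose STATUS column reads 'running'.

    Assumes the column order VMID, NAME, STATUS and names without spaces.
    """
    vmids = []
    for line in qm_list_output.splitlines():
        fields = line.split()
        # The header row and blank lines are skipped because their first
        # field is not a number.
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == "running":
            vmids.append(int(fields[0]))
    return vmids


def shutdown_all(dry_run=True):
    """Issue 'qm shutdown <vmid>' for every running VM on this host."""
    listing = subprocess.run(
        ["qm", "list"], capture_output=True, text=True, check=True
    ).stdout
    for vmid in running_vmids(listing):
        cmd = ["qm", "shutdown", str(vmid)]
        if dry_run:
            # Only show what would be run; nothing is shut down.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `shutdown_all()` with the default `dry_run=True` only prints the commands, which is a sensible first check before running it for real on a host that is on battery power.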

After a somewhat long outage, the power finally came back. I powered on my virtual hosts again just to find that one server's power supply had died. (Yes, I know, I should have dual power supplies. But this is a small, testing, budget cluster.) Now what? Well, I don't really have an identical server, so I thought to myself that this could be a long night restoring VMs from snapshots and then from backups. Hmm ... that could take some time. Oh well, I was glad I had some backups of the most important data, anyhow.

Just to see if I could take a shortcut, I took my two drives from the machine with the dead power supply and moved them to a different machine, with a completely different motherboard, different RAM and different NICs. The only similarities were that both machines had Intel CPUs and 3ware RAID cards, and even those were different models. I then powered on the machine and watched in complete amazement as it booted past everything, all the way to the login screen. No waaaay, I thought to myself. Impossible!? I ran a ping test to my master node. Nope, no go. I checked the network card and IP config of the machine. Ah! eth0 was not recognized, but eth1, eth2 and eth3 were all there. OK. I changed the network config by replacing eth0 with eth1. Bada-bing-bada-boom! Pinging my other cluster node now worked. This was too good to be true. I logged in to my master node to see if both nodes were there. Well, yeah, but the second virtual host was not syncing ... at least not yet. I waited for a few seconds and then refreshed the page. What? Really? It's syncing. No waaaay! Again. I fired up a VM. It just booted up like nothing had happened. Unbelievable. It just worked.

Needless to say, I'll keep an eye on my "new" second node for a few days and run some tests to see if everything runs smoothly.

Anyways, I just wanted to share my little "close call" story.

udo · Mar 14, 2012

Hi,
if you edit /etc/udev/rules.d/70-persistent-net.rules to get the "right" eth0 back and then reboot, you don't need to change /etc/network/interfaces.
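As a rough illustration of that edit, the Python sketch below rewrites the text of such a rules file so that the NIC with a given MAC address gets the wanted name, and drops any leftover rule that still claims that name for hardware that is no longer present. It assumes the usual generated rule layout, with `ATTR{address}=="<mac>"` and `NAME="ethN"` on one line. The function name `assign_name` and the MAC addresses in the usage example are hypothetical.

```python
import re

# Matches the assignment NAME="ethN" at the end of a generated rule.
NAME_RE = re.compile(r'NAME="[^"]*"')


def assign_name(rules_text, mac, name):
    """Return rules_text with the rule for `mac` renamed to `name`.

    Any other rule that already assigns `name` is treated as stale and
    removed; comments and unrelated rules are kept unchanged.
    """
    mac = mac.lower()
    kept = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("SUBSYSTEM"):
            is_target = f'attr{{address}}=="{mac}"' in stripped.lower()
            claims_name = f'NAME="{name}"' in stripped
            if is_target:
                line = NAME_RE.sub(f'NAME="{name}"', line)
            elif claims_name:
                # Stale rule from the old hardware; skip it.
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

A possible way to use it is to read the file, call `assign_name(text, "66:77:88:99:aa:bb", "eth0")` with the MAC of the NIC that should become eth0, keep a backup copy of the original, write the result back and reboot.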

You can boot the system on different hardware - that's one big advantage of Linux.

Udo
