Insanely great recovery of Proxmox node

mlanner

Renowned Member
Apr 1, 2009
Berkeley, CA
I had a "close call" today in my small, but important to me, testing Proxmox cluster and just wanted to post the outcome as a testament to Proxmox, Linux and a bunch of the open source technologies behind the scenes. So, here we go:

Earlier this evening I had a sudden power outage. Since I have limited battery power to run my servers and because I didn't want to risk getting data corrupted by a sudden power loss in my VMs, I quickly SSHd to my two Proxmox virtual hosts and issued a few qm shutdown commands to shut down my VMs. (Thanks to Dietmar for the clarification of qm commands and what they do a few weeks back.) I then proceeded to cleanly shut down my virtual hosts.

After a somewhat long outage, the power finally came back. I powered on my virtual hosts again, only to find that one server's power supply had died. (Yes, I know, I should have dual power supplies. But this is a small, budget testing cluster.) Now what? Well, I don't really have an identical server, so I thought to myself that this could be a long night restoring VMs from snapshots and then from backups. Hmm ... that could take some time. Oh well, I was glad I had some backups of the most important data, anyhow.

Just to see if I could take a shortcut, I took the two drives from the machine with the dead power supply and moved them to a different machine with a completely different motherboard, different RAM and different NICs. The only similarity was that both machines had Intel CPUs and 3ware RAID cards, though the CPUs and the RAID cards were different models. I then powered on the machine and watched, completely amazed, as it booted past everything all the way to the login screen. No waaaay, I thought to myself. Impossible!? I ran a ping test to my master node. Nope, no go. I checked the network card and IP config of the machine. Ah! eth0 was not recognized, but eth1, eth2 and eth3 were all there. OK. I changed the network config by replacing eth0 with eth1. Bada-bing-bada-boom! Pinging my other cluster node now worked. This was too good to be true. I logged in to my master node to see if both nodes were there. Well, yeah, but the second virtual host was not syncing ... at least not yet. I waited a few seconds and then refreshed the page. What? Really? It's syncing. No waaaay! Again. I fired up a VM. It just booted up like nothing had happened. Unbelievable. It just worked.
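For anyone who wants to script the eth0-to-eth1 swap rather than edit by hand, here is a rough Python sketch of the idea. It only works on text, so it can be tried safely before touching a real /etc/network/interfaces. The function name rename_interface and the sample bridge config (addresses, vmbr0 layout) are made up for illustration and are not taken from my actual node.

```python
import re


def rename_interface(config_text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of one interface name with another.

    The lookarounds keep "eth0" from matching inside a longer name such
    as "eth01", while still matching aliases like "eth0:1".
    """
    pattern = re.compile(r"(?<![\w-])" + re.escape(old) + r"(?![\w-])")
    return pattern.sub(new, config_text)


# Hypothetical sample config; addresses and layout are illustrative only.
SAMPLE = """\
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bridge_ports eth0
"""

if __name__ == "__main__":
    print(rename_interface(SAMPLE, "eth0", "eth1"))
```

To apply it for real, one would read the file, keep a backup copy, write the result back and then restart networking or reboot.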

Needless to say, I'll keep an eye on my "new" second node for a few days and run some tests to see if everything runs smoothly.

Anyways, I just wanted to share my little "close call" story.
 
Hi,
if you edit /etc/udev/rules.d/70-persistent-net.rules to have the "right" eth0 back and do a reboot, you don't need to change /etc/network/interfaces.
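As a rough illustration of that edit, the Python sketch below rewrites the text of a udev persistent-net rules file (on Debian-based systems typically /etc/udev/rules.d/70-persistent-net.rules) so that a chosen NIC gets the name eth0 and the stale rule for the old NIC is dropped. The rule layout is assumed from what the generator usually writes, the MAC addresses are invented, and assign_name is a hypothetical helper, so check it against your own file first.

```python
import re

# Patterns for the MAC address and interface name inside one rule line.
MAC_RE = re.compile(r'ATTR\{address\}=="([0-9a-fA-F:]{17})"')
NAME_RE = re.compile(r'NAME="([^"]+)"')


def assign_name(rules_text: str, mac: str, name: str) -> str:
    """Give the NIC with `mac` the interface `name`.

    Any other rule that already claims `name` (for example the NIC left
    behind in the dead machine) is removed. Comments and unrelated lines
    are kept unchanged.
    """
    out = []
    for line in rules_text.splitlines():
        mac_m = MAC_RE.search(line)
        name_m = NAME_RE.search(line)
        if line.lstrip().startswith("#") or not (mac_m and name_m):
            out.append(line)
        elif mac_m.group(1).lower() == mac.lower():
            out.append(NAME_RE.sub(lambda _m: f'NAME="{name}"', line))
        elif name_m.group(1) == name:
            continue  # stale rule holding the wanted name: drop it
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical rules: the first MAC is the old NIC, the second the new one.
SAMPLE_RULES = (
    '# rules written by the persistent-net generator\n'
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
    'ATTR{address}=="00:11:22:33:44:55", KERNEL=="eth*", NAME="eth0"\n'
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
    'ATTR{address}=="66:77:88:99:aa:bb", KERNEL=="eth*", NAME="eth1"\n'
)

if __name__ == "__main__":
    print(assign_name(SAMPLE_RULES, "66:77:88:99:AA:BB", "eth0"))
```

The new name only takes effect after a reboot, as noted above.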

You can boot the system on different hardware - that's one big advantage of Linux.

Udo
 
