DRBD cluster - handling node failures

00aet
Jan 9, 2012
So I've got a 2-node DRBD cluster set up, and all is working well. Migrating VMs between nodes works without trouble. Anyway, I'm working on documentation for managing all of this, and the question has come to mind of what exactly needs to be done if either the master or the slave node fails. I've been hesitant to test that process so far. In the case of a master node failure, the cluster is no longer manageable, correct? What needs to be done to move the VMs to the host that is still up after a node failure?

thanks
 
With 1.X, you need to set up some process to back up/copy the VM config files to a temporary location.
On 2.0 you do not need to worry about this.

When a node fails, simply copy the VM config files into /etc/qemu-server/ and start up the VMs.
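As an illustration, recovery on the surviving node might look like the following; the backup path and the VMID are hypothetical examples, not part of any standard setup:

```shell
# Restore the failed node's VM configs from the local backup copy
# (the path is an example -- use wherever your copy process stores them).
cp /root/vmconfig-backup/node1/*.conf /etc/qemu-server/

# Start each recovered guest by its VMID (101 is an example VMID).
qm start 101
```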
We set up a process in cron to copy all the VM config files to a special folder on the other node, so we are always prepared for a node failure.
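A minimal sketch of such a cron entry, assuming SSH keys are exchanged between the nodes; the peer hostname and the holding directory are made-up examples:

```shell
# /etc/cron.d/vmconfig-backup -- illustrative only; adjust to your setup.
# Every 15 minutes, mirror this node's VM configs into a holding
# directory on the peer node, so they survive a failure of this node.
*/15 * * * * root rsync -a --delete /etc/qemu-server/ node2:/root/vmconfig-backup/node1/
```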

In 1.X, if the master node fails, you need to promote the slave node to master to manage things.

Did you set up two DRBD volumes to make it easy to recover from split-brain?
http://pve.proxmox.com/wiki/DRBD#Recovery_from_communication_failure
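For reference, manual split-brain recovery boils down to picking a victim node and discarding its changes; a sketch using DRBD 8.x commands, where r0 stands in for your actual resource name:

```shell
# On the node whose changes you want to throw away (the split-brain
# "victim"): demote it, then reconnect while discarding its local data.
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the other node: reconnect if it dropped to StandAlone.
drbdadm connect r0
```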
 
I debated using 2.0 but decided against it for production right now. Seems stable enough but best to leave things alone :)

Simple enough I guess, I have set up a quick cron to copy the files between nodes. Is promoting the slave node as simple as pveca -m, or will it want to attempt to contact the current master node when doing that?

I did setup two volumes. Do you run any periodic verifies or anything on your drbd volumes?
 
> I debated using 2.0 but decided against it for production right now. Seems stable enough but best to leave things alone :)
>
> Simple enough I guess, I have set up a quick cron to copy the files between nodes. Is promoting the slave node as simple as pveca -m, or will it want to attempt to contact the current master node when doing that?

"pveca -m" is enough, because normally you can't reach the old master in such a situation.

> I did set up two volumes. Do you run any periodic verifies or anything on your drbd volumes?

I don't use verify; we use monitoring software (Icinga) to check that the DRBD resources are in sync.

Udo
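A check in that spirit can be as small as grepping /proc/drbd-style status output for healthy connection and disk states. This is a hypothetical sketch (the function name is made up), not Udo's actual Icinga check:

```shell
#!/bin/sh
# Return 0 if every DRBD resource in /proc/drbd-style status text (read
# from stdin) is Connected with both disks UpToDate, 1 otherwise.
# State field names follow DRBD 8.x (cs: connection, ds: disk states).
check_drbd_status() {
    ! grep 'cs:' | grep -qvE 'cs:Connected .*ds:UpToDate/UpToDate'
}

# Typical use on a node (commented out so the sketch is self-contained):
# check_drbd_status < /proc/drbd && echo OK || echo "DRBD DEGRADED"
```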
 
