DRBD cluster - handling node failures

00aet
Jan 9, 2012
So I've got a 2-node DRBD cluster set up, and all is working well. Migrating VMs between nodes works without trouble. Anyway, I'm working on documentation for managing all of this, and the question has come to mind of what exactly needs to be done if either the master or the slave node fails. I've been hesitant to test that process so far. In the case of a master node failure, the cluster is no longer manageable, correct? What needs to be done to move the VMs to the host that is still up after a node failure?

thanks
 
With 1.X, you need to set up some process to back up/copy the VM config files to a temporary location.
On 2.0 you do not need to worry about this.

When a node fails, simply copy the VM config files into /etc/qemu-server/ and start up the VMs.
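As an illustration, recovery on the surviving node might look like the following; the backup path and the VMID are hypothetical examples, not part of any standard setup:

```shell
# Restore the failed node's VM configs from the local backup copy
# (the path is an example -- use wherever your copy process stores them).
cp /root/vmconfig-backup/node1/*.conf /etc/qemu-server/

# Start each recovered guest by its VMID (101 is an example VMID).
qm start 101
```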
We set up a process in cron to copy all the VM config files to a special folder on the other node, so we are always prepared for a node failure.
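A minimal sketch of such a cron entry, assuming SSH keys are exchanged between the nodes; the peer hostname and the holding directory are made-up examples:

```shell
# /etc/cron.d/vmconfig-backup -- illustrative only; adjust to your setup.
# Every 15 minutes, mirror this node's VM configs into a holding
# directory on the peer node, so they survive a failure of this node.
*/15 * * * * root rsync -a --delete /etc/qemu-server/ node2:/root/vmconfig-backup/node1/
```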

In 1.X, if the master node fails, you need to promote the slave node to master to manage things.

Did you set up two DRBD volumes to make it easy to recover from split-brain?
http://pve.proxmox.com/wiki/DRBD#Recovery_from_communication_failure
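For reference, manual split-brain recovery boils down to picking a victim node and discarding its changes; a sketch using DRBD 8.x commands, where r0 stands in for your actual resource name:

```shell
# On the node whose changes you want to throw away (the split-brain
# "victim"): demote it, then reconnect while discarding its local data.
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the other node: reconnect if it dropped to StandAlone.
drbdadm connect r0
```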
 
I debated using 2.0 but decided against it for production right now. Seems stable enough but best to leave things alone :)

Simple enough I guess, I have set up a quick cron to copy the files between nodes. Is promoting the slave node as simple as pveca -m, or will it want to attempt to contact the current master node when doing that?

I did setup two volumes. Do you run any periodic verifies or anything on your drbd volumes?
 
> I debated using 2.0 but decided against it for production right now. Seems stable enough but best to leave things alone :)
>
> Simple enough I guess, I have set up a quick cron to copy the files between nodes. Is promoting the slave node as simple as pveca -m, or will it want to attempt to contact the current master node when doing that?

"pveca -m" is enough, because normally you can't reach the old master in such a situation.

> I did set up two volumes. Do you run any periodic verifies or anything on your drbd volumes?

I don't use verify; we use monitoring software (Icinga) to check that the DRBD resources are in sync.

Udo
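A check in that spirit can be as small as grepping /proc/drbd-style status output for healthy connection and disk states. This is a hypothetical sketch (the function name is made up), not Udo's actual Icinga check:

```shell
#!/bin/sh
# Return 0 if every DRBD resource in /proc/drbd-style status text (read
# from stdin) is Connected with both disks UpToDate, 1 otherwise.
# State field names follow DRBD 8.x (cs: connection, ds: disk states).
check_drbd_status() {
    ! grep 'cs:' | grep -qvE 'cs:Connected .*ds:UpToDate/UpToDate'
}

# Typical use on a node (commented out so the sketch is self-contained):
# check_drbd_status < /proc/drbd && echo OK || echo "DRBD DEGRADED"
```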
 
