DRBD 2-server cluster: what happens when the master/slave fails?

nate

Member
Mar 15, 2011
I was really happy to get DRBD working in a 2-server cluster configuration: I got all my VMs migrated and tested a live migration. It works great so far. I used the guide on the wiki and I'm set up exactly the same way.

So my question is: what happens when the master/slave server crashes? What's the procedure to get the slave running? The VMs should already be on the DRBD-synced drive. What's the proper way to fail over? I haven't unplugged one yet to try it out; I don't want to ruin my work so far...

I searched the forum but couldn't find this issue as it pertains to DRBD.

Does anyone have experience with this, and is there any documentation anywhere on what to do? I'm deploying a production system and I want to know what I'm doing before I roll it out.

Thanks for the help!
 
Hi,
with PVE 1.x you need a copy of the VM configs on the other node. E.g. rsync the directory /etc/qemu-server from node-a to /etc/node-a_qemu-server on node-b once a day, and vice versa.
So if one node fails, you only need to mv/cp the configs to /etc/qemu-server and start the VMs!
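A sketch of that sync, assuming the hostnames node-a/node-b and the target directory from the example above (the --delete flag is my addition, so configs removed on one node also disappear from the copy). These commands need a live pair of nodes with SSH access, so treat them as a template:

```shell
# On node-a: mirror the local VM configs into a holding directory on node-b.
rsync -a --delete /etc/qemu-server/ node-b:/etc/node-a_qemu-server/

# And the mirror job on node-b, going the other way:
rsync -a --delete /etc/qemu-server/ node-a:/etc/node-b_qemu-server/
```

The trailing slashes matter: rsync then copies the directory contents rather than nesting qemu-server inside the target.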

BTW: I prefer two DRBD devices, one per server as main storage, so you don't get into trouble in a split-brain situation! With only one DRBD device you can't easily overwrite one side (there are active clients on both sides).

Udo
 
Hi,
with PVE 1.x you need a copy of the VM configs on the other node. E.g. rsync the directory /etc/qemu-server from node-a to /etc/node-a_qemu-server on node-b once a day, and vice versa.
So if one node fails, you only need to mv/cp the configs to /etc/qemu-server and start the VMs!

I think I understand this. That's really good advice. Do you just set up a cron job on the PVE instance? How is this best implemented?


BTW: I prefer two DRBD devices, one per server as main storage, so you don't get into trouble in a split-brain situation! With only one DRBD device you can't easily overwrite one side (there are active clients on both sides).

I read this in the guide too, but it's hard for me to get my head around it. Is my understanding right?

Say I had 2 VMs: vm1 on server 1 and vm2 on server 2.

Say server 2 dies. I lose all of server 2. vm1 is running on server 1; vm2 is split. Both are on that DRBD volume. Wouldn't you just roll with server 1's DRBD volume?

But I think what you are saying is that when server 2 comes back up and you want the freshest data from before the crash, you can't get it, because the servers share the DRBD volume and you would have to pick one or the other, losing the most recent state of one of them.

But when you separate them, you can pick which one to use.
 
I think I understand this. That's really good advice. Do you just set up a cron job on the PVE instance? How is this best implemented?
Yes, I simply use cron for that. Once a day is enough for me; the configs normally don't change.
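As a cron job that could be a one-line /etc/cron.d fragment like the following (file name and 03:00 schedule are hypothetical; hostname and path follow the earlier example):

```shell
# /etc/cron.d/sync-vm-configs on node-a: daily config copy to node-b at 03:00.
0 3 * * * root rsync -a --delete /etc/qemu-server/ node-b:/etc/node-a_qemu-server/
```

Note that /etc/cron.d entries carry the extra user field (root) that a personal crontab would not have.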
I read this in the guide too, but it's hard for me to get my head around it. Is my understanding right?

Say I had 2 VMs: vm1 on server 1 and vm2 on server 2.

Say server 2 dies. I lose all of server 2. vm1 is running on server 1; vm2 is split. Both are on that DRBD volume. Wouldn't you just roll with server 1's DRBD volume?
No, there are two different things.
1. Node2 crashes (no split brain): you copy the config of vm2 to /etc/qemu-server/ on node1 and start vm2. Once node2 is repaired you can boot it; DRBD knows the device is out of sync and resyncs it (vm2 must have autostart disabled!!). When the resync is finished, you delete the config of vm2 on node2 (because it now runs on node1) and do a live migration back.
All fine!
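Case 1 as a command sketch on the surviving node1 (VMID 201 is a hypothetical ID for vm2, and the qm syntax should be checked against your PVE version; this only works on a real cluster):

```shell
# node2 crashed: bring vm2 up on node1 using the synced config copy.
cp /etc/node-b_qemu-server/201.conf /etc/qemu-server/   # 201 = hypothetical VMID of vm2
qm start 201                                            # disk is already on the DRBD device

# Later, after node2 is repaired and DRBD has finished resyncing
# (vm2 must NOT autostart on node2!), remove the stale config there
# and live-migrate vm2 back:
ssh node2 rm /etc/qemu-server/201.conf
qm migrate 201 node2 --online
```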
2. You get a split brain: something is wrong with the DRBD connection (network, driver, whatever). Both nodes run standalone and both VMs keep running, but your DRBD device isn't in sync.
Now you can't overwrite either node, because you would lose the current data of the running VM. Solution: shut down and back up one VM, overwrite its DRBD device with the content of the other node, then restore the backup and start the VM. That takes a long time and is very bad!
But I think what you are saying is that when server 2 comes back up and you want the freshest data from before the crash, you can't get it, because the servers share the DRBD volume and you would have to pick one or the other, losing the most recent state of one of them.

But when you separate them, you can pick which one to use.
Right. In case 2 I simply overwrite drbd-r0 on node2 (because vm1 on node1 uses this storage) and overwrite drbd-r1 on node1. Both VMs can keep running, and afterwards I have a synced system without downtime.
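Overwriting one side like that is essentially DRBD's manual split-brain recovery; a sketch for resource r0 (resource names r0/r1 follow the post, commands are DRBD 8.x style and must be run on the live nodes):

```shell
# r0 is authoritative on node1 (vm1 runs there), so node2 discards its changes.
# On node2:
drbdadm disconnect r0
drbdadm secondary r0                    # r0 has no running VM on node2, so this is safe
drbdadm connect --discard-my-data r0    # declare node2's copy the split-brain victim
# On node1:
drbdadm connect r0                      # reconnect; node2 resyncs from node1

# Then the same for r1 with the roles swapped: node1 discards, node2 wins.
```

This is exactly why the two-resource layout avoids downtime: each resource has an obvious winner, and the VM on the winning side never stops.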
And if you use VMIDs 100-199 on node1 and 200-299 on node2, you can see at a glance whether all VMs are running on the right node.

Udo