cluster type: DRBD Active/Active control

RobFantini
May 24, 2012
Hello
Using DRBD active/active storage, is it possible to make sure that the DRBD resource only has KVMs running on one node?

We want all or none of the KVMs running on a node.

Is it possible to control that somehow in PVE or from the CLI?

I do not want this situation to ever occur: split brain while some KVMs are running on one node and some on the other.


thanks
Rob
 
Like I said in the other thread - Fencing. It's in the wiki.
 

thank you for the reply.

I do understand and use fencing - but not to this point:

Can fencing make it so that we have ALL or NONE of a few KVMs running on a node? Under no circumstances do we want some running on both DRBD nodes.

So in cluster.conf, is it possible to set up an HA group of KVMs?

Or is that just the way it works?
 
To do what you want, I would use an APC switched power outlet as the fencing device. That will kill power to one of your nodes and reboot it. I think they call it STONITH - shoot the other node in the head. For Windows clusters, we used a small quorum drive. Whoever had control of that had all the other shared drives, and the other node could not see them.

I'm no expert on this, but I think that would get you your desired result. I intend to do a similar setup, but have not had the time to test it yet. Testing is where you learn the most; it's all theory until you start pulling network cables and power cables to see what happens.
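
For reference, a rough sketch of what the APC fencing entries in /etc/pve/cluster.conf could look like (untested on my side; the device name, IP address, credentials and outlet port below are placeholders):

    <fencedevices>
      <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.50" login="apc" passwd="secret"/>
    </fencedevices>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="power">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>

The second node gets the same fence block with its own outlet port.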
 

Term - I appreciate your responses.

However, there are always new ways to use DRBD + PVE as both keep evolving from great to excellent.

Hopefully someone who is in tune with DRBD/PVE in the setup we want to use can chime in.
 
I do not want this situation to ever occur: split brain while some KVMs are running on one node and some on the other.

Make two DRBD volumes.
Run all the VMs for one node on one DRBD volume, and do the same for the other node using the other DRBD volume.
I have described this in detail on the wiki, complete with info on how to recover from split-brain.

Some advantages:
When split-brain happens you only need to resync 50% of your data, one DRBD volume. Rarely do both split-brain at the same time.
You can utilize your resources better, gaining some performance when both nodes are up.
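
Roughly, the two resources look something like this (a sketch only - host names, partitions, ports and the shared secret are placeholders; see the wiki for the full configuration):

    resource r0 {
        protocol C;
        startup {
            wfc-timeout 0;
            degr-wfc-timeout 60;
            become-primary-on both;
        }
        net {
            cram-hmac-alg sha1;
            shared-secret "change-me";
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
        on node1 {
            device /dev/drbd0;
            disk /dev/sdb2;          # holds the VMs that normally run on node1
            address 10.0.0.1:7788;
            meta-disk internal;
        }
        on node2 {
            device /dev/drbd0;
            disk /dev/sdb2;
            address 10.0.0.2:7788;
            meta-disk internal;
        }
    }
    # r1 is identical except it uses /dev/drbd1, the second partition
    # (e.g. /dev/sdb3) and another port such as 7789, and holds the VMs
    # that normally run on node2.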

Using the failoverdomains with HA VMs you can ensure the VMs will live on the proper server by default.
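
A hedged sketch of how that can look in the rm section of cluster.conf (node names, priorities and the VMID are placeholders; I believe pvevm honours the domain attribute the same way a normal rgmanager service does, but double-check against the wiki):

    <rm>
      <failoverdomains>
        <failoverdomain name="prefer_node1" ordered="1" restricted="0" nofailback="0">
          <failoverdomainnode name="node1" priority="1"/>
          <failoverdomainnode name="node2" priority="2"/>
        </failoverdomain>
      </failoverdomains>
      <!-- VM 101 lives on its DRBD volume and prefers node1 whenever node1 is up -->
      <pvevm autostart="1" vmid="101" domain="prefer_node1"/>
    </rm>

With ordered="1" the lower priority number wins, so VM 101 will come back to node1 when that node returns.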

All that being said, I do not use HA VMs on my DRBD volumes.
If set up properly AND working properly, I cannot think of any way that HA could make a bad decision when a node fails and DRBD is involved.

But there is one problem: HA is not aware of the DRBD status.
With HA, running the VMs on one node like you suggested, or on both as I suggested, this can still happen:
1. DRBD split-brains at 1AM
2. The node running the VMs, where the current data is located, fails at 6AM.
3. HA starts the VMs up on the other node, the one with the OLD data.
4. Now you have a mess to sort out, have fun with that.

Sometimes it is better to let a human make a decision on what the best thing to do is.
Maybe I can easily fix the failed node by swapping a power supply; I might prefer to do that rather than lose some data.

We need to make HA aware of the DRBD status; it needs to work more like this (rough sketch below):
1. The node running the VMs fails
2. HA looks at what the DRBD status was 5 minutes ago
a) If the status was good, start the VMs on the other node
b) If the status was bad, do nothing and alert a human to make a decision
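
Something like this rough, untested sketch is what I have in mind for the status-recording half (the state file path and the 5 minute interval are arbitrary choices of mine, not an existing PVE/DRBD feature):

    #!/bin/bash
    # Run from cron every 5 minutes: record whether every DRBD resource on
    # this node is Connected and UpToDate/UpToDate. A fail-over script could
    # then refuse to auto-start VMs if the last recorded state was BAD.
    STATE_FILE=/var/run/drbd-last-state
    TOTAL=$(grep -c " cs:" /proc/drbd)
    GOOD=$(grep -c "cs:Connected ro:.* ds:UpToDate/UpToDate" /proc/drbd)
    if [ "$TOTAL" -gt 0 ] && [ "$TOTAL" -eq "$GOOD" ]; then
        echo "GOOD $(date +%s)" > "$STATE_FILE"
    else
        echo "BAD $(date +%s)" > "$STATE_FILE"
    fi

The other half, teaching the HA manager to consult that file before starting a VM, is the part I have not worked out.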

Until I can figure out how to do that, DRBD + HA is not a combination I would recommend.
 
Until I can figure out how to do that, DRBD + HA is not a combination I would recommend.

I knew how to do that using Etch, then Lenny, then Lenny/PVE. Using Primary/Secondary DRBD + Heartbeat was relatively easy. Last time I checked, Heartbeat was incompatible with PVE.

Using a PVE cluster + DRBD, there has to be some way to get HA working. Let me know if you have any ideas.

In a few days I'll have all the parts in for our 2 new Dell servers... We will use DRBD and for now plan on doing manual KVM switch-over in case a node fails.
 
I've used heartbeat with DRBD for HA NFS before.
The same problem that I described above can happen with that setup too, just not as easily.
Bad situation:
1. Secondary stops replicating at 1AM
2. Primary fails at 3AM
3. Secondary promoted to Primary, using 2 hour old data.

You could, I suppose, script things to prevent that.
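
One partial safeguard DRBD itself offers for the Heartbeat case is resource-level fencing with the drbd-peer-outdater (dopd) handler, which marks the peer's data as Outdated so the stale side cannot be promoted. From memory, roughly (the handler path can differ between distributions, and dopd also has to be started from ha.cf):

    # added to the existing resource (or common) section of drbd.conf
    disk {
        fencing resource-only;
    }
    handlers {
        # helper shipped with the DRBD/Heartbeat packages;
        # verify the exact path on your distribution
        fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }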

I am willing to collaborate on how to solve this, just not ever taken the time to investigate where to start.
 
Well, our two Dell R720 systems are in production. They have 8 SAS disks and PERC 710 cards. The RAID 10 storage is split into 3 partitions, one for PVE and 2 for DRBD.

We have a 3rd node.

For now, if a node dies, the KVMs will be moved manually to the surviving system.

In the future we'll explore using the DRBD configuration to:
1- send an email that there is a node down.
2- in the email, state the commands to be run manually to migrate the KVMs.

After a lot of testing, then run the commands from a script.

We'll set up another cluster next month to test the above.
 
I have a similar problem

But in my case, the solution is easier.

My scenario:
2 PVE nodes for VMs and quorum
1 PVE node only for quorum
Total = 3 PVE nodes

The fence is fence_manual = "necessarily with human interaction".
This manual fence works well (for example, if I cut the power to the node beforehand).
To do the manual fence I have a script ready on both nodes.

But what is missing in my fence script?

My problem in having a complete fence script:
I know that if a "split brain" has occurred I will see the failure in the "kern.log" file, but I don't know what "exact text" I must look for in "kern.log" to determine whether the fence can be applied.

What do I need to know to complete the fence script?
I need to know a text, common to both nodes, in the "kern.log" file that tells me whether a "split brain" has occurred in case of:
1- A power cut on a DRBD node
2- A cut on the DRBD network
3- The DRBD service stopped (by mistake) through human interaction

If I know the exact text that will appear in the "kern.log" file, I can complete my script.

Can anyone help me, keeping in mind that it must work for DRBD 8.3.x and 8.4.x?
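
For reference, this is the kind of check I have in mind (only a sketch; I am assuming the kernel message contains the phrase "Split-Brain detected", and that assumption is exactly what I would like someone to confirm for both 8.3.x and 8.4.x):

    # sketch only - the exact message text is what I am asking about
    if grep -qi "split-brain detected" /var/log/kern.log; then
        echo "Possible DRBD split brain, do not fence automatically" \
            | mail -s "DRBD split brain on $(hostname)" admin@example.com
    fi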

Other topic:
After many hours of work, I've written a bash script that works for DRBD 8.3.x and 8.4.x (it is universal), and this script is working in production environments.
The script performs an online verification of the replicated DRBD volume(s), provided that the "global_common.conf" file has the net directive verify-alg <algorithm> enabled.
It also:
- Verifies the DRBD service
- Verifies that the state of DRBD is fit to start (i.e. DRBD can be out of sync before starting the verification)
- Sends a report to a log file with the date/hour of each step processed
- Lets you choose which DRBD resources to verify (with all their volumes for 8.4.2 or later)
- Shows the date/hour of start and end of each resource and volume verified
- Lets you choose the name and location of the log file
- Lets you choose whether to send the report by email - to one address if the result is positive and to another address if it is negative
- And much more

In conclusion: you only need to fill in a file of variables and add the script to cron.d, or run it manually.
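
A simplified sketch of the core (the resource name and algorithm are only examples):

    # in global_common.conf, net section - required for online verification:
    #     verify-alg md5;
    #
    # for each chosen resource the script then basically runs:
    drbdadm verify r0
    # and watches /proc/drbd and kern.log for the progress and the completion
    # message before building the report and sending the email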

If you want a copy of my script, please send 5 US$ to .....
.... (just a joke) - just give me your e-mail and I will send you a copy of my script.

Best regards
Cesar
 

For the email notifications you mention, you can put whatever email address you want in these parameters:

In the global_common.conf file, inside the common - handlers directive:
Parameter: "split-brain"
http://www.drbd.org/users-guide-8.4/s-configure-split-brain-behavior.html#s-split-brain-notification

And the same for the parameter: "out-of-sync"
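
In other words, something like this in the handlers section (the notify scripts ship with the DRBD utilities; "root" is just the local mail recipient - put the address that you want):

    common {
        handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }
    }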

Best regards
Cesar
 
I have had DRBD running in active/active mode for over 3 years without problems, and yes, it's easy to control which node the VMs run on - simply migrate the VM you want to move from one node to the other.
If you create a single DRBD device in active/active mode, you have control over which node each VM will run on.
With one DRBD device in active/passive mode the VMs can only run on one node; the second node only collects idle time.
With two DRBD devices in active/passive mode, where each node has one active and the other passive, you can run VMs on each active DRBD device.

I tried the active/active and the dual active/passive setups and went with active/active mode, as the setup is simpler and more flexible for moving VMs around.
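
For example, from the command line (the VMID and node name are made up; the same thing is a button in the web GUI):

    # live-migrate VM 101 to the other node
    qm migrate 101 node2 -online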
 