How to recover from DRBD split-brain with primary/primary

DRVTiny

Feb 5, 2010
I've followed all the instructions in the DRBD documentation, but Proxmox or something else (LVM?) holds /dev/drbd0 open, so I can't run "drbdadm secondary r0". That makes it impossible to reconnect with --discard-my-data and recover from the split-brain.

ve1:~# drbdadm secondary r0
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 secondary' terminated with exit code 11

What can I do in this situation? It was triggered simply by running /etc/init.d/networking restart.

P.S. Rebooting doesn't help at all.
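To see what actually keeps /dev/drbd0 open, a minimal sketch like the following can list the block devices stacked on top of it. It assumes a Linux kernel that exposes stacked devices under /sys/block/<dev>/holders (LVM logical volumes on top of DRBD show up there as dm-N entries). The function name drbd_holders and the SYSFS override are hypothetical, added only so the sketch can be tested without a real DRBD device.

```shell
#!/bin/sh
# Hypothetical helper: list block devices stacked on top of a DRBD minor.
# Device-mapper holders (e.g. LVM logical volumes, dm-N) keep the DRBD device
# open, which makes "drbdadm secondary" fail with "(-12) Device is held open".
# SYSFS defaults to /sys; overriding it is only meant for testing.
drbd_holders() {
    dev=${1:-drbd0}
    dir=${SYSFS:-/sys}/block/$dev/holders
    if [ ! -d "$dir" ]; then
        echo "no such block device: $dev" >&2
        return 2
    fi
    for h in "$dir"/*; do
        [ -e "$h" ] || continue    # no holders: the glob stayed unexpanded
        name=$(basename "$h")
        if [ -r "$h/dm/name" ]; then
            # device-mapper devices expose their mapped name, e.g. vg-lv
            echo "$name $(cat "$h/dm/name")"
        else
            echo "$name"
        fi
    done
}
```

Running drbd_holders drbd0 on the affected node prints one line per holder; a non-empty list means something (typically an active volume group) is still stacked on the DRBD device.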
 
The very best place to get support for DRBD is the DRBD community, or even better, the DRBD support team at www.linbit.com.
 
The Proxmox team tends to use DRBD in primary/primary mode as a shared storage solution (in the OVH project, for example, right?), but you don't know how to recover from split-brain after a temporary network outage. That confuses me a bit...

P.S. I've read through all of the DRBD mailing lists and understand that the DRBD team doesn't support primary/primary and recommends that everybody use primary/secondary instead :)
 
I have no idea where you got all this strange nonsense (there is no OVH DRBD project with us, and DRBD is not shared storage: it's a shared-nothing, replicated storage solution). You are right, it seems that you are slightly confused. And yes, DRBD supports primary/primary, and also yes, we know how to configure Proxmox VE with DRBD.

Back to the topic: read the error message; it gives you the first hint. If you follow my advice and ask the right people in the right forums, you will get answers to your questions.
 
On the node that will become secondary, you have to deactivate the volume group with 'vgchange -an <volume group name>'. No VMs using that volume group may be running on that node.
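Combined with the manual split-brain recovery sequence from the DRBD 8.3 user guide, the steps on the node whose changes will be discarded look roughly like the sketch below. It assumes DRBD 8.3 drbdadm syntax (8.4 and later spell the last step drbdadm connect --discard-my-data r0), a node in StandAlone state, resource name r0 and a hypothetical volume group name drbdvg; the VMs on that volume group must already be stopped. On the surviving node, if it is also StandAlone, the guide's counterpart is simply drbdadm connect r0.

```shell
#!/bin/sh
# Sketch: run on the split-brain "victim", i.e. the node whose changes since
# the split will be thrown away. Assumes DRBD 8.3 drbdadm syntax.
# RESOURCE and VG are assumptions (VG is a hypothetical volume group name).
RESOURCE=${RESOURCE:-r0}
VG=${VG:-drbdvg}

recover_victim() {
    # 1. Deactivate all logical volumes in the VG on top of /dev/drbd0 so
    #    that nothing holds the device open any more.
    vgchange -an "$VG" || return 1
    # 2. Demote this node; this is the step that failed with
    #    "(-12) Device is held open by someone" while the VG was active.
    drbdadm secondary "$RESOURCE" || return 1
    # 3. Reconnect and resync from the peer, discarding local modifications.
    drbdadm -- --discard-my-data connect "$RESOURCE"
}
```

The function stops at the first failing step, so a still-active volume group is reported before drbdadm is ever asked to demote the device. Once the resync has finished, the node can presumably be promoted again and the volume group reactivated with vgchange -ay.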
 
Thanks :)
OK, I did it.
Now on node1 (primary) I have this state:
ve1:~# cat /proc/drbd
version: 8.3.4 (api:88/proto:86-91)
GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by root@oahu, 2010-01-15 11:39:31
0: cs:WFBitMapS ro:primary/Secondary ds:UpToDate/UpToDate C r----
ns:0 nr:0 dw:123980 dr:345760 al:84 bm:41 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:618640

And on the node2 (secondary):
ve2:~# cat /proc/drbd
version: 8.3.4 (api:88/proto:86-91)
GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by root@oahu, 2010-01-15 11:39:31
0: cs:WFBitMapT ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:0 dw:0 dr:4348 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:520192

This state (WFBitMapS on the primary, WFBitMapT on the secondary) is not changing :( So my /dev/drbd0 is working improperly, right?
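To check whether the connection state ever leaves WFBitMapS/WFBitMapT, the cs:, ro: and ds: fields can be pulled out of /proc/drbd and polled. This is a minimal sketch assuming the DRBD 8.3 /proc/drbd layout shown above; drbd_fields and wait_connected are hypothetical helper names.

```shell
#!/bin/sh
# Print "cs:... ro:... ds:..." for one DRBD minor from a /proc/drbd-style file.
# Assumes the DRBD 8.3 layout: " 0: cs:WFBitMapS ro:Primary/Secondary ds:..."
drbd_fields() {
    awk -v m="$1:" '$1 == m {
        out = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^(cs|ro|ds):/)
                out = out (out == "" ? "" : " ") $i
        print out
    }' "${2:-/proc/drbd}"
}

# Poll once per second, up to $3 times (default 60), until the minor reports
# cs:Connected. Returns 0 when connected, 1 on timeout.
wait_connected() {
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        case $(drbd_fields "$1" "${2:-/proc/drbd}") in
            cs:Connected|cs:Connected\ *) return 0 ;;
        esac
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_connected 0 /proc/drbd 300 returns non-zero if minor 0 is still not Connected after about five minutes, which suggests the bitmap exchange is stuck rather than merely slow.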
 
Check your network: do you use a crossover cable for DRBD communication, or does the traffic go through a switch? Try ping with different packet sizes.
Do you have a non-default value for sndbuf-size (the default is 128k)?
If you have no other ideas, try a good old reboot of both nodes.
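Both checks can be scripted roughly as below. This sketch assumes Linux iputils ping (-c count, -W timeout in seconds, -s payload size, -M do to forbid fragmentation). The payload excludes the 28 bytes of IPv4 and ICMP headers, so 1472 fills a standard 1500-byte MTU and 8972 a 9000-byte jumbo-frame MTU; a size that fails although it fits the configured MTU points at the path, for example a switch dropping large frames. show_sndbuf assumes that drbdadm dump prints the parsed configuration for a resource; if it prints nothing, sndbuf-size is presumably not set explicitly. Both function names are hypothetical.

```shell
#!/bin/sh
# Probe the DRBD replication link with the given ICMP payload sizes.
# Assumes Linux iputils ping: -c count, -W timeout (s), -s payload size,
# -M do = set "don't fragment", so oversized packets fail instead of splitting.
ping_sizes() {
    peer=$1
    shift
    rc=0
    for size in "$@"; do
        if ping -c 3 -W 2 -M do -s "$size" "$peer" >/dev/null 2>&1; then
            echo "size $size: ok"
        else
            echo "size $size: FAILED"
            rc=1
        fi
    done
    return $rc
}

# Print an explicit sndbuf-size setting for a resource, if any.
# No output means the option is not set and the DRBD default applies.
show_sndbuf() {
    drbdadm dump "${1:-r0}" | grep 'sndbuf-size'
}
```

Usage, with a placeholder replication address: ping_sizes 10.0.0.2 56 1472 8972, then show_sndbuf r0.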
 
