Recovery from split-brain

  • Thread starter: fluke

Hi,

I am currently testing scenarios for recovery from split-brain. Usually this involves making one of the nodes (the split-brain victim) secondary.
When I try to do that, the command fails:

Code:
proxmox1:~# drbdadm secondary r1
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 1 secondary' terminated with exit code 11

This is probably caused by the locking mechanism you have implemented. The only fix I have found is to reboot the split-brain victim, which is not acceptable for me because it means downtime for the VMs.
Is it possible to recover from split-brain without a reboot, for example by disabling the locking mechanism (preferably from the command line, so that it can be scripted), if that is the cause?

Just for info: my setup has two resources. r0 holds the VMs for the master node and r1 holds the VMs for the slave node, so it is easy to choose the split-brain victim.
 
I managed to fix the problem. For some unknown reason the command:

Code:
vgchange -an <volume group name>

did not remove the required logical volumes from LVM management and I had to do it manually with dmsetup.

Now 'vgchange -an' works and I am able to recover from split-brain without reboot.
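For anyone who wants to script this, here is a rough sketch of the sequence described above, to be run on the split-brain victim. The helper names `run` and `recover_victim` are made up for illustration, and the last step (reconnecting with `--discard-my-data`) is not something I described above: it is my assumption of the usual manual split-brain recovery step, written in DRBD 8.3 syntax (8.4 uses `drbdadm connect --discard-my-data <resource>`). It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: demote the split-brain victim without a reboot.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: recover_victim <drbd resource> <volume group on top of it>
recover_victim() {
    res="$1"
    vg="$2"
    # 1. Deactivate the volume group so that LVM releases the DRBD device.
    run vgchange -an "$vg" || return 1
    # 2. With nothing holding the device open, the demotion should go through.
    run drbdadm secondary "$res" || return 1
    # 3. Assumption: reconnect and throw away this node's changes (DRBD 8.3 syntax).
    run drbdadm -- --discard-my-data connect "$res"
}
```

Calling `recover_victim r1 <volume group name>` prints the three commands; setting `DRY_RUN=0` would actually execute them, so check that the resource and volume group really belong to the victim first.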
 
Sorry, but 'vgchange -an' does not remove anything - it only deactivates a volume group.
 
You have to deactivate the volume group before making a DRBD node secondary, because LVM holds the DRBD device open.
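One way to see what is stacked on top of the DRBD device is the `holders` directory in sysfs. The sketch below assumes that DRBD minor 1 shows up as `drbd1` under `/sys/block`; the function name `holders_of` and its optional second argument (an alternative sysfs root, handy for testing) are made up for illustration.

```shell
#!/bin/sh
# holders_of <device name> [sysfs root]
# Prints the block devices stacked on top of the given device, one per line.
# Returns non-zero if the device has no holders directory.
holders_of() {
    dev="$1"
    sysfs="${2:-/sys}"
    dir="$sysfs/block/$dev/holders"
    [ -d "$dir" ] || return 1
    # Each entry is a device (typically dm-N) that has this device open.
    ls "$dir"
}
```

`holders_of drbd1` should list dm-N entries while the volume group is active and print nothing once it is deactivated. Mounted filesystems or processes that have the device open directly are not listed there.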

Maybe I wasn't clear: when a volume group is deactivated, its logical volumes are removed from LVM management, or, to put it more properly, they are disabled (not open for reading or writing).
With the command 'dmsetup info' I can see which volumes use the device-mapper driver. When a volume group is deactivated, its logical volumes should not be listed by 'dmsetup info'.

Please correct me if I am wrong.
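The check I mean can be sketched roughly like this. The function name `list_vg_mappings` is made up for illustration; it filters the names printed by `dmsetup info -c --noheadings -o name` and relies on the device-mapper naming used by LVM (`<vg>-<lv>`, with any hyphen inside a name doubled).

```shell
#!/bin/sh
# list_vg_mappings <volume group name>
# Reads device-mapper names on stdin (one per line) and prints those that
# belong to the given volume group.
list_vg_mappings() {
    # Hyphens inside the VG name appear doubled in device-mapper names.
    vg_dm=$(printf '%s' "$1" | sed 's/-/--/g')
    # Keep names that start with "<vg>-" where the next character is not
    # another hyphen (that would be a different VG with a hyphen in its name).
    awk -v p="${vg_dm}-" 'index($0, p) == 1 && substr($0, length(p) + 1, 1) != "-"'
}
```

Usage: `dmsetup info -c --noheadings -o name | list_vg_mappings <volume group name>`. After 'vgchange -an' this should print nothing; any name still printed is a leftover mapping, which in my case I had to clear by hand with dmsetup (presumably `dmsetup remove <name>`).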
 