Recovering from DRBD Split Brain Without Backup/Restore

jcrowley

Nov 19, 2012
Our Proxmox VE cluster, which runs on top of DRBD, recently experienced a split-brain condition, and we had to recover during a six-hour maintenance window. That meant I needed a recovery method faster than the one described here: http://pve.proxmox.com/wiki/DRBD#Recovery_from_split_brain_when_using_one_DRBD_volume

That procedure requires you to back up the VM to a file on one node and restore it again on the other. While that method is safe and effective, moving hundreds of GB of data takes a long time. Moving it twice takes even longer.

I developed the following procedure to do the recovery in half the time and documented it for our company wiki. If anyone wants this in Mediawiki format, let me know and I'll forward it to you.

[h=1]Proxmox DRBD Split-Brain HOWTO[/h]
Our example configuration is a cluster with one DRBD volume, named as follows:
  • Resource Name: r0 (/dev/drbd0)
  • Host Server Names: NODEA and NODEB
DRBD is running in a Primary/Primary configuration, and both nodes have VMs running on DRBD resource r0. A split-brain condition has occurred and we will be recovering from that. We have chosen to keep the data on NODEA and discard the changes that were made to the DRBD volume on NODEB during the split-brain condition. We will save the NODEB data by copying it to NODEA prior to overwriting data on the DRBD volume on NODEB.
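
For reference, here is a minimal sketch of what the r0 resource definition might look like in such a setup. The backing disk, IP addresses, and port below are placeholders, not values from our actual cluster:

  resource r0 {
      protocol C;
      net {
          # required for a Primary/Primary (dual-primary) setup
          allow-two-primaries;
      }
      on NODEA {
          device    /dev/drbd0;
          disk      /dev/sdb1;      # placeholder backing device
          address   10.0.0.1:7788;  # placeholder IP and port
          meta-disk internal;
      }
      on NODEB {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.0.0.2:7788;
          meta-disk internal;
      }
  }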

There is much room for error in this procedure. You should proceed with extreme caution and know what you're doing.


[h=2]Preserve VMs By Copying Logical Volumes[/h]
For each virtual machine on NODEB that you wish to preserve, do the following. A worked example of the whole sequence for a single LV is shown after this list.

  • On NODEB, shut down the virtual machine and mark its logical volumes (LVs) active
    1. Shut down the virtual machine. This will mark the virtual machine's LVs inactive.
    2. Run the following command for each LV to verify that it is inactive.

      lvs <volume group>/<logical volume>
      • In the Attr (attributes) column you should see "-wi-----".
    3. Execute the following command for each affected LV to mark the volume active.
      lvchange -aly <volume group>/<logical volume>
    4. Verify that each LV is active by running this command for each one.
      lvs <volume group>/<logical volume>
      • In the Attr column you should now see "-wi-a---" (a = active).
  • Mark the same LVs on NODEA active too so that we can begin copying data.
    1. Run the following command for each LV to verify that it is inactive.

      lvs <volume group>/<logical volume>
      • In the Attr (attributes) column you should see "-wi-----".
    2. Execute the following command for each affected LV to mark the volume active.
      lvchange -aly <volume group>/<logical volume>
    3. Verify that each LV is active by running this command for each one.
      lvs <volume group>/<logical volume>
      • In the Attr column you should now see "-wi-a---" (a = active).
  • Copy LVs from NODEB to NODEA.
    Here we'll use a tool called netcat to transfer the data over the network and dd to copy the actual LVs. If you need the data encrypted in transit, use ssh instead of netcat, but be aware that encrypting large volumes of data consumes a lot of CPU. Complete the following steps for each LV you wish to copy.
    1. On NODEA, execute the following command to "listen" for data and write it to the LV when it is received.
      nc -l -p 19000 | dd bs=16M of=/dev/<volume group>/<logical volume>
    2. On NODEB, execute the following command to initiate the copy of data to NODEA.
      dd bs=16M if=/dev/<volume group>/<logical volume> | nc -q 5 <NODEA> 19000
  • Depending on the size of the volumes you're copying, the speed of your physical disks, and the speed of your network, it may take minutes or days to complete the copy.
  • If you wish to see the progress of your copy, run the following command from a separate terminal; it signals dd to print transfer statistics to the terminal where it is running. This command can be run as often as you prefer on either node.
    kill -USR1 `pidof dd`
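
As an illustration, here is the whole sequence for one hypothetical LV (volume group pve, logical volume vm-101-disk-1); substitute your own volume group and LV names:

  # On NODEB and then on NODEA: activate the LV and confirm the "a" attribute
  lvchange -aly pve/vm-101-disk-1
  lvs pve/vm-101-disk-1

  # On NODEA (the receiving side): listen on port 19000 and write to the LV
  nc -l -p 19000 | dd bs=16M of=/dev/pve/vm-101-disk-1

  # On NODEB (the sending side): stream the LV to NODEA
  dd bs=16M if=/dev/pve/vm-101-disk-1 | nc -q 5 NODEA 19000

  # From a separate terminal on either node: ask dd to print progress statistics
  kill -USR1 `pidof dd`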
[h=2]Migrate VMs to NODEA and Boot Them[/h]
  1. Using the Proxmox web interface, shut down each VM on NODEB that you wish to migrate, if you have not already done so.
  2. Migrate each affected virtual machine from NODEB to NODEA. This should be an offline migration (do not check the Online box). A command-line equivalent is sketched after this list.
  3. Boot each VM on NODEA and verify that it works.
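
If you prefer the command line to the web interface, the equivalent qm commands look roughly like this (VM ID 101 is a placeholder):

  # On NODEB: make sure the VM is powered off
  qm shutdown 101

  # On NODEB: offline migration to NODEA (no --online flag)
  qm migrate 101 NODEA

  # On NODEA: start the VM and verify that it works
  qm start 101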
[h=2]Reestablish DRBD Connection and Synchronize[/h]
The resource name in the following commands will usually be "r0". The complete sequence for r0 is shown after the steps below.

  1. On NODEB execute the following commands for each DRBD resource where consistency is lost.
    drbdadm secondary <resource name>
    drbdadm disconnect <resource name>
    drbdadm -- --discard-my-data connect <resource name>
  2. On NODEA execute the following commands for each DRBD resource where consistency is lost.
    drbdadm connect <resource name>
    • Once the DRBD connection is reestablished, it will take some time to sync. You can watch progress with this command on either node.
      watch cat /proc/drbd
  3. Back on NODEB, execute the following command for each DRBD resource where consistency is lost.
    drbdadm primary <resource name>
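
Putting it together for our single resource r0, the full sequence looks like this; the commands on NODEB discard its split-brain data, and the connect on NODEA allows the resync to begin:

  # On NODEB: demote the resource, disconnect, and reconnect discarding its data
  drbdadm secondary r0
  drbdadm disconnect r0
  drbdadm -- --discard-my-data connect r0

  # On NODEA: reconnect so resynchronization can start
  drbdadm connect r0

  # On either node: watch the resynchronization progress
  watch cat /proc/drbd

  # On NODEB, once the sync has finished: promote the resource back to primary
  drbdadm primary r0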
 
