DRBD Split Brain on scheduled backup job

Our pair of servers went through this yesterday; glad to find the thread. I'll add my $0.02 and log data in case it helps someone.

Two identical hosts, PVE1 and PVE2
Code:
pve1:~/status_scripts# pveversion -v
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

pve1:~/status_scripts# modinfo drbd
filename:       /lib/modules/2.6.32-4-pve/kernel/drivers/block/drbd/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.3.7
description:    drbd - Distributed Replicated Block Device v8.3.7
author:         Philipp Reisner <phil@linbit.com>, Lars Ellenberg <lars@linbit.com>
srcversion:     EE47D8BF18AC166BE219757
depends:        cn,lru_cache
vermagic:       2.6.32-4-pve SMP mod_unload modversions 
parm:           minor_count:Maximum number of drbd devices (1-255) (uint)
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           cn_idx:uint
parm:           proc_details:int
parm:           usermode_helper:string

We've got three DRBD resources:
DRBD-R0: scratch/testing space (VM101 on PVE1, VM102 on PVE2)
DRBD-R1: guests exclusive to PVE1 (VM104 on PVE1) <-- failed resource
DRBD-R2: guests exclusive to PVE2 (VM103 on PVE2)
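
For context, each resource is a plain two-node DRBD 8.3 stanza in drbd.conf. A minimal sketch of r1 follows; the device number matches the drbd1 in the logs below, but the backing disks and addresses here are illustrative, not our actual values:
Code:
resource r1 {
  protocol C;
  on pve1 {
    device    /dev/drbd1;
    disk      /dev/sdb2;          # backing partition -- illustrative
    address   192.168.10.1:7789;  # replication link -- illustrative
    meta-disk internal;
  }
  on pve2 {
    device    /dev/drbd1;
    disk      /dev/sdb2;
    address   192.168.10.2:7789;
    meta-disk internal;
  }
}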

Both hosts were configured to back up to the same NFS share, PVE1 at 18:00 and PVE2 at 19:00
(the backup jobs were configured through the web GUI)
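
Under the hood, those GUI jobs are just cron entries (on PVE 1.x they land in /etc/cron.d/vzdump, if I remember right). Something like the following, with the day-of-week fields illustrative since they depend on what was picked in the GUI:
Code:
# /etc/cron.d/vzdump (sketch -- schedule fields illustrative)
0 18 * * *  root vzdump --quiet --node 1 --snapshot --compress --storage nfs-backup --mailto ecramer@iastate.edu --all
0 19 * * *  root vzdump --quiet --node 2 --snapshot --compress --storage nfs-backup --mailto ecramer@iastate.edu --all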

PVE1
Code:
vzdump --quiet --node 1 --snapshot --compress --storage nfs-backup --mailto ecramer@iastate.edu --all

101: Sep 29 18:00:03 INFO: Starting Backup of VM 101 (qemu)
101: Sep 29 18:00:03 INFO: running
101: Sep 29 18:00:03 INFO: status = running
101: Sep 29 18:00:04 INFO: backup mode: snapshot
101: Sep 29 18:00:04 INFO: ionice priority: 7
101: Sep 29 18:00:04 INFO:   Logical volume "vzsnap-pve1-0" created
101: Sep 29 18:00:04 INFO: creating archive '/mnt/pve/nfs-backup/vzdump-qemu-101-2011_09_29-18_00_03.tgz'
101: Sep 29 18:00:04 INFO: adding '/mnt/pve/nfs-backup/vzdump-qemu-101-2011_09_29-18_00_03.tmp/qemu-server.conf' to archive ('qemu-server.conf')
101: Sep 29 18:00:04 INFO: adding '/dev/drbdvg0/vzsnap-pve1-0' to archive ('vm-disk-virtio0.raw')
101: Sep 29 18:49:23 INFO: Total bytes written: 31539386880 (10.16 MiB/s)
101: Sep 29 18:49:24 INFO: archive file size: 26.21GB
101: Sep 29 18:49:24 INFO: delete old backup '/mnt/pve/nfs-backup/vzdump-qemu-101-2011_09_22-18_00_02.tgz'
101: Sep 29 18:49:25 INFO:   Logical volume "vzsnap-pve1-0" successfully removed
101: Sep 29 18:49:26 INFO: Finished Backup of VM 101 (00:49:23)

104: Sep 29 18:49:26 INFO: Starting Backup of VM 104 (qemu)
104: Sep 29 18:49:26 INFO: running
104: Sep 29 18:49:26 INFO: status = running
104: Sep 29 18:49:27 INFO: backup mode: snapshot
104: Sep 29 18:49:27 INFO: ionice priority: 7
104: Sep 29 18:49:27 INFO:   Logical volume "vzsnap-pve1-0" created
104: Sep 29 18:49:27 INFO: creating archive '/mnt/pve/nfs-backup/vzdump-qemu-104-2011_09_29-18_49_26.tgz'
104: Sep 29 18:49:27 INFO: adding '/mnt/pve/nfs-backup/vzdump-qemu-104-2011_09_29-18_49_26.tmp/qemu-server.conf' to archive ('qemu-server.conf')
104: Sep 29 18:49:27 INFO: adding '/dev/drbdvg1/vzsnap-pve1-0' to archive ('vm-disk-virtio0.raw')
104: Sep 29 19:05:14 INFO: Total bytes written: 5016472064 (5.05 MiB/s)
104: Sep 29 19:05:18 INFO: archive file size: 1.99GB
104: Sep 29 19:05:18 INFO: delete old backup '/mnt/pve/nfs-backup/vzdump-qemu-104-2011_09_22-19_04_45.tgz'
104: Sep 29 19:05:20 INFO:   Logical volume "vzsnap-pve1-0" successfully removed
104: Sep 29 19:05:20 INFO: Finished Backup of VM 104 (00:15:54)

PVE2
Code:
vzdump --quiet --node 2 --snapshot --compress --storage nfs-backup --mailto ecramer@iastate.edu --all

102: Sep 29 19:00:02 INFO: Starting Backup of VM 102 (qemu)
102: Sep 29 19:00:02 INFO: running
102: Sep 29 19:00:02 INFO: status = running
102: Sep 29 19:00:03 INFO: backup mode: snapshot
102: Sep 29 19:00:03 INFO: ionice priority: 7
102: Sep 29 19:00:03 INFO:   Logical volume "vzsnap-pve2-0" created
102: Sep 29 19:00:03 INFO: creating archive '/mnt/pve/nfs-backup/vzdump-qemu-102-2011_09_29-19_00_02.tgz'
102: Sep 29 19:00:03 INFO: adding '/mnt/pve/nfs-backup/vzdump-qemu-102-2011_09_29-19_00_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
102: Sep 29 19:00:03 INFO: adding '/dev/drbdvg0/vzsnap-pve2-0' to archive ('vm-disk-virtio0.raw')
102: Sep 29 19:13:54 INFO: Total bytes written: 8082308608 (9.27 MiB/s)
102: Sep 29 19:13:58 INFO: archive file size: 2.84GB
102: Sep 29 19:13:58 INFO: delete old backup '/mnt/pve/nfs-backup/vzdump-qemu-102-2011_09_22-19_00_01.tgz'
102: Sep 29 19:13:59 INFO:   Logical volume "vzsnap-pve2-0" successfully removed
102: Sep 29 19:13:59 INFO: Finished Backup of VM 102 (00:13:57)

103: Sep 29 19:13:59 INFO: Starting Backup of VM 103 (qemu)
103: Sep 29 19:14:00 INFO: running
103: Sep 29 19:14:00 INFO: status = running
103: Sep 29 19:14:00 INFO: backup mode: snapshot
103: Sep 29 19:14:00 INFO: ionice priority: 7
103: Sep 29 19:14:01 INFO:   Logical volume "vzsnap-pve2-0" created
103: Sep 29 19:14:01 INFO: creating archive '/mnt/pve/nfs-backup/vzdump-qemu-103-2011_09_29-19_13_59.tgz'
103: Sep 29 19:14:01 INFO: adding '/mnt/pve/nfs-backup/vzdump-qemu-103-2011_09_29-19_13_59.tmp/qemu-server.conf' to archive ('qemu-server.conf')
103: Sep 29 19:14:01 INFO: adding '/dev/drbdvg2/vzsnap-pve2-0' to archive ('vm-disk-virtio0.raw')
103: Sep 29 19:39:41 INFO: Total bytes written: 12908377088 (7.99 MiB/s)
103: Sep 29 19:39:51 INFO: archive file size: 9.40GB
103: Sep 29 19:39:51 INFO: delete old backup '/mnt/pve/nfs-backup/vzdump-qemu-103-2011_09_22-19_13_20.tgz'
103: Sep 29 19:39:53 INFO:   Logical volume "vzsnap-pve2-0" successfully removed
103: Sep 29 19:39:53 INFO: Finished Backup of VM 103 (00:25:54)

The split brain occurred when PVE1 started its second job at 18:49; PVE2's backup job hadn't even started yet.

Portion of PVE1's kernel log
Code:
pve1:/var/log# cat /var/log/kern.log |grep drbd
Sep 29 18:49:27 pve1 kernel: block drbd1: ASSERT( bio->bi_idx == 0 ) in drivers/block/drbd/drbd_req.c:1029
Sep 29 18:49:28 pve1 kernel: block drbd1: sock was shut down by peer
Sep 29 18:49:28 pve1 kernel: block drbd1: meta connection shut down by peer.
Sep 29 18:49:28 pve1 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) 
Sep 29 18:49:28 pve1 kernel: block drbd1: asender terminated
Sep 29 18:49:28 pve1 kernel: block drbd1: Terminating drbd1_asender

Portion of PVE2's kernel log
Code:
pve2:~# cat /var/log/kern.log |grep drbd
Sep 29 18:49:27 pve2 kernel: block drbd1: magic?? on data m: 0x0 c: 0 l: 0
Sep 29 18:49:28 pve2 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) 
Sep 29 18:49:28 pve2 kernel: block drbd1: process_done_ee() = NOT_OK
Sep 29 18:49:28 pve2 kernel: block drbd1: Creating new current UUID
Sep 29 18:49:28 pve2 kernel: block drbd1: asender terminated
Sep 29 18:49:28 pve2 kernel: block drbd1: Terminating drbd1_asender
Sep 29 18:49:28 pve2 kernel: block drbd1: Connection closed

I can post the rest or provide other logs if it helps.

I found the broken snapshot volumes on PVE2 while repairing the damage:
Code:
pve2:~# drbdadm down r1
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 1 down' terminated with exit code 11

pve2:~# lvscan
  ACTIVE            '/dev/drbdvg2/vm-103-disk-1' [75.00 GB] inherit
  ACTIVE            '/dev/pve/swap' [8.12 GB] inherit
  ACTIVE            '/dev/pve/root' [16.25 GB] inherit
  ACTIVE            '/dev/pve/data' [36.38 GB] inherit
  inactive Original '/dev/drbdvg1/vm-104-disk-1' [64.00 GB] inherit
  inactive Snapshot '/dev/drbdvg1/vzsnap-pve1-0' [1.00 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-101-disk-2' [32.01 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-102-disk-1' [32.00 GB] inherit

PVE1 didn't show any of these problems:
Code:
pve1:/var/log# lvscan
  ACTIVE            '/dev/drbdvg2/vm-103-disk-1' [75.00 GB] inherit
  ACTIVE            '/dev/pve/swap' [8.12 GB] inherit
  ACTIVE            '/dev/pve/root' [16.25 GB] inherit
  ACTIVE            '/dev/pve/data' [36.38 GB] inherit
  ACTIVE            '/dev/drbdvg1/vm-104-disk-1' [64.00 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-101-disk-2' [32.01 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-102-disk-1' [32.00 GB] inherit

When DRBD refused to shut down, I tried to deactivate the volumes that were holding things open (the same thing I did when I was breaking things intentionally to learn about DRBD). When the lvscan output came back exactly the same, I realized the volumes were already marked inactive.

Code:
pve2:~# lvchange -a n /dev/drbdvg1/*             
pve2:~# lvscan
  ACTIVE            '/dev/drbdvg2/vm-103-disk-1' [75.00 GB] inherit
  ACTIVE            '/dev/pve/swap' [8.12 GB] inherit
  ACTIVE            '/dev/pve/root' [16.25 GB] inherit
  ACTIVE            '/dev/pve/data' [36.38 GB] inherit
  inactive Original '/dev/drbdvg1/vm-104-disk-1' [64.00 GB] inherit
  inactive Snapshot '/dev/drbdvg1/vzsnap-pve1-0' [1.00 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-101-disk-2' [32.01 GB] inherit
  ACTIVE            '/dev/drbdvg0/vm-102-disk-1' [32.00 GB] inherit
 
pve2:~# drbdadm down r1
pve2:~# drbdadm attach r1
pve2:~# drbdadm -- --discard-my-data connect r1
pve2:~# drbdadm primary r1

Somehow, deactivating the already-inactive volumes let me shut DRBD down and get on with the business of restoration.
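
For anyone else recovering from this: the drbdadm sequence above matches, give or take, the manual split-brain recovery procedure in the DRBD 8.3 users guide. The canonical form (using resource name r1 from this thread) is:
Code:
# on the split-brain victim (the node whose changes get discarded; PVE2 here)
drbdadm secondary r1
drbdadm -- --discard-my-data connect r1

# on the surviving node (PVE1) -- only needed if it dropped to StandAlone
drbdadm connect r1
After reconnecting, the victim resyncs from the survivor and can be promoted again with drbdadm primary r1 once it reports UpToDate.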
 
See this thread too:
http://forum.proxmox.com/threads/67...he-device-in-a-clustered-setup?highlight=clvm

I am fairly sure that both the split brain and the problem in the other thread are caused by the lack of CLVM (clustered LVM).

To avoid split brain without CLVM, follow these rules:
1. All VMs on a given DRBD volume run on only one node: the node that performs that volume's backup.
2. Never back up both nodes at the same time, or even close to the same time.
3. Disable the LVM cache (not sure whether it helps, but I disabled mine; sketch below this list). See: http://www.drbd.org/users-guide/s-lvm-drbd-as-pv.html
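
On rule 3: per that DRBD users guide page, disabling the cache means setting write_cache_state = 0 in the devices section of /etc/lvm/lvm.conf. Roughly:
Code:
# /etc/lvm/lvm.conf -- fragment of the devices { } section
devices {
    # stop persisting device scan results to /etc/lvm/cache/.cache
    write_cache_state = 0
}
followed by a one-time rm /etc/lvm/cache/.cache so no stale entries are left behind.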

Our backups have been very reliable, and we have had no split brain from a backup since following those rules.
We also regularly restore our backups to another server to make sure they work; so far, a 100% success rate restoring ~50 different KVM VMs each time we have tested.
 
Thanks for the reply. I had already split the backups apart, but I've disabled the LVM cache now as well. The other thread mentions running vgscan to refresh the cache; when I was unable to stop the DRBD resource, I ran lvscan, vgscan, and pvscan looking for trouble. I inadvertently refreshed the cache file and was then able to stop the resource.

(I had to take those lines out of my post; I was over the 10,000-character limit. *ducks under the table*)
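
Putting it together for anyone searching later: the "held open" device was apparently released by the cache refresh itself. A reconstruction of the sequence that worked, from memory, so treat it as a sketch:
Code:
# rescanning rebuilds the stale /etc/lvm/cache/.cache entries
pvscan
vgscan
lvscan
# with the cache fresh, the resource is no longer "held open by someone"
drbdadm down r1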
 
