DRBD Split Brain on scheduled backup job

Mee4deil · May 31, 2011

Hi,

we are currently experiencing problems with our cluster setup. When proxmox performs a scheduled backup a split-brain on the drbd backend device is created.

The Log on the node that performs the backup: http://pastebin.com/raw.php?i=KCBL3c2x

And the logfile from the peer that (according to the first log) shut down the connection:

http://pastebin.com/raw.php?i=stn985S4

Code:

# pveversion -v
pve-manager: 1.8-17 (pve-manager/1.8/5948)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.26-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

Any advice on what this could be about would be highly appreciated!

atyur01 · May 31, 2011

The same issue here. The previous version worked without problems.

tom · May 31, 2011

can you provide more details, about which version do you talk here?

Mee4deil · May 31, 2011

The proxmox related version numbers you find in my original post. DRBD Version is

Code:

# dpkg -l "*drbd8*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name                                 Version                              Description
+++-====================================-====================================-========================================================================================
ii  drbd8-utils                          2:8.3.7-1~bpo50+1                    RAID 1 over tcp/ip for Linux utilities

atyur01 · May 31, 2011

Code:

$ pveversion -v
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

Code:

$ modinfo drbd
filename:       /lib/modules/2.6.32-4-pve/kernel/drivers/block/drbd/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.3.7
description:    drbd - Distributed Replicated Block Device v8.3.7
author:         Philipp Reisner <phil@linbit.com>, Lars Ellenberg <lars@linbit.com>
srcversion:     EE47D8BF18AC166BE219757
depends:        cn,lru_cache
vermagic:       2.6.32-4-pve SMP mod_unload modversions 
parm:           minor_count:Maximum number of drbd devices (1-255) (uint)
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           cn_idx:uint
parm:           proc_details:int
parm:           usermode_helper:string

tom · May 31, 2011

pls provide all details, which version works and which one failed. and about which packages do you talk?

if you are able to give all infos I will try to reproduce the issue.

atyur01 · May 31, 2011

My situation before May 1:

2 nodes connected via bonding (ALB-mode) + switch.
1 DRBD device (dual primary) via bonding.
1 NFS server + NFS share (for VM backups) connected to the same switch like the both nodes.
Proxmox version 1.6
Backup: every day
Split-brain: never

The situation after May 1:

Proxmox update to version 1.8, the same settings for Proxmox and DRBD
Backups: every day
Split-brain: on the same day or after 2-3 days during backup

Mee4deil · May 31, 2011

Right now i don't know how to reproduce the split-brain. As you can see in the log it occurs right after vzdump is launched and somehow affects the *other* node in a way such that it drops the connection. It does however not occur at every vzdump call but rather "every once and a while".

tom · May 31, 2011

how do you use vzdump? provide full details about your backup config and include the backup log.

atyur01 · May 31, 2011

I use it just over the WebGUI, nothing special. Mode: snapshot

Code:

May 31 01:00:02 INFO: Starting Backup of VM 101 (qemu)
May 31 01:00:02 INFO: running
May 31 01:00:02 INFO: status = running
May 31 01:00:04 INFO: backup mode: snapshot
May 31 01:00:04 INFO: ionice priority: 7
May 31 01:00:04 INFO:   Logical volume "vzsnap-vmhost01.net-0" created
May 31 01:00:04 INFO: creating archive '/mnt/pve/bck_storage_nfs/vzdump-qemu-101-2011_05_31-01_00_02.tgz'
May 31 01:00:04 INFO: adding '/mnt/pve/bck_storage_nfs/vzdump-qemu-101-2011_05_31-01_00_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
May 31 01:00:04 INFO: adding '/dev/VolGroupVM/vzsnap-vmhost01.net-0' to archive ('vm-disk-scsi0.raw')
May 31 01:04:23 INFO: Total bytes written: 4447513600 (16.37 MiB/s)
May 31 01:04:24 INFO: archive file size: 1.22GB
May 31 01:04:24 INFO: delete old backup '/mnt/pve/bck_storage_nfs/vzdump-qemu-101-2011_05_28-01_00_01.tgz'
May 31 01:04:28 INFO:   Logical volume "vzsnap-vmhost01.net-0" successfully removed
May 31 01:04:28 INFO: Finished Backup of VM 101 (00:04:26)

Mee4deil · May 31, 2011

We have two backup jobs, each for all VMs on each node. Here is the syslog for node 1:

Code:

May 29 15:04:01 virtsrv01 /USR/SBIN/CRON[21209]: (root) CMD (vzdump --quiet --node 2 --snapshot --compress --storage backup --mailto root --all)
May 29 15:04:01 virtsrv01 /USR/SBIN/CRON[21210]: (root) CMD (vzdump --quiet --node 1 --snapshot --compress --storage backup --mailto root --all)
May 29 15:04:02 virtsrv01 vzdump[21210]: INFO: starting new backup job: vzdump --quiet --node 1 --snapshot --compress --storage backup --mailto root --all
May 29 15:04:02 virtsrv01 vzdump[21210]: INFO: Starting Backup of VM 101 (qemu)
May 29 15:04:49 virtsrv01 pvemirror[2860]: starting cluster syncronization
May 29 15:04:52 virtsrv01 pvemirror[2860]: syncing templates
May 29 15:04:52 virtsrv01 pvemirror[2860]: cluster syncronization finished (2.58 seconds (files 0.00, config 0.00))
May 29 15:05:01 virtsrv01 /USR/SBIN/CRON[21376]: (root) CMD (/usr/local/sbin/md-pvesync)
May 29 15:05:49 virtsrv01 pvemirror[2860]: starting cluster syncronization
May 29 15:05:49 virtsrv01 pvemirror[2860]: syncing templates
May 29 15:05:49 virtsrv01 pvemirror[2860]: cluster syncronization finished (0.09 seconds (files 0.00, config 0.00))
May 29 15:06:49 virtsrv01 pvemirror[2860]: starting cluster syncronization
May 29 15:06:49 virtsrv01 pvemirror[2860]: syncing templates
May 29 15:06:49 virtsrv01 pvemirror[2860]: cluster syncronization finished (0.08 seconds (files 0.00, config 0.00))
May 29 15:07:49 virtsrv01 pvemirror[2860]: starting cluster syncronization
May 29 15:07:49 virtsrv01 pvemirror[2860]: syncing templates
May 29 15:07:49 virtsrv01 pvemirror[2860]: cluster syncronization finished (0.08 seconds (files 0.00, config 0.00))
May 29 15:08:18 virtsrv01 vzdump[21210]: INFO: Finished Backup of VM 101 (00:04:16)
May 29 15:08:18 virtsrv01 vzdump[21210]: INFO: Starting Backup of VM 103 (qemu)
May 29 15:08:19 virtsrv01 kernel: block drbd1: ASSERT( bio->bi_idx == 0 ) in drivers/block/drbd/drbd_req.c:1029
May 29 15:08:19 virtsrv01 kernel: block drbd1: sock was shut down by peer
[...] [the continuation hast been posted to pastebin in my first post]

Unfortunately i dont have the vzdump log anymore since (!!!) it seems to override the log in /var/log/vzdump everytime it starts. But the backup job finished fine, there was no error reported by vzdump!

So in this case, one backup (101) finished without causing the split brain, whereas (103) caused the split brain. As the backup process itself is not affected the backup continued backing up other VM IDs later on without problems.

atyur01 · May 31, 2011

Exactly, I get this error during backup too:

Code:

kernel: block drbd0: ASSERT( bio->bi_idx == 0 ) in drivers/block/drbd/drbd_req.c:1029

Mee4deil · May 31, 2011

The drbd guys told me that this is actually a result of the fact that the peer shut down connection. So what is logged on the *other* node is probably more interesting. Do you also get that "magic??" line on the other node?

e100 · Jun 1, 2011

I use drbd on many machines, all of my proxmox nodes are setup with two drbd resources as suggested in the wiki.

I recall having issues if I ran vzdump on both nodes at the same time.
If my memory serves right it was some sort of lvm related issue, but it was long ago.
I never bothered to track down why it was an issue I simply setup schedules so that they never overlapped.

Not sure if that will help any of you or not, but I hope you find a solution.

atyur01 · Jun 1, 2011

Yes, I get "magic??" on the node, which closes the connection:

Code:

block drbd0: magic?? on data m: 0x78563412 c: 0 l: 0
block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )

Maybe it's also interesting that I have 2 DRBD-Devices (both dual-primary), but only one has this problem -> the device with VMs

@e100: my backup schedules are set with 2 hours difference

Mee4deil · Jun 2, 2011

tom said:
pls provide all details, which version works and which one failed. and about which packages do you talk?

if you are able to give all infos I will try to reproduce the issue.

Did you have any success reproducing the split brain? This matter is quite disturbing our backup

Mee4deil · Jun 21, 2011

We now configured the backup schedule such that no backup is performed on both nodes at the same time and the split never occurred again ever since. So to reproduce this one probably needs to run scheduled backups on both nodes at the same time. Was there any progress made on this issue?

hk@ · Jun 22, 2011

Hi
as we've been researching a HA storage setup for proxmox too I can only speculate, but ask for confirmation of the gurus if possible:
lvm on proxmox doesn't know about the drbd-block-device being shared between two hosts, therefore the problem arises that two lvm configs can't work concurrent on the very same blockdevice (without risking to get into splitbrain) - ref. http://mirantis.blogspot.com/2011/0...howComment=1306528806951#c5028478804503884300 (Florian Haas of linbit comments about this).

Therefore I understand doing a lvm snapshot on this shared blockdevice is asking for trouble - especially if you do it simultaneousely...

IMHO adding some timeframe between those backups is no real deal - it might take longer to backup as you think and you will be in hell.

Anyone tried to build this like in the wiki (http://pve.proxmox.com/wiki/DRBD) and add clvm to this?

(we're still searching for the brain-dead redundant setup here)

regards
hk

Mee4deil · Jun 25, 2011

Right - LVM snapshots performed on both sides of a primary-primary setup would probably be trouble - but i'd expect proxmox to handle this in [whatever] way or not to offer simultaneous backups for cluster storages.

Anyway - another fact is that we have two drbd block devices - one for each node containing the VMs running mostly on that node as recommended by the proxmox setup guide. Hence, even for simultaneous backups, it never happens that both nodes create an LVM snapshot on the same volume group.

e100 · Jun 25, 2011

Mee4deil, I too run all of my proxmox servers with Two DRBD volumes as suggested in the wiki.

It is random, but if server A is backing up from DRBD1 and server B is backing up from DRBD2 at the same time it can result in split brain.

In the past I have also seen servers hang when trying to remove the snapshots.
Again, this only ever happened when backups were running on both servers at the same time.

Maybe we need Cluster aware LVM like hk@ suggested:
http://packages.debian.org/lenny/clvm

This may be of some help: http://mirantis.blogspot.com/2011/06/clustered-lvm-on-drbd-resource-in.html

DRBD Split Brain on scheduled backup job

New Member

New Member

Proxmox Staff Member

New Member

New Member

Proxmox Staff Member

New Member

New Member

Proxmox Staff Member

New Member

New Member

New Member

New Member

Renowned Member

New Member

New Member

New Member

Renowned Member

New Member

Renowned Member

We value your privacy