DRBD Primary/Primary and integrity checks

ecramer

New Member
May 17, 2011
I've been working with Proxmox VE for about four months now, and I'm looking for some suggestions (or maybe reassurance) that what I'm doing is "right." The technical details of this post are mostly DRBD-related, but the end result leads back to PVE. (I apologize if this sounds like a broken record, especially to the Proxmox 2.0/HA developers...)

What I want, but can't have*

Proxmox Host 1
--------------------
Local OS disk
DRBD-R0 - Primary
DRBD-R1 - Primary
DRBD-R2 - Secondary

Proxmox Host 2
--------------------
Local OS disk
DRBD-R0 - Primary
DRBD-R1 - Secondary
DRBD-R2 - Primary

R0 would be a testing area; R1 and R2 would be for production machines. Running a Primary/Secondary setup makes it easy to avoid split-brain problems. However, PVE isn't happy unless all of its storage options are writable,* and to make every resource writable on both hosts, DRBD must run in dual-primary mode. Things seemed alright, but trouble was brewing in the dark. DRBD doesn't complain much (or at all in some cases), letting this little charm slip by for almost three weeks:
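For reference, dual-primary needs allow-two-primaries in the net{} section (plus protocol C). Something along these lines - the resource name, device paths and addresses here are just placeholders, not my real config:
Code:
resource r0 {
  protocol C;                          # synchronous replication, required for dual-primary
  net {
    allow-two-primaries;               # both nodes may hold the resource Primary
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;          # split-brain with two primaries must be resolved by hand
  }
  on pve1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on pve2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}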

(from kern.log.3)
Code:
Apr 15 10:56:07 pve2 kernel: block drbd0: Digest integrity check FAILED.
Apr 15 10:56:07 pve2 kernel: block drbd0: error receiving Data, l: 4136!

Instant split-brain, probably for no "good" reason. It happened right after a new VM was created and started. Lars Ellenberg has explained the problem in a few different places; here's something close: "Digest-integrity with dual-primary is not a very good idea." (www.mail-archive.com) It boils down to the in-flight data being modified before the two DRBD peers have finished comparing their checksums, so the mismatch looks like corruption and the connection gets dropped.

I removed the data-integrity-alg option from the net{} section and started looking at the verify-alg option in syncer{}. Scheduling "drbdadm verify all" to run during periods of low activity reduces the risk of spurious split-brain errors, but it's still a possibility.
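For what it's worth, the relevant pieces of the config now look roughly like this (the algorithms and rate shown are just examples, not a recommendation):
Code:
  net {
    allow-two-primaries;
    # data-integrity-alg sha1;   <- removed: per-packet digests misfire with dual-primary
  }
  syncer {
    rate 30M;                    # resync bandwidth cap
    verify-alg md5;              # needed for "drbdadm verify <resource>" to do anything
  }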

I haven't had much luck searching for similar stories with Proxmox and DRBD, so I started reading about Xen. I found a page that seems to have all of the components: http://backdrift.org/live-migration-and-synchronous-replicated-storage-with-xen-drbd-and-lvm One of the visitor comments brings up the Primary/Secondary problem, which some magic Xen script seems to handle internally. The DRBD user manual mentions the disable-sendpage parameter, but I can't figure out what it really does.

How does everyone else handle this? The Wiki/How-To about DRBD doesn't mention data integrity checks. Maybe I'm worried for nothing and I should trust the piles of RAID-1 and daily backups to do their job. Thoughts?

PVE1 / PVE2
-----------------------
Dell PowerEdge R515
2x Opteron 4122
16GB DDR3
Dell H700 RAID (LSI MegaSAS 9260)
65GB RAID-1 - PVE install
400GB RAID-1 - DRBD-R0
465GB RAID-1 - DRBD-R1
465GB RAID-1 - DRBD-R2
2x hot spares
(PVE and DRBD-R0 live on the same two disks)
2x Dual Broadcom 5709 NIC w/TOE
- WAN + 3x DRBD

Proxmox 1.8 installed and worked well right away.


*If one storage option fails, it seems to take all subsequent storage with it.
Local Storage
DRBD-R0
DRBD-R1
DRBD-R2 <-- if this is "secondary"
NFS-ISO <-- these storage groups
NFS-Backup <-- aren't initialized

I noticed trouble before I had even restarted the machines: having any DRBD resource in Secondary effectively disables the web GUI and any backups, producing a stream of "vgchange" errors on both PVE hosts. The errors themselves make sense - PVE can't open a volume group that is read-only. The part I don't understand is why a host needs to access volume groups for machines that it isn't running or trying to modify.
 
I use the "drbdadm verify all" method
There does not seem to be any way to throttle the speed of the verify, so it can drastically impact performance.

The DRBD manual states:
"We suggest to use the data-integrity-alg only during a pre-production phase due to its CPU costs. Further we suggest to do online verify runs regularly e.g. once a month during a low load period."
http://www.drbd.org/users-guide/re-drbdconf.html
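Something like this in /etc/cron.d would cover the "once a month during a low load period" suggestion (the schedule and file name are just an example):
Code:
# /etc/cron.d/drbd-verify - online verify of every resource, 03:00 on the 1st of each month
0 3 1 * * root /sbin/drbdadm verify all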

The Proxmox DRBD How-To suggests running VMs on one machine on one DRBD volume and VMs on the other machine on a different DRBD volume.
If you do that, when a split-brain happens (and it will), recovery is very simple: just discard the data on the node and DRBD volume that is not running the VMs.

I have all of my Proxmox + DRBD servers set up with only two DRBD volumes.
ServerA stores all of its virtual machines on drbd0.
ServerB stores all of its virtual machines on drbd1.
Both drbd0 and drbd1 are primary/primary.

If drbd0 went split-brain, I would simply discard the data on ServerB's drbd0 and let it resync.
The only time I ever have to worry about a split-brain causing an issue is if I am live-migrating servers at the time the split-brain occurs.
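The manual recovery is only a couple of drbdadm commands per node (this matches the split-brain recovery section of the DRBD manual, if I remember it right; r0 stands in for whichever resource split):
Code:
# on the node whose copy gets thrown away (ServerB in the drbd0 example above)
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the node that keeps its data, if it is already StandAlone
drbdadm connect r0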
 
I suppose that means you don't have much trouble with write-after-write/hash problems during the monthly verify. (I'm not sure how much traffic is necessary to cause a problem.)

We're aiming for a similar setup here - two "single server" shares with a third as scratch/development space. I'm looking for horror stories (or the lack thereof) describing hundreds of broken SQL entries because a few bits got flipped after passing through KVM, the kernel, blind checksums on TOE network cards, RAID controller caches... It's amazing that computers work at all.

I ran PVE 1.7 and DRBD on two dissimilar machines for two months and never had problems:
Dell Optiplex 755
Core2 Quad
DRBD
- Linux mdraid (2x 500, raid-0)
- 3Com 3C996B LAN

Misc Aberdeen server
Older dual-core Xeon
DRBD
- areca 1261/1280ML (4x 500, raid-10)
- Intel 82563EB LAN

After the experimenting was over, I re-read all of the relevant documentation to make sure I was "doing it right." I was aiming for maximum reliability; instead, I found a new source of IT paranoia.

-----------------------

You're right about the high resource usage during verification:
Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21669 root      20   0     0    0    0 S   70  0.0   2:48.83 drbd1_worker
21768 root      20   0     0    0    0 S   36  0.0   1:02.08 drbd1_asender
21719 root      20   0     0    0    0 S   14  0.0   0:28.32 drbd1_receiver
That's for a single volume: just over two hours for 465 GiB (500 GB).
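(465 GiB in a little over 7,200 seconds works out to roughly 65 MiB/s sustained, for anyone doing the math.)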
 
I have 12 machines set up as six two-server Proxmox clusters with DRBD.
I have been running Windows Servers with SQL in production for over six months on this setup.

The only time that a verify ever found a problem was when one of my RAID cards freaked out and locked up.

I back up the MSSQL database backups offsite using rdiff-backup.
I back up my virtual servers three times a week to encrypted disks that we take offsite.
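In case anyone is unfamiliar with rdiff-backup, it's a one-liner per backup set - the paths and host below are made up:
Code:
rdiff-backup /srv/backups/mssql backupuser@offsite.example.com::/srv/backups/mssql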

If DRBD ever messes up and my virtual servers become corrupted, I have enough backup data to recover.
After using DRBD for many years, I feel it is very reliable, but never use DRBD in place of a good backup system.
 
