DRBD live migration when not connected

Hi,

I have been testing the 2.6.18 kernel with DRBD for a few weeks. It has been stable, and live migration works like a charm.

However, today I checked the DRBD status by coincidence and saw that one of my DRBD volumes apparently had a split-brain a few days ago. I have followed the guide on pve.proxmox.com, but I clearly need to set up notification so I hear about it when this happens out of nowhere. I cannot find anything suspicious leading up to the event. The error log is shown below for server #1:
Code:
Sep  1 21:02:38 p1 kernel: block drbd2: Digest integrity check FAILED.
Sep  1 21:02:38 p1 kernel: block drbd2: error receiving Data, l: 4140!
Sep  1 21:02:38 p1 kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) 
Sep  1 21:02:38 p1 kernel: block drbd2: asender terminated
Sep  1 21:02:38 p1 kernel: block drbd2: Terminating asender thread
Sep  1 21:02:38 p1 kernel: block drbd2: Creating new current UUID
Sep  1 21:02:38 p1 kernel: block drbd2: Connection closed
Sep  1 21:02:38 p1 kernel: block drbd2: conn( ProtocolError -> Unconnected ) 
Sep  1 21:02:38 p1 kernel: block drbd2: receiver terminated
Sep  1 21:02:38 p1 kernel: block drbd2: Restarting receiver thread
Sep  1 21:02:38 p1 kernel: block drbd2: receiver (re)started
Sep  1 21:02:38 p1 kernel: block drbd2: conn( Unconnected -> WFConnection ) 
Sep  1 21:02:38 p1 kernel: block drbd2: Handshake successful: Agreed network protocol version 91
Sep  1 21:02:38 p1 kernel: block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep  1 21:02:38 p1 kernel: block drbd2: conn( WFConnection -> WFReportParams ) 
Sep  1 21:02:38 p1 kernel: block drbd2: Starting asender thread (from drbd2_receiver [8949])
Sep  1 21:02:38 p1 kernel: block drbd2: data-integrity-alg: sha1
Sep  1 21:02:38 p1 kernel: block drbd2: drbd_sync_handshake:
Sep  1 21:02:38 p1 kernel: block drbd2: self FAD15BFCF355A9C5:1D714E0E2AF45CA3:443B58EFC77E89EF:81102D203587BE84 bits:0 flags:0
Sep  1 21:02:38 p1 kernel: block drbd2: peer 74C295FE5A299DD5:1D714E0E2AF45CA3:443B58EFC77E89EF:81102D203587BE84 bits:7 flags:0
Sep  1 21:02:38 p1 kernel: block drbd2: uuid_compare()=100 by rule 90
Sep  1 21:02:38 p1 kernel: block drbd2: Split-Brain detected, dropping connection!
Sep  1 21:02:38 p1 kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2
Sep  1 21:02:39 p1 kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
Sep  1 21:02:39 p1 kernel: block drbd2: conn( WFReportParams -> Disconnecting ) 
Sep  1 21:02:39 p1 kernel: block drbd2: error receiving ReportState, l: 4!
Sep  1 21:02:39 p1 kernel: block drbd2: asender terminated
Sep  1 21:02:39 p1 kernel: block drbd2: Terminating asender thread
Sep  1 21:02:39 p1 kernel: block drbd2: Connection closed
Sep  1 21:02:39 p1 kernel: block drbd2: conn( Disconnecting -> StandAlone ) 
Sep  1 21:02:39 p1 kernel: block drbd2: receiver terminated
Sep  1 21:02:39 p1 kernel: block drbd2: Terminating receiver thread
server #2:
Code:
Sep  1 21:02:38 p2 kernel: block drbd2: sock was shut down by peer
Sep  1 21:02:38 p2 kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) 
Sep  1 21:02:38 p2 kernel: block drbd2: short read expecting header on sock: r=0
Sep  1 21:02:38 p2 kernel: block drbd2: meta connection shut down by peer.
Sep  1 21:02:38 p2 kernel: block drbd2: asender terminated
Sep  1 21:02:38 p2 kernel: block drbd2: Terminating asender thread
Sep  1 21:02:38 p2 kernel: block drbd2: Creating new current UUID
Sep  1 21:02:38 p2 kernel: block drbd2: Connection closed
Sep  1 21:02:38 p2 kernel: block drbd2: conn( BrokenPipe -> Unconnected ) 
Sep  1 21:02:38 p2 kernel: block drbd2: receiver terminated
Sep  1 21:02:38 p2 kernel: block drbd2: Restarting receiver thread
Sep  1 21:02:38 p2 kernel: block drbd2: receiver (re)started
Sep  1 21:02:38 p2 kernel: block drbd2: conn( Unconnected -> WFConnection ) 
Sep  1 21:02:38 p2 kernel: block drbd2: Handshake successful: Agreed network protocol version 91
Sep  1 21:02:38 p2 kernel: block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep  1 21:02:38 p2 kernel: block drbd2: conn( WFConnection -> WFReportParams ) 
Sep  1 21:02:38 p2 kernel: block drbd2: Starting asender thread (from drbd2_receiver [8906])
Sep  1 21:02:38 p2 kernel: block drbd2: data-integrity-alg: sha1
Sep  1 21:02:38 p2 kernel: block drbd2: drbd_sync_handshake:
Sep  1 21:02:38 p2 kernel: block drbd2: self 74C295FE5A299DD5:1D714E0E2AF45CA3:443B58EFC77E89EF:81102D203587BE84 bits:7 flags:0
Sep  1 21:02:38 p2 kernel: block drbd2: peer FAD15BFCF355A9C5:1D714E0E2AF45CA3:443B58EFC77E89EF:81102D203587BE84 bits:0 flags:0
Sep  1 21:02:38 p2 kernel: block drbd2: uuid_compare()=100 by rule 90
Sep  1 21:02:38 p2 kernel: block drbd2: Split-Brain detected, dropping connection!
Sep  1 21:02:38 p2 kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2
Sep  1 21:02:39 p2 kernel: block drbd2: meta connection shut down by peer.
I'll post the DRBD side of this to the DRBD mailing list as well. As far as I understand, the "Digest integrity check FAILED" error can occur in rare circumstances or when hardware is defective. Normally it should recover by itself, but I guess the split-brain occurs when running Primary/Primary with VMs active on both servers. Since I have been testing, I'm not 100% sure whether I only had VMs running on one server for that DRBD device.
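For the notification I mentioned above, it looks like a split-brain handler in drbd.conf can call the notify-split-brain.sh script that ships with DRBD - a minimal sketch, assuming the script is installed under /usr/lib/drbd and that local mail delivery to root works:
Code:
common {
  handlers {
    # mail a notification to root whenever a resource detects a split-brain
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
}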

What worries me is that I am still allowed to do a live migration even though DRBD is running Primary/Unknown on both servers. I guess the only thing PVE cares about is the "Shared" tick when adding the volume.
Will DRBD status be integrated for HA?

I guess it should be possible to patch /usr/sbin/qmigrate to take the DRBD state into account?

Best regards,
Bo
 
The hack I can come up with right now, not knowing DRBD very well, is to do something like "drbd-overview | grep <PV device>", where the PV device could be found with the pvs command. I don't know whether that is a safe way to do it, and if DRBD disconnects while migrating you would still have a problem.
 
I wonder why DRBD does not detect that situation itself? The nodes can still reach each other over the network, so it should be easy to detect that something is seriously wrong.
 
I did a small experiment to make PVE abort the migration when trying to online-migrate a DRBD-based VM that is not running Primary/Primary. Offline migration of the VM is still allowed, but a warning is printed if DRBD is not Primary/Primary.
The code uses grep and awk - I should probably sit down and learn a bit of Perl instead ;-).
Code:
--- qmigrate.bak    2010-09-03 23:54:35.000000000 +0200
+++ qmigrate    2010-09-04 11:56:08.000000000 +0200
@@ -150,8 +150,10 @@
     });
 
     # and add used,owned/non-shared disks (just to be sure we have all)  
 
+    my $drbd_in_use = 0;
+    my $drbd_online = 1;
     my $sharedvm = 1;
     foreach my $ds (keys %$di) {
 
         next if PVE::QemuServer::drive_is_cdrom ($di->{$ds});
@@ -163,8 +165,20 @@
         my ($sid, $volname) = PVE::Storage::parse_volume_id ($volid);
 
         my $scfg =  PVE::Storage::storage_config ($qm->{storecfg}, $sid);
 
+        # Check if VM is using DRBD, and if it's online
+        if ($scfg->{type} eq 'lvm') {
+            my $lvm_pv = `pvs | grep $scfg->{vgname} | awk '{print \$1}'`;
+            if ($lvm_pv =~ m/.*drbd.*/) {
+                $drbd_in_use = 1;
+                my $drbd_status = `drbd-overview | grep $scfg->{vgname} | grep "Primary/Primary"`;
+                if (length($drbd_status) == 0) {
+                    $drbd_online = 0;
+                }
+            }
+        }
+
         next if $scfg->{shared};
 
         $sharedvm = 0;
 
@@ -179,8 +193,15 @@
     if ($running && !$sharedvm) {
         die "can't do online migration - VM uses local disks\n";
     }
 
+    if ($running && $drbd_in_use && !$drbd_online) {
+        die "can't do migration - DRBD storage is not running Primary/Primary!\n";
+    }
+    if ($drbd_in_use && !$drbd_online) {
+        logmsg('warning', "Migrating VM - Check DRBD storage before starting VM. DRBD is not running Primary/Primary!\n");
+    }
+
     # do some checks first
     foreach my $volid (keys %$volhash) {
         my ($sid, $volname) = PVE::Storage::parse_volume_id ($volid);
         my $scfg =  PVE::Storage::storage_config ($qm->{storecfg}, $sid);
I'll try playing around with DRBD's "after-sb-*pri" parameters. Maybe this situation can be avoided in the most common cases.
 
Not really solved yet - I made after-sb-1pri and after-sb-2pri use violently-as0p, and set after-sb-0pri to discard-zero-changes. That seems to resolve nicely which node gets dropped, as long as you follow the scheme of only running the active VMs on one server during normal operation - and if you did have VMs running on both servers on the same DRBD volume, it still fails with a split-brain as it should. I noticed that violently-as0p carries a warning in the documentation, but combined with discard-zero-changes I cannot see why it should be dangerous to use?
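For reference, the net section I am testing looks roughly like this ("r2" is just a placeholder for the real resource name):
Code:
resource r2 {
  net {
    # auto-resolve only when one side has no changes since the split-brain
    after-sb-0pri discard-zero-changes;
    after-sb-1pri violently-as0p;
    after-sb-2pri violently-as0p;
  }
}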
The case is still not solved completely though, because both ends remain marked as Primary. The "rr-conflict call-pri-lost" parameter seems to be needed, along with a pri-lost handler that puts the losing primary into secondary, so that resynchronization becomes possible. I still need to test this part.
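Something along these lines is what I plan to test - the pri-lost handler command is only a sketch of the "demote the losing primary" idea and is untested (demoting will fail if the device is still in use on that node):
Code:
resource r2 {
  net {
    rr-conflict call-pri-lost;
  }
  handlers {
    # runs on the node that should give up its primary role;
    # DRBD exports DRBD_RESOURCE to handler commands
    pri-lost "drbdadm secondary $DRBD_RESOURCE";
  }
}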