DRBD on VE 2.0 - disaster recovery

Hello, I've set up a two-node cluster with the latest Proxmox VE 2.0, following the guidelines in http://pve.proxmox.com/wiki/DRBD and http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster. After a few tries it worked like a charm, until I decided to reinstall and test the exact same scenario on the same servers/disks.
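For context, the wiki recipe boils down to roughly the following on top of the drbd.conf below (a sketch; the wiki has the full details, and the VG name is just an example):
Code:
# initialise DRBD metadata and bring the resource up (run on both nodes)
drbdadm create-md r0
drbdadm up r0

# on one node only: make it primary and push its data to the peer
drbdadm -- --overwrite-data-of-peer primary r0

# once synced, put LVM on top of the DRBD device and add the VG
# as shared LVM storage in Proxmox
pvcreate /dev/drbd0
vgcreate drbdvg /dev/drbd0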

System installation/configuration went fine, until I tried to bring up the (unmodified) DRBD devices:
Code:
root@prox1:~# /etc/init.d/drbd start
Starting DRBD resources:[ d(r0) 0: Failure: (104) Can not open backing device.

[r0] cmd /sbin/drbdsetup 0 disk /dev/cciss/c0d1p1 /dev/cciss/c0d1p1 internal --set-defaults --create-device  failed - continuing!
 
d(r1) s(r0) s(r1) n(r0) n(r1) ].....0: State change failed: (-2) Need access to UpToDate data
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Need access to UpToDate data
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Need access to UpToDate data
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Need access to UpToDate data
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Need access to UpToDate data
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17

My drbd.conf:
Code:
global { usage-count no; }
common { syncer { rate 30M; } }
resource r0 {
        protocol C;
        startup {
                wfc-timeout  15;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "my-secret";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on prox1 {
                device /dev/drbd0;
                disk /dev/cciss/c0d1p1;
                address 10.10.10.10:7788;
                meta-disk internal;
        }
        on prox2 {
                device /dev/drbd0;
                disk /dev/cciss/c0d1p1;
                address 10.10.10.20:7788;
                meta-disk internal;
        }
}
resource r1 {
        protocol C;
        startup {
                wfc-timeout  15;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "my-secret";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on prox1 {
                device /dev/drbd1;
                disk /dev/cciss/c0d1p2;
                address 10.10.10.10:7789;
                meta-disk internal;
        }
        on prox2 {
                device /dev/drbd1;
                disk /dev/cciss/c0d1p2;
                address 10.10.10.20:7789;
                meta-disk internal;
        }
}

Note that r1 worked fine, but it only had a VG on it and no LVs, while r0 had two LVs on it.
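That difference makes me suspect LVM: if the host scans the raw backing partition at boot and activates the old LVs, device-mapper holds /dev/cciss/c0d1p1 open and DRBD gets the '(104) Can not open backing device' error. A quick way to check whether that is what is happening (a sketch; the VG name is whatever the old installation used):
Code:
# has LVM picked up a PV/VG directly on the backing partition?
root@prox1:~# pvs -o pv_name,vg_name
root@prox1:~# lvs

# are any of the old LVs active, i.e. holding c0d1p1 open?
root@prox1:~# dmsetup ls --tree

# if so, deactivate that VG so DRBD can attach the disk again
root@prox1:~# vgchange -an <old-vg-name>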

Does anyone know why this happens, or how I can get DRBD to work with a disk from an 'older' installation?

I know this is a pretty weird scenario, but in a case where disaster recovery is required, it could reduce downtime considerably.
 
It seems there was something wrong with r0 on prox1, plus some LVM+DRBD issue that prevented DRBD from accessing its metadata, which the following commands 'fixed':
Code:
prox1:~# LVM_SYSTEM_DIR= drbdadm create-md r0
prox2:~# drbdadm invalidate-remote r0
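To keep the host's LVM from grabbing the backing partitions again on the next boot, a filter in /etc/lvm/lvm.conf that only accepts the DRBD devices and the system disk should help (a sketch; the exact patterns depend on the local disk layout, and the initramfs may need rebuilding if it embeds a copy of lvm.conf):
Code:
# /etc/lvm/lvm.conf (excerpt) -- scan the DRBD devices and the system disk,
# reject the raw backing partitions so their old VGs are never auto-activated
filter = [ "a|^/dev/drbd.*|", "a|^/dev/cciss/c0d0.*|", "r|.*|" ]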

Some more drbdadm online/up/attach/detach commands later, r0 started syncing from prox2 to prox1, and another 'problem' presented itself...
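For anyone hitting the same thing, the sequence was roughly this (a sketch, not the exact command history):
Code:
# on prox1: bring the re-initialised resource up (attach + connect)
root@prox1:~# drbdadm up r0

# on prox2: mark the peer's copy inconsistent so data flows prox2 -> prox1
root@prox2:~# drbdadm invalidate-remote r0

# watch the resync on either node
root@prox1:~# watch cat /proc/drbd

# once both sides are UpToDate, promote both (allow-two-primaries is set)
root@prox1:~# drbdadm primary r0
root@prox2:~# drbdadm primary r0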

...is there a way to create a new VM and (re)use an existing LV?
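Something like this looks like it should work, but I haven't verified it yet (a sketch; the storage name 'drbd-r0' and volume name 'vm-101-disk-1' are only examples):
Code:
# create the VM with no disk, then attach the existing LV by editing its config
root@prox1:~# qm create 101 --name restored-vm --memory 2048 --net0 virtio,bridge=vmbr0
root@prox1:~# echo "virtio0: drbd-r0:vm-101-disk-1" >> /etc/pve/qemu-server/101.conf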