DRBD initiating emergency reboot of node

May 5, 2010
Hello,

I just finished setting up a 3-node cluster (2 servers and 1 quorum disk) and have DRBD set up on both servers. When I test failover by failing the secondary server (unplugging it, stopping networking, etc.), everything works great. But when I "fail" the master server, the secondary server shuts itself down as well. When it eventually comes back up, I get an email saying:

Due to an emergency condition, DRBD is about to issue a reboot
of node forge. If this is unintended, please check
your DRBD configuration file (/etc/drbd.conf).

and...

DRBD has detected that the resource r0
on forge has lost access to its backing device,
and has also lost connection to its peer, ironworks.
This resource now no longer has access to valid data.

Which is cool, but why does it reboot itself just because it lost connectivity to the "master"? Both servers "should" be primary. I am very new to DRBD, so forgive me if it's something simple.
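
As far as I can tell, the reboot itself comes from the handlers section of DRBD's global_common.conf — the email text matches what /usr/lib/drbd/notify-emergency-reboot.sh sends. I'm assuming my install has something like the stock example handlers enabled, which hard-reboot a primary that has become degraded:

handlers {
    # node is primary, degraded, and the local disk is inconsistent:
    # send the notification mail, then force an immediate reboot
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    # node was primary but lost the after-split-brain auto-recovery:
    # same emergency reboot
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
}

If that guess is right, the reboot is DRBD deliberately taking the node down because it considers its data no longer valid, not a crash.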

pveversion:
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-73
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-73
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-30
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1

cat /etc/drbd.d/r0.res
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "my-secret";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on forge {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.1.1.26:7788;
        meta-disk internal;
    }
    on ironworks {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.1.1.24:7788;
        meta-disk internal;
    }
}
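
For reference, this is how I check the resource state on each node (standard drbd-utils commands; the comments show what I'd expect to see in a healthy dual-primary setup):

cat /proc/drbd       # full status; should show ro:Primary/Primary ds:UpToDate/UpToDate
drbdadm role r0      # e.g. Primary/Primary
drbdadm dstate r0    # e.g. UpToDate/UpToDate
drbdadm cstate r0    # e.g. Connected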

Let me know if you need any other info, thanks in advance!

--Will

**EDIT**

I re-created the resources (I decided to go with the two-resource suggestion on the wiki, sketched below), made sure both servers showed Primary/Primary, and the issue seems to have gone away.
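
In case it helps anyone later: the second resource is just a copy of r0 on its own partition and port. The /dev/sdb2 and port 7789 below are my guesses at a typical layout, so adjust for your own disks:

resource r1 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "my-secret";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on forge {
        device /dev/drbd1;
        disk /dev/sdb2;
        address 10.1.1.26:7789;
        meta-disk internal;
    }
    on ironworks {
        device /dev/drbd1;
        disk /dev/sdb2;
        address 10.1.1.24:7789;
        meta-disk internal;
    }
}

The idea from the wiki, as I understand it, is to keep each node's VMs on their own resource, so that if a split brain ever does happen you can decide per resource which side to discard.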
 