Hello,
I just finished setting up a 3-node cluster (2 servers and 1 quorum disk). I have DRBD set up on both servers. When I test failover (unplugging a server, stopping networking, etc.) by shutting down the secondary server, everything works great. When I "fail" the master server, the secondary server shuts itself down as well. When it eventually comes back up, I get an email saying:
Due to an emergency condition, DRBD is about to issue a reboot
of node forge. If this is unintended, please check
your DRBD configuration file (/etc/drbd.conf).
and...
DRBD has detected that the resource r0
on forge has lost access to its backing device,
and has also lost connection to its peer, ironworks.
This resource now no longer has access to valid data.
Which is cool, but why does it restart itself just because it lost connectivity to the "master"? Both servers "should" be primary. I am very new to DRBD, so forgive me if it's something simple.
pveversion:
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-73
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-73
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-30
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
cat /etc/drbd.d/r0.res
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "my-secret";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on forge {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.1.1.26:7788;
        meta-disk internal;
    }
    on ironworks {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.1.1.24:7788;
        meta-disk internal;
    }
}
Let me know if you need any other info, thanks in advance!
--Will
**EDIT**
I re-created the resources (I decided to go with the two-resource suggestion on the wiki) and made sure that both servers showed Primary/Primary, and the issue seems to have gone away.
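For anyone who wants to script the Primary/Primary check rather than eyeball it, here is a rough sketch. It assumes the usual /proc/drbd status layout, where each device line carries a `ro:Local/Peer` field. The function names and the sample text are made up for illustration; the sample is not output from my nodes.

```python
import re

# Matches a device line such as:
#  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
DEVICE_LINE = re.compile(r"\s*(\d+):\s+cs:(\S+)\s+ro:(\w+)/(\w+)")


def drbd_roles(status_text):
    """Return {minor: (local_role, peer_role)} from /proc/drbd-style text."""
    roles = {}
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            roles[int(match.group(1))] = (match.group(3), match.group(4))
    return roles


def all_dual_primary(status_text):
    """True only if at least one device is listed and every one is Primary/Primary."""
    roles = drbd_roles(status_text)
    return bool(roles) and all(r == ("Primary", "Primary") for r in roles.values())


# Illustrative sample only (hypothetical, not captured from a real node).
SAMPLE = (
    " 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----\n"
    " 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----\n"
)

if __name__ == "__main__":
    for minor, (local, peer) in sorted(drbd_roles(SAMPLE).items()):
        print(f"drbd{minor}: local={local} peer={peer}")
    print("all dual-primary:", all_dual_primary(SAMPLE))
```

On a real node you would feed it the contents of /proc/drbd instead of SAMPLE, for example `all_dual_primary(open("/proc/drbd").read())`.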