Two node active/passive cluster with DRBD. Is fencing necessary?

acidrop

Renowned Member
Jul 17, 2012
Hello,

I have a two-node cluster set up with two DRBD resources in active/passive mode. This is a test setup.
The first DRBD resource (r0) is primary on node A and the second (r1) is primary on node B.
I have created two different VMs, one on each node, located on DRBD r0 and r1 (LVM) respectively.
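
For reference, each DRBD resource is defined roughly like this (the device, backing disk and replication addresses below are placeholders, not my real values):

Code:
# /etc/drbd.d/r0.res - sketch only, all values are placeholders
resource r0 {
        protocol C;
        on proxmox1 {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   10.0.0.1:7788;
                meta-disk internal;
        }
        on proxmox2 {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   10.0.0.2:7788;
                meta-disk internal;
        }
}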

The problem is that if I power off node A, for example, promote r0 as primary on node B, and then try to manually move the VM config files from node A to node B with:

Code:
mv /etc/pve/nodes/proxmox1/qemu-server/100.conf /etc/pve/nodes/proxmox2/qemu-server/

and I get:

Code:
mv: cannot move `100.conf' to `/etc/pve/nodes/proxmox2/qemu-server/100.conf': Device or resource busy
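
For context, the full take-over I am attempting on node B is roughly the following (the LVM volume group name is just a placeholder for whatever sits on top of r0):

Code:
# on node B, after node A has been powered off
drbdadm primary r0          # promote the DRBD resource
vgchange -ay <vg-on-r0>     # activate the LVM volume group on r0 (placeholder name)
mv /etc/pve/nodes/proxmox1/qemu-server/100.conf /etc/pve/nodes/proxmox2/qemu-server/
qm start 100                # start the VM on node B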

I also noticed that rgmanager is not running on the nodes, even after restarting the service.
In syslog I get:

Code:
Feb  4 13:22:53 proxmox2 pmxcfs[1364]: [status] crit: cpg_send_message failed: 9

Code:
root@proxmox1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   3744   2013-02-04 12:55:24  proxmox1
   2   M   3744   2013-02-04 12:55:24  proxmox2
root@proxmox1:~# clustat
Cluster Status for pvecluster1 @ Mon Feb  4 13:34:39 2013
Member Status: Quorate


 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 proxmox1                                                            1 Online, Local
 proxmox2                                                            2 Online

Code:
root@proxmox1:~# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-40
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-7
ksm-control-daemon: 1.1-1

My cluster.conf:

Code:
<?xml version="1.0"?>
<cluster config_version="5" name="pvecluster1">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1"/>
    <clusternode name="proxmox2" nodeid="2" votes="1"/>
  </clusternodes>
  <rm/>
</cluster>

Is it necessary to set up fencing in this scenario?

Thank you

 
I doubt that active/passive can be configured. Our HA stack cannot manage DRBD active/passive.
 
Actually, I don't want the VMs to be managed by HA automatically. I just want to be able, if a node fails, to manually move the VMs located on its DRBD resource to the other node.
Is that possible?
 
Actually, I don't want the VMs to be managed by HA automatically. I just want to be able, if a node fails, to manually move the VMs located on its DRBD resource to the other node.
Is that possible?
Yes, if you reach quorum again.
Without quorum /etc/pve is write-protected, so you can't move config files.
With "pvecm expected 1" you should be able to move the configs on the remaining node as well.

But why not active/active? You will have much more fun with live migration... Normally, live migration is often used instead of recovering a failed node!

Udo
 
Yes, if you reach quorum again.
Without quorum /etc/pve is write-protected, so you can't move config files.
With "pvecm expected 1" you should be able to move the configs on the remaining node as well.

Thank you! That did the trick...

But why not active/active? You will have much more fun with live migration... Normally, live migration is often used instead of recovering a failed node!

Udo

I don't have a fencing mechanism for now, so I don't want to risk data loss. I have also tried active/active and it's fun though :)
Am I safe with active/passive without fencing, or is there still a possibility of losing data?
 
Thank you! That did the trick...



I don't have a fencing mechanism for now, so I don't want to risk data loss. I have also tried active/active and it's fun though :)
Am I safe with active/passive without fencing, or is there still a possibility of losing data?

Hi,
don't forget to change expected back to 2 again after the failed node has been recovered.
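
I.e. once both nodes are joined again:

Code:
pvecm expected 2     # restore the normal expected votes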

Without a quorum disk, fencing will not work in a two-node cluster, because the first thing that happens is that quorum is lost.

You need to do things manually, and if you know what you are doing, active/active is not a higher risk than active/passive.
What is risky is running two nodes with expected votes = 1!! In that case you can write to the same data from both machines (if one node is not really dead and you (or HA) start the VM on the other node).
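
Before you promote anything on the remaining node it is worth checking that the peer is really gone, e.g. (just a suggestion from me, not a replacement for fencing):

Code:
cat /proc/drbd       # overall DRBD status
drbdadm cstate r0    # connection state - the peer should not show as Connected
drbdadm role r0      # should report e.g. Secondary/Unknown before you promote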

Udo