HA and DRBD questions before building a 4-way cluster

fjs

New Member
Jul 1, 2015
South Africa
Hi all

I am fairly new to Proxmox, but I am an old hand at Linux, so I learn fast and I won't waste your time. I have done some research on HA and DRBD and want to try it.
I have 4x Dell R720s with 2x 600GB SAS (RAID 1) + 4x 600GB SAS (RAID 10), so effectively 2 disks per server. I want to use the first disk for the OS/Proxmox and the second disk for the DRBD/HA/CTs.

Specs:
Proxmox VE 3.4, latest updates
Dells with 6x 600GB 15k SAS in 2 RAID sets as per above.

Here are my questions:
How do I do a 4-way DRBD setup? I know I will lose a lot of disk space, but can it be done without stacking, etc.?

Or would you rather suggest I go for sets of 2, i.e. 2x2? If that is recommended, can I still have my 4 Dells in one cluster? I assume you can, but then HA will work in pairs, e.g. Dells 1&2 and Dells 3&4?

Here are my goals:
HA failover to at least 1 server (sets of 2), but I would prefer failover to all 4 servers.
A single frontend on, say, Dell 1 where I control all the other Dells, i.e. just like a normal non-HA cluster.

Then lastly, if I do 2 pairs of 2 servers, would I be able to log in to the frontend of any of the 4 servers and migrate a CT from pair 1 to pair 2 (keeping in mind their DRBDs are separate)?

I just want simplicity and HA at the end. Simple answers to my questions are OK; I can research the rest, I just need some guidance. Thanks all.

Admin, can you send me some information on the cost of commercial support? Thanks.
 
Thanks Tom. I did check it out and v4 looks amazing. Do you have an ETA on the actual release date, as I am strapped for time? I basically need a solution in 2 weeks. If I know the release date is soon, I will rather delay the project. Thanks.

I also looked at the support pricing. I have 4x Dells, as you know, with 2 CPUs per Dell. That's 8 CPUs total. Would the cost be "Proxmox VE Community Subscription 4 CPUs/year" x 2?


 
Thanks Tom. I did check it out and v4 looks amazing. Do you have an ETA on the actual release date, as I am strapped for time? I basically need a solution in 2 weeks. If I know the release date is soon, I will rather delay the project. Thanks.

4.0 stable will NOT be available in two weeks. Depending on the bugs found and overall stability, this will take more time. We do not have fixed release dates; we release when it's ready.

I also looked at the support pricing. I have 4x Dells, as you know, with 2 CPUs per Dell. That's 8 CPUs total. Would the cost be "Proxmox VE Community Subscription 4 CPUs/year" x 2?



If you have 4 servers, you need 4 subscription keys. In your case, 4x 2-CPU keys.

If you go for Community, it's 4 x 119.80 per year.
 
Thanks Tom. Yes, I know these bugs take time; I thought you might have a rough ETA, e.g. November or so. Thanks for the info!
 
Small issue. I set up DRBD on the 4 Dells: a 2x2 setup with drbdr0 on set 1 and drbdr1 on set 2. I made a typo with this command on set 2:
# vgcreate drbdr0 /dev/drbd1
should have been
# vgcreate drbdr1 /dev/drbd1

I deleted the VG and added it again, so vgscan is correct:
Reading all physical volumes. This may take a while...
Found volume group "pve" using metadata type lvm2
Found volume group "drbdr1" using metadata type lvm2

However, in the Proxmox GUI I can only see drbdr0 (which I did correctly), not drbdr1. Is this perhaps a bug, or did I make a mistake? I already checked all the files I could.

Any help appreciated. Thanks
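
A side note for anyone hitting the same typo: instead of deleting and recreating the VG, LVM can also rename a volume group in place. A minimal sketch, only safe while the VG is still empty and not yet registered as Proxmox storage:

Code:
# rename the mistyped VG to the intended name, then confirm
vgrename drbdr0 drbdr1
vgs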
 
Not certain that I have the answer; these may or may not help:

Did you already add the storage to PVE using the incorrect name? If so, delete that storage from PVE.

When you are adding it, do you see the incorrectly named VG? If so, check whether the VG exists on the node with 'vgs'.
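
If the storage was already registered in PVE under the wrong name, a rough CLI sketch for cleaning it up (the storage ID, VG name and node list here are assumptions based on this thread, adjust to your setup):

Code:
# drop the wrongly defined storage entry, then re-add it pointing at the correct VG
pvesm remove drbdr1
pvesm add lvm drbdr1 --vgname drbdr1 --content images --shared 1 --nodes jt3,jt4
pvesm status    # the storage should now appear on the intended nodes only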
 
Small issue. I set up DRBD on the 4 Dells: a 2x2 setup with drbdr0 on set 1 and drbdr1 on set 2. I made a typo with this command on set 2:
# vgcreate drbdr0 /dev/drbd1
should have been
# vgcreate drbdr1 /dev/drbd1

I deleted the VG and added it again, so vgscan is correct:
Reading all physical volumes. This may take a while...
Found volume group "pve" using metadata type lvm2
Found volume group "drbdr1" using metadata type lvm2

However, in the Proxmox GUI I can only see drbdr0 (which I did correctly), not drbdr1. Is this perhaps a bug, or did I make a mistake? I already checked all the files I could.

Any help appreciated. Thanks

Hi,
it looks like you forgot "vgcreate drbdr0 /dev/drbd0".

What is the output of
Code:
pvs
vgs

BTW, I don't know the Dell servers (and the RAID controller used), but perhaps you can configure the RAID controller like an Areca and use all 6 SAS disks for two different RAID volumes?
One small one for the OS (RAID 5) and one (or two, one for each DRBD resource) in RAID 10.
With 6 HDDs you are much faster than with 4.

Udo
 
Thanks, I already checked those. I even deleted all the VG and DRBD stuff, also in the Proxmox GUI, and re-added everything, but still the same. I think it is a bug somewhere. Either way, both my DRBDs are working; they are just both called drbdr0 lol.

So I now have my cluster set up and it is working well. I would much rather use CTs than KVMs. Is there a way to HA CTs? In other words, run CTs on shared storage / DRBD or anything else, or is the root issue hampering this?

I don't want to put all the HA stuff on KVM just to utilize the DRBD/LVM. Thanks.
 
Thanks, I already checked those. I even deleted all the VG and DRBD stuff, also in the Proxmox GUI, and re-added everything, but still the same. I think it is a bug somewhere. Either way, both my DRBDs are working; they are just both called drbdr0 lol.
Hi,
that isn't possible! You can't have two resources r0 / VGs with the same name, unless you mean on both nodes!
That is the normal behavior.

The suggested way is to use two HDDs/partitions/RAID volumes (on both nodes) to create two resources (r0 + r1).
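
A minimal sketch of what two resources on one pair could look like, in DRBD 8.3-style config (the hostnames, backing partitions and replication IPs below are placeholders, not taken from this thread):

Code:
# /etc/drbd.d/r0.res -- first resource of the pair, normally "owned" by node 1
resource r0 {
    protocol C;
    startup { become-primary-on both; }
    net     { allow-two-primaries; }
    on nodeA { device /dev/drbd0; disk /dev/sdb1; address 192.168.99.1:7788; meta-disk internal; }
    on nodeB { device /dev/drbd0; disk /dev/sdb1; address 192.168.99.2:7788; meta-disk internal; }
}

# /etc/drbd.d/r1.res -- second resource, normally "owned" by node 2
resource r1 {
    protocol C;
    startup { become-primary-on both; }
    net     { allow-two-primaries; }
    on nodeA { device /dev/drbd1; disk /dev/sdb2; address 192.168.99.1:7789; meta-disk internal; }
    on nodeB { device /dev/drbd1; disk /dev/sdb2; address 192.168.99.2:7789; meta-disk internal; }
}

Each resource then gets its own VG and its own LVM storage entry in PVE, so every node has a resource that only holds its own guests.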

What is the output of
Code:
drbd-overview
Udo
 
Nope, I did that.

root@jt1:~# pvs
  PV          VG     Fmt  Attr PSize   PFree
  /dev/drbd0  drbdr0 lvm2 a--    1.09t   1.09t
  /dev/sda3   pve    lvm2 a--  558.25g  16.00g

root@jt3:~# pvs
  PV          VG     Fmt  Attr PSize   PFree
  /dev/drbd1  drbdr1 lvm2 a--    1.09t   1.09t
  /dev/sda3   pve    lvm2 a--  558.25g  16.00g

root@jt1:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr0   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

root@jt3:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr1   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

jt1 has partner jt2 and jt3 has partner jt4, hence I only show you jt1/jt3.

Regarding the RAID suggestion, I will try that. I don't need that much disk speed really; I will rather add SSDs later if needed.
 
Nope, I did that.

root@jt1:~# pvs
  PV          VG     Fmt  Attr PSize   PFree
  /dev/drbd0  drbdr0 lvm2 a--    1.09t   1.09t
  /dev/sda3   pve    lvm2 a--  558.25g  16.00g

root@jt3:~# pvs
  PV          VG     Fmt  Attr PSize   PFree
  /dev/drbd1  drbdr1 lvm2 a--    1.09t   1.09t
  /dev/sda3   pve    lvm2 a--  558.25g  16.00g

root@jt1:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr0   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

root@jt3:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr1   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

jt1 has partner jt2 and jt3 has partner jt4, hence I only show you jt1/jt3.

Regarding the RAID suggestion, I will try that. I don't need that much disk speed really; I will rather add SSDs later if needed.

Hi,
it would be very helpful if you posted the output from the commands (like drbd-overview) that I asked for...

This is a working real-life config:
Code:
root@prox-a:~# drbd-overview
  0:r0  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: a_r0 499.98g 38.00g  
  1:r1  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: b_r1 499.98g 144.00g

root@prox-a:~# vgs
  VG              #PV #LV #SN Attr   VSize   VFree  
  a_r0              1   2   0 wz--n- 499.98g 461.98g
  b_r1              1   5   0 wz--n- 499.98g 355.98g
  pve               1   3   0 wz--n- 185.76g  20.98g


root@prox-b:~# drbd-overview
  0:r0  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: a_r0 499.98g 38.00g  
  1:r1  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: b_r1 499.98g 144.00g 
root@prox-b:~# vgs
  VG              #PV #LV #SN Attr   VSize   VFree  
  a_r0              1   2   0 wz--n- 499.98g 461.98g
  b_r1              1   5   0 wz--n- 499.98g 355.98g
  pve               1   4   0 wz--n-  99.50g  12.00g

root@prox-b:~# cat /etc/pve/storage.cfg 
dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

lvm: a_r0
        vgname a_r0
        shared
        content images
        nodes prox-b,prox-a

lvm: b_r1
        vgname b_r1
        shared
        content images
        nodes prox-b,prox-a

root@prox-b:~# pvs
  PV                          VG              Fmt  Attr PSize   PFree  
  /dev/drbd0                  a_r0            lvm2 a--  499.98g 461.98g
  /dev/drbd1                  b_r1            lvm2 a--  499.98g 355.98g
  /dev/sda2                   pve             lvm2 a--   99.50g  12.00g
Udo
 
By the same name I mean that when you select the DRBD device to make the LVM in the GUI, they both came up with the same name, e.g. drbdr0. So then I gave set 1 the LVM name drbdr0 on DRBD drbdr0, and I gave set 2 the LVM name drbdr1 on DRBD drbdr0 (this should be drbdr1).
After creation, in the GUI under the nodes all looks well and I can see Dells 1 and 2 sharing drbdr0 and Dells 3 and 4 sharing drbdr1. I hope that makes sense.

All is working fine though, so I am happy. No need to continue on this.
 
By the same name I mean that when you select the DRBD device to make the LVM in the GUI, they both came up with the same name, e.g. drbdr0. So then I gave set 1 the LVM name drbdr0 on DRBD drbdr0, and I gave set 2 the LVM name drbdr1 on DRBD drbdr0 (this should be drbdr1).
After creation, in the GUI under the nodes all looks well and I can see Dells 1 and 2 sharing drbdr0 and Dells 3 and 4 sharing drbdr1. I hope that makes sense.

All is working fine though, so I am happy. No need to continue on this.

Hi,
it's strongly recommended to use two resources between dell1+2 and two between dell3+4! In the normal case dell1 should use one (like r0) and dell2 the other (like r1).

Otherwise you will get into trouble if you run into a split-brain situation, because you can't simply overwrite the other side!
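
For reference, recovering from a dual-primary split brain always means picking a victim whose changes are thrown away. A rough sketch for DRBD 8.3 (resource name r0 assumed; run only once you are sure which side to keep):

Code:
# on the node whose data will be DISCARDED
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the node whose data is kept
drbdadm connect r0
drbd-overview    # should resync and return to UpToDate/UpToDate

With one resource per node as suggested above, each node's guests live on "their" resource, so discarding one side of r0 only affects machines that were running on that node anyway.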

Udo
 
Sorry, it's been a very busy day and I forgot to post that. Here you go:

Set 1: jt1 and jt2 sharing drbdr0:
root@jt1:~# drbd-overview
  0:r0  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: drbdr0 1116.71g 0g
root@jt1:~#

Set 2: jt3 and jt4 sharing drbdr1:
root@jt3:~# drbd-overview
  1:r1  Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: drbdr1 1116.71g 0g

All 4 servers are in one cluster, i.e. I can see them all in the GUI, and servers 1&2 have the one LVM and servers 3&4 have the second LVM. I am not posting jt2's and jt4's config as it is the same as jt1's and jt3's.

root@jt1:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr0   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

root@jt3:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  drbdr1   1   0   0 wz--n-   1.09t   1.09t
  pve      1   3   0 wz--n- 558.25g  16.00g

root@jt1:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

lvm: drbdr0
        vgname drbdr0
        shared
        content images
        nodes jt2,jt1

lvm: drbdr1
        vgname drbdr0
        shared
        content images
        nodes jt4,jt3

root@jt3:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

lvm: drbdr0
        vgname drbdr0
        shared
        content images
        nodes jt2,jt1

lvm: drbdr1
        vgname drbdr0
        shared
        content images
        nodes jt4,jt3


Thanks!
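
One detail that stands out in the storage.cfg pasted above: both lvm: entries declare vgname drbdr0, even though the VG on jt3/jt4 is called drbdr1. If the second pair's storage is meant to sit on that VG, the second stanza would presumably need to read:

Code:
lvm: drbdr1
        vgname drbdr1
        shared
        content images
        nodes jt4,jt3

That mismatch may also be why both storages keep showing up under the drbdr0 name in the GUI.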
 
So I configured fencing for the first time last night. I configured cluster.conf wrongly, which I only found out when we had a power failure.
The servers came back online with this broken file, and now the cluster won't start because cman has an issue with the file.
I can't fix the file because it is read-only.

cluster.conf, where my problem is the cman two_node="1" and something wrong with my fence config:
<?xml version="1.0"?>
<cluster name="jtcluster" config_version="10">

  <cman two_node="1" expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevices agent="fence_ipmilan" ipaddr="10.10.10.1" login="root" passwd="popi/.," name="fence0A" />
    <fencedevices agent="fence_ipmilan" ipaddr="10.10.10.2" login="root" passwd="popi/.," name="fence0B" />
    <fencedevices agent="fence_ipmilan" ipaddr="10.10.10.3" login="root" passwd="popi/.," name="fence1B" />
    <fencedevices agent="fence_ipmilan" ipaddr="10.10.10.4" login="root" passwd="popi/.," name="fence1B" />
  </fencedevices>

  <clusternodes>
    <clusternode name="jt1" votes="1" nodeid="1"/>
    <fence>
      <method name="1">
        <device name="fence0A" action="reboot"/>
      </method>
    </fence>
    <clusternode name="jt2" votes="1" nodeid="2"/>
    <fence>
      <method name="1">
        <device name="fence0B" action="reboot"/>
      </method>
    </fence>
    <clusternode name="jt3" votes="1" nodeid="3"/>
    <fence>
      <method name="1">
        <device name="fence1A" action="reboot"/>
      </method>
    </fence>
    <clusternode name="jt4" votes="1" nodeid="4"/>
    <fence>
      <method name="1">
        <device name="fence1B" action="reboot"/>
      </method>
    </fence>
  </clusternodes>

</cluster>


Online sources suggested something along these lines, but I found it too late.
<?xml version="1.0"?>
<cluster name="peR620" config_version="28">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>


I have learned about the right config now, but how do I get this thing online again? I spent the last 2 hours googling lol. Any pointers appreciated. At this point PVE services won't run, nor will cman, so I can't do the expected votes 1 thing either. I know how to fix a normal cluster issue, but my broken file is the problem here. Thanks all.
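
For readers comparing the two files: the broken config above appears to differ from the suggested layout in a few ways. two_node="1" is set although there are four nodes, the <fencedevices> list uses <fencedevices ...> for each entry instead of <fencedevice ...>, the name fence1B is defined twice (so fence1A, which jt3's method references, never exists), and the <clusternode .../> elements are self-closed, leaving the <fence> blocks outside of them. A corrected fragment for one node might look like this (IPs as in the post, password elided; drop two_node and bump config_version when editing):

Code:
<fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="10.10.10.1" login="root" passwd="..." name="fence0A"/>
        <!-- fence0B, fence1A and fence1B defined the same way, each with a unique name -->
</fencedevices>
<clusternodes>
        <clusternode name="jt1" votes="1" nodeid="1">
                <fence>
                        <method name="1">
                                <device name="fence0A" action="reboot"/>
                        </method>
                </fence>
        </clusternode>
        <!-- jt2, jt3 and jt4 follow the same pattern -->
</clusternodes>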
 
Maybe I should boot from a live CD and copy the original cluster.conf file back and start over? Just an idea. Thanks again.
 
Thanks Dietmar. I tried that already:

root@jt3:~# pvecm expected 1
cman_tool: Cannot open connection to cman, is it running ?
root@jt3:~# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... tempfile:12: element fence: Relax-NG validity error : Expecting element clusternode, got fence
tempfile:11: element clusternode: Relax-NG validity error : Element clusternodes has extra content: clusternode
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:4: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate
two_node set but there are more than 2 nodes
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

So it all points back to cluster.conf, which I can't edit. Is there any way I can edit the file another way?
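
In case it helps other readers: on PVE 3.x, one commonly suggested way out of this chicken-and-egg (verify against the official docs before trying it on a production cluster) is to start the cluster filesystem in local mode so /etc/pve becomes writable without quorum, fix the file, and then bring the stack back up. A rough sketch:

Code:
# stop whatever parts of the cluster stack are still running
service rgmanager stop
service cman stop          # may already be stopped/failed
service pve-cluster stop   # stops pmxcfs

# restart pmxcfs in local mode; /etc/pve is now writable without quorum
pmxcfs -l
nano /etc/pve/cluster.conf   # fix the fence/two_node sections and bump config_version
killall pmxcfs

# bring everything back up normally
service pve-cluster start
service cman start
service rgmanager start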
 
