DRBD on Proxmox cluster nodes ?

Belokan

Apr 27, 2016
Hello all,

I've been using Proxmox at home for months now and I really like the product: well designed, stable, etc.

My environment is based on 2 NUCs (pvetest repo) and 2 Synos.
Right now each Syno provides NFS storage to the cluster, and one of them also runs a VM (phpvirtualbox) used as the 3rd node for HA quorum. I've defined a "physical" group with the 2 NUCs, and the HA VMs are bound to that group.

Yesterday I had a power outage that lasted longer than the UPSes could cover. Everything went down properly (the PVE nodes are upsmon clients and the UPSes are managed by the Synos), but I had forgotten to set the "autostart after power failure" option on one Syno, so the VMs hosted on it were in "HA:error" state when power came back.

My question is: given my environment, what could I do to set up redundant storage?

I thought of:

1.- Create a 500GB iSCSI LUN on each Syno
2.- Attach (debian level) the Syno1 LUN to PVE1 as /dev/sdb
3.- Attach (debian level) the Syno2 LUN to PVE2 as /dev/sdb
4.- Create a single /dev/sdb1 partition on each LUN.
5.- Install drbd on PVE1/PVE2 to mirror the LUNs (active/active, 2 nodes) and create a GFS filesystem on the /dev/drbd1 metadevice (sketched below)
6.- Add the /dev/drbd1 metadevice as a (shared?) directory in datacenter/storage
7.- Migrate (move disk) the VMs to the new storage.
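
For step #5, what I have in mind is a classic dual-primary resource, roughly like this in DRBD 8-style syntax (the resource name, hostnames and IPs are placeholders on my side, untested):

# /etc/drbd.d/r0.res -- dual-primary sketch over the two iSCSI LUNs
resource r0 {
    protocol C;
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    startup {
        become-primary-on both;
    }
    on pve1 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.10.10.1:7788;
        meta-disk internal;
    }
    on pve2 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.10.10.2:7788;
        meta-disk internal;
    }
}

The GFS filesystem would then be created on /dev/drbd1 and mounted on both nodes.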

What do you think? Is there perhaps a better solution?

Thanks a lot in advance !

Olivier

PS: Regarding yesterday's power outage: even after the 2nd Syno restarted and its storage was flagged enabled:yes in the datacenter, I was not able to start the HA VMs. I had to remove them from HA control (one started just by being set to disabled in HA), start them, and put them back under HA control. What is the correct way to restart such VMs after an outage?
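
For reference, in CLI terms the workaround was roughly this (the VM ID is just an example, and the exact ha-manager states/options may differ between PVE versions):

ha-manager remove vm:105                  # take the VM out of HA control
qm start 105                              # start it manually
ha-manager add vm:105 --group physical    # put it back under HA
# for one of them, simply toggling its HA state was enough:
ha-manager set vm:105 --state disabled
ha-manager set vm:105 --state enabled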
 
Hello,

I was up to point #5 (choosing the FS I'd use over DRBD) when I found this doc: https://pve.proxmox.com/wiki/DRBD9
As I had already figured out most of the required steps by myself (adding a dedicated NIC to my servers and NAS), I rolled back to point #1 and followed the doc to implement DRBD!
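
For the record, the setup from the wiki boils down to initialising drbdmanage on the first node over the dedicated NIC, adding the second node, and declaring the DRBD storage in /etc/pve/storage.cfg. Roughly (the IPs and storage name below are placeholders, not my exact values):

# on pve1
drbdmanage init 10.10.10.1
drbdmanage add-node pve2 10.10.10.2

# /etc/pve/storage.cfg
drbd: drbd1
        content images
        redundancy 2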

So I now have a working 2-node DRBD setup (the 3rd node is just for quorum) based on the wiki. The only difference is that it sits on top of iSCSI LUNs provided by the 2 NAS (those VMs host network services; I want them reliable/available, not especially fast...), and so far I've migrated 2 VMs (one HA and one "standard") to the new storage pool. As all my VMs were qcow2 with the default cache option, I changed the cache to write-through, stopped and started each VM, and then moved the disk from NFS to DRBD (changing the format to raw in the process).
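
Per VM, that workflow boils down to something like this (VM ID, disk slot and storage names are just examples):

# set cache=writethrough on the existing qcow2 disk, restart, then move it
qm set 105 --virtio0 nfs-syno1:105/vm-105-disk-1.qcow2,cache=writethrough
qm shutdown 105 && qm start 105       # the cache change only takes effect after a restart
qm move_disk 105 virtio0 drbd1 --format raw --delete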

My question now is: how could I move the remaining VMs (if it's even possible) without stopping/starting them? I can move them from NFS/qcow2 to DRBD/raw "live", but then the cache option won't be write-through... I've read the thread linked in the wiki, but can they "live" with that option until the next stop/start, or is it possible to change the option during the disk move? As soon as I change the cache option (it appears in red in the GUI), "move disk" is grayed out...

Thanks in advance !
 
Continuing my monologue :)

I've finally taken the time to stop/start the remaining VMs in order to set cache=writethrough and move their disks to DRBD/raw.

I now have 2 questions:

1.- How do you explain the fact that having the storage on DRBD makes live migration faster?

Before DRBD/raw (i.e. NFS/qcow2), live migrating from one node to the other took the time required to "copy" the VM's RAM over my gigabit network. Now it's about 4 times faster...

2.- When I reboot a node, it looks like its DRBD physical devices come up as "Diskless":

root@pve2:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:Diskless
  volume:1 disk:Diskless
  pve1 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-105-disk-1 role:Secondary
  disk:UpToDate
  pve1 role:Primary
    peer-disk:UpToDate
[...]


But no problem at LVM level:

root@pve2:~# vgs
  VG       #PV #LV #SN Attr   VSize   VFree
  drbdpool   1   9   0 wz--n- 320.00g 772.00m
  pve        1   3   0 wz--n-  55.77g   6.87g
root@pve2:~# lvs
  LV               VG       Attr       LSize   Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert
  .drbdctrl_0      drbdpool -wi-a-----   4.00m
  .drbdctrl_1      drbdpool -wi-a-----   4.00m
  drbdthinpool     drbdpool twi-aotz-- 319.00g                      50.17  26.05
  lvol0            drbdpool -wi-a-----  80.00m
  vm-105-disk-1_00 drbdpool Vwi-aotz--  32.01g drbdthinpool        100.00
  vm-107-disk-1_00 drbdpool Vwi-aotz--  32.01g drbdthinpool        100.00
  vm-110-disk-1_00 drbdpool Vwi-aotz--  32.01g drbdthinpool        100.00
  vm-111-disk-1_00 drbdpool Vwi-aotz--  32.01g drbdthinpool        100.00
  vm-112-disk-1_00 drbdpool Vwi-aotz--  32.01g drbdthinpool        100.00
  data             pve      -wi-ao----  28.28g
  root             pve      -wi-ao----  13.75g
  swap             pve      -wi-ao----   6.88g


After a manual adjust, the status is displayed correctly:

root@pve2:~# drbdadm adjust all
root@pve2:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  pve1 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-105-disk-1 role:Secondary
  disk:UpToDate
  pve1 role:Primary
    peer-disk:UpToDate


Any clue? Thanks!
 