DRBD9, drbdmanage and full mesh storage network

resoli · Mar 24, 2016

Hello,

I would like to create a pve4 three-node setup with drbd9 storage. I have read the related proxmox wiki article, and the documentation about drbd9 on the Linbit website.

If I understand correctly, the new drbdmanage tool is a way to simplify drbd configuration, and involves the creation of the drbdctrl resource.

It happens that in my setup I can dedicate two physical interfaces per node to the storage network, and possibly create a "full mesh" network as described in Chapter 5.1.4 of the DRBD9 manual (http://drbd.linbit.com/doc/users-guide-90/ch-admin-manual#s-drbdconf-conns). I understand that this network topology is currently not supported by drbdmanage, so I'm asking whether it would be possible to configure the three storage nodes as usual (one IP address per node) and change the configuration of the network connections afterwards.

Any advice/suggestion?

Thanks,
rob

wolfgang · Mar 24, 2016

Hi,

Why should this not work?

You have to set the route manually, so you can use the same IP on 2 NICs.

I never tried it with DRBD but with ceph it works this way.

resoli · Mar 24, 2016

OK, I have not explained my intention clearly: I want to avoid using a dedicated switch to connect the six interfaces, and instead provide a physical path (cable) for every connection in the mesh (three cables). I don't know if this fits your scenario, but Chapter 5.1.4 of the DRBD9 docs clearly states that a different IP is needed for every interface.

By the way, by "set the route manually" do you mean that I can add entries to the routing table on every host to direct packets through specific interfaces?

Thanks,
rob

resoli · Mar 24, 2016

resoli said:
By the way, by "set the route manually" do you mean that I can add entries to the routing table on every host to direct packets through specific interfaces?

I have to think more about this, but it's probably a matter of both removing the automatic network route and adding some explicit ones.

Will try ...

Thanks,
rob

wolfgang · Mar 24, 2016

resoli said:
By the way, by "set the route manually" do you mean that I can add entries to the routing table on every host to direct packets through specific interfaces?

Correct, you can use the same IP on several NICs, but then you have to set up the routing.

resoli · Mar 24, 2016

Yes, I tried for instance:

1) Add the IP to both eth2 and eth3 (leaving out the netmask implies /32 and suppresses creation of the network route):
ip addr add 192.168.0.1 dev eth2
ip addr add 192.168.0.1 dev eth3

2) Add explicit routes to 192.168.0.2, 192.168.0.3 (for instance):
ip route add 192.168.0.2 via 192.168.0.1 dev eth2
ip route add 192.168.0.3 via 192.168.0.1 dev eth3

wolfgang · Mar 25, 2016

Here is an example

auto eth1
iface eth1 inet static
address 10.10.10.5
netmask 255.255.255.255
post-up route add 10.10.10.4 dev eth1

auto eth2
iface eth2 inet static
address 10.10.10.5
netmask 255.255.255.255
post-up route add 10.10.10.6 dev eth2

resoli · Mar 25, 2016

Yes, I've done more or less the same:

=== node "uno" (10.1.1.1) ===
auto eth1
iface eth1 inet static
address 10.1.1.1
netmask 255.255.255.255
up ip route add 10.1.1.2 via 10.1.1.1 dev eth1

auto eth2
iface eth2 inet static
address 10.1.1.1
netmask 255.255.255.255
up ip route add 10.1.1.3 via 10.1.1.1 dev eth2

=== node "due" (10.1.1.2) ===
auto eth1
iface eth1 inet static
address 10.1.1.2
netmask 255.255.255.255
up ip route add 10.1.1.3 via 10.1.1.2 dev eth1

auto eth2
iface eth2 inet static
address 10.1.1.2
netmask 255.255.255.255
up ip route add 10.1.1.1 via 10.1.1.2 dev eth2

=== node "tre" (10.1.1.3) ===
auto eth1
iface eth1 inet static
address 10.1.1.3
netmask 255.255.255.255
up ip route add 10.1.1.2 via 10.1.1.3 dev eth1

auto eth2
iface eth2 inet static
address 10.1.1.3
netmask 255.255.255.255
up ip route add 10.1.1.1 via 10.1.1.3 dev eth2
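
A quick way to check that each peer address really leaves through the intended cable is to ask the kernel which device it would pick. This is only a small sketch: the helper name `route_dev` is made up here, and the addresses are the ones from the config above.

```shell
#!/bin/sh
# route_dev: hypothetical helper. Reads `ip route get` output on stdin and
# prints the word that follows the "dev" keyword (the outgoing interface).
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Usage on node "uno" (needs the interfaces above to be up):
#   ip route get 10.1.1.2 | route_dev    # the config above intends eth1
#   ip route get 10.1.1.3 | route_dev    # the config above intends eth2
```

Note that this only shows the outbound choice on the node where it is run; the return path depends on the routing table of the peer.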

---

I've tried to set up drbd9 on top of this, and it works well.

Many thanks,
rob

resoli · Apr 26, 2016

Update: it turns out that it doesn't work so well: I had a lot of strange connection problems between the drbd9 cluster nodes, and I ended up connecting them via a dedicated switch (no more dedicated interface-to-interface connections).

My question, thinking about the connection problems, is: "To which interface/address does the drbd kernel thread bind when it listens for connections from the other nodes?"

Two interfaces, same IP, same port ...
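
One way to look at this from userspace: kernel-owned TCP sockets are listed by `ss` too (just without an owning process), so listing the sockets on the resource's port shows the local address of the listener and of each established connection. A sketch only; `sockets_on_port` is a made-up helper, and 7000 is a placeholder for whatever port the resource actually uses.

```shell
#!/bin/sh
# sockets_on_port: hypothetical helper. Reads `ss -tan` output on stdin and
# prints state, local endpoint and peer endpoint for every socket whose
# local or peer port equals $1. The header line of `ss` is skipped.
sockets_on_port() {
    awk -v p=":$1" '
        $1 != "State" && ($4 ~ (p "$") || $5 ~ (p "$")) { print $1, $4, $5 }
    '
}

# Usage (7000 is a placeholder port, not taken from this thread):
#   ss -tan | sockets_on_port 7000
```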

rob

resoli · Aug 22, 2016

I succeeded in setting up a working full mesh; the right suggestion came from the drbd-user mailing list: use bridging instead of routing. In this post I give the details:

https://lists.gt.net/drbd/users/28251#28251
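
For reference, a minimal sketch of what a bridged variant of the node "uno" config from earlier in the thread could look like. It is not copied from the linked post: the bridge name `br0`, the /24 netmask and the use of STP are assumptions. The idea is that each node puts both direct links into one bridge and carries its single storage IP on the bridge itself. Three such bridges cabled in a triangle form a loop, so spanning tree (or some other loop prevention) is needed; STP then blocks one of the three links, and traffic between those two nodes passes through the third.

```
# /etc/network/interfaces on node "uno" (sketch, assumed names)
auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet manual

auto br0
iface br0 inet static
address 10.1.1.1
netmask 255.255.255.0
bridge_ports eth1 eth2
bridge_stp on
```

The other two nodes would use the same stanza with their own address (10.1.1.2 and 10.1.1.3).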

bye,
rob

metaplop · Aug 22, 2016

Hello,

are your 6 interfaces connected to the same switch? If so, why don't you use bonding with LACP (instead of bridging) to have failover and load balancing?

resoli · Aug 22, 2016

metaplop said:
Hello,
are your 6 interfaces connected to the same switch?

No switch at all; see the earlier messages:

resoli said:
...I want to avoid using a dedicated switch to connect the six interfaces, and instead provide a physical path (cable) for every connection in the mesh (three cables).

bye,
rob

metaplop · Aug 22, 2016

Sorry, I misread it... Great to know that bridging works with direct attachment and the same IP.

flexyz · Jun 2, 2017

resoli said:
I succeeded in setting up a working full mesh; the right suggestion came from the drbd-user mailing list: use bridging instead of routing. In this post I give the details:

http://lists.linbit.com/pipermail/drbd-user/2016-August/023187.html

bye,
rob

Hi

Can you post the details here or maybe you have a link that works?

Thanks

fortechitsolutions · Jun 2, 2017

Hi! For what it is worth. Not to be a negative - speaking / un-positive sad-faced person. But, maybe my lesson will be of slight use, or possibly elicit a counter-story from another posting person.

I spent some time last year doing a test config build on 4.X.Latest - the work spanned a couple of months - small test cluster of 3 nodes with 10gig inter-node connectivity for private/storage replication. DRBD storage in the 'new model'. Things were 'pretty cool'. Except, that after kicking the wheels and doing simulated fail testing, DRBD became either a pain in the fanny, or a nightmare, depending on your philosophy.

Simulated fails in my case meant that, on the remote IPMI admin for a node, I do a hard 'power off' command on one of the 3 Proxmox cluster node members.

Then wait 5 minutes, and power him back on.

Then observe how everything behaves.

Broadly speaking, it was too easy for me to get DRBD into a state which was non-trivial to recover from. It became problematic to try to sort things out, despite trying to follow what I thought were 'fairly standard' assessment and recovery paths.

Arguably, I was doing a zero-support-non-paying-for-DRBD_support-etc environment, and arguably, I was not greatly experienced with DRBD. But. After a certain level of testing, it was clear to me that - the failover scenarios were not sufficiently resilient for ease of use requirements for the given project. Possibly this has changed since last year; I am not positive as I haven't looked again since that test.

What I ended up using for that client, was a tremendously unsexy, but painfully simple and robust implementation,
-- his service provider offered a HA_NFS_Filer service (I assume it is ZFS-based under the hood on custom boxes they manage on your behalf; but end of the day - it is a classic HA_NFS Multi-head-data-replication-heartbeat-IP-failover-sort-of-thing).
-- so we just subscribed to a few buckets of ~500-1000's of gigs of HA_NFS service, "storage as a service"
-- then used those as shared NFS storage on proxmox
and it worked. Stupidly easy. Very resilient. No pain. And 100% un-sexy and zero interesting-wow-cool

but hey, sometimes things working 24/7/365 is - its own special kind of wow-cool.

Tim

flexyz · Jun 3, 2017

Hi Tim,

Very interesting. I am planning to do a similar setup and did not know it could be so tricky. Did you connect the nodes directly to each other, without a switch? And could the issues have been network related? It could be worth a test using a switch.

Thanks
Felix

fortechitsolutions · Jun 3, 2017

Hi, the hardware was at a remote site hosting provider (OVH) so I can't answer with 100% confidence, but based on my understanding of their environment, it was (a) supermicro nodes, (b) 2 x 1gig interfaces cabled to 1gig switches, which is where public internet routing came in, and (c) 2 x 10gig interfaces cabled to ports on a 10gig switch. Then using the OVH management layer it was necessary to designate a "VRack" - private network space basically - and designate my hosts' 10gig interfaces to be members of this private network. Once that was in place, I had full 10gig connectivity between the configured 10gig interfaces / multicast / full private network access. I can't think they implement it at a hardware level any way other than just having hard-wired 10gig ports to 10gig switches physically, with their own management layer to help define and designate port memberships to the so-called "VRACKs".

I'm very confident the problems were not 'network related' - OVH seem to do a solid job of providing hardware that meets the specs they designate. And I had no issues with network traffic on the private VRACK (ie, I did a different test setup on the same hardware, where I used 5 nodes in total -- 2 of them acted as a self-installed HA_NFS_Filer appliance pair, exporting their NFS via the 10gig VRACK network; then 3 of the nodes were stock proxmox hosts, as a 3-node proxmox cluster using the NFS mount storage from the HA_NFS_Filer -- and it worked fine). But in the end, me doing my own NFS Filer this way was far more drama than just buying HA_NFS_Filer service from OVH and using that "NFS as a service" -- so that was the end solution.

But definitely, nothing in my tests gave me reason to distrust the underlying hardware.

My memory is vague, but I recall that generally, after my simulated fail of one node in the DRBD-backed storage cluster, things went as follows.

consider scenario where there are 9 x VMs in total, 3 per physical host
DRBD replicates all VM storage LUNs across all proxmox nodes
Proxmox nodes each have ~2TB HW RAID used as the "DRBD storage tank"
so total DRBD storage capacity for cluster is ~2TB usable.

then when things are operating smoothly, for example as a starting point
Proxmox1 has active role on DRBD for VMs 1,2,3
Proxmox2 has active role on DRBD for VMs 4,5,6
Proxmox3 has active role on DRBD for VMs 7,8,9
and all proxmox hosts 1,2,3 have all 9 VMs' data stored on their "local disks" (ie, which is replicated among all via DRBD)

and what I would see after a fail test, for example killing proxmox2 and then rebooting him 5 minutes later

Proxmox1,3 would be in accordance with the view that Proxmox2 had fallen off the map
But then once Proxmox2 came back, it might have inconsistent view of who was actively master on some of the VMs
and there appeared to be no way I could resolve this.

DRBD was pretty verbose in the logs about when it was happy / when it wasn't happy / when it was resyncing data.

And it was never unclear: in some cases it did behave properly and resync things, and everything was happy after a while

but in some cases (ie, 1 of 3 tests?) it would never get the states properly allocated again, and they would remain out of agreement about who was actually primary or not for various VMs. I could not force an update or sync, and even after trying to delete the VM and reprovision it, it was 'stuck/broken'.

Anyway. To be honest it was not hard to set up - if you spend a few hours at it you will have a 3-node DRBD cluster up and running. Then in another few hours, you can have a few test VMs up and then do a few cycles of "Live Migrate" and then "Fail testing" and see how it goes. But - it was just not workable for me last year. When I first asked on the forums, I was politely and very accurately told by the Proxmox DevPerson that "DRBD is a preview release feature at this stage" (aka not production ready) and was not formally recommended for production use, only testing-for-fun.

I hope this helps a tiny bit to clarify what my test environment was?

Tim
