Corosync 2 Nodes + QDev // Backup network design

fr1000

Hello everyone,

I'd like to build a PVE HA cluster out of 2 PVE nodes and 1 QDev to get quorum.
In order to get a nice and stable corosync link, I have a dedicated 1G NIC between the 2 PVE nodes, connected directly via a crossover cable.
The QDev VM is an externally hosted system and can't be connected via LAN.

The Plan:
* Using Link0 (primary) for corosync between the 2 PVE nodes and the QDev over WAN (WireGuard VPN mesh)
* Using Link1 (backup) for corosync between the 2 PVE nodes over the directly connected 1G LAN

That way I'd have a 3-node corosync cluster, and if the WAN interface ever gets too busy (bandwidth limits are in place, but DDoS etc. can still happen), corosync will temporarily switch to the backup link.
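For reference, the WireGuard side of Link0 could look roughly like this on PV1 (just a sketch; keys, endpoints and tunnel addresses are placeholders, and PV2 and the QDev would get mirrored configs):

Code:
# /etc/wireguard/wg0.conf on PV1 (sketch)
[Interface]
Address = <PV1-VPN-IP>/24
PrivateKey = <PV1-private-key>
ListenPort = 51820

[Peer]
# PV2 over WAN
PublicKey = <PV2-public-key>
Endpoint = <PV2-public-IP>:51820
AllowedIPs = <PV2-VPN-IP>/32
PersistentKeepalive = 25

[Peer]
# QDev over WAN
PublicKey = <QDev-public-key>
Endpoint = <QDev-public-IP>:51820
AllowedIPs = <QDev-VPN-IP>/32
PersistentKeepalive = 25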

This is the corosync.conf sample I've come up with:

Code:
totem {
    version: 2
    secauth: on
    cluster_name: cluster-prod
    transport: udpu

    # main connection (QDev, PV1, PV2)
    interface {
        ringnumber: 0
        bindnetaddr: <primary-link-network>
        mcastport: 5405
        ttl: 1
    }

    # backup connection (PV1 and PV2 only)
    interface {
        ringnumber: 1
        bindnetaddr: <backup-link-network>
        mcastport: 5405
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: <PV1-primary-IP>
        ring1_addr: <PV1-backup-IP>
        nodeid: 1
    }
    node {
        ring0_addr: <PV2-primary-IP>
        ring1_addr: <PV2-backup-IP>
        nodeid: 2
    }
    node {
        ring0_addr: <QDev-primary-IP>
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: <QDev-primary-IP>
            algorithm: ffsplit
        }
    }
}
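For completeness, I'd expect the QDevice itself to be wired up roughly like this (package names and the pvecm subcommand as documented by Proxmox; the IP is a placeholder), which should also generate the device section above automatically:

Code:
# on the external QDev VM (sketch)
apt install corosync-qnetd

# on both PVE nodes
apt install corosync-qdevice

# on one PVE node, once the cluster exists
pvecm qdevice setup <QDev-primary-IP>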



The questions:
* Is this setup (Link0 with 3 nodes, Link1 with only 2 nodes) supported by Corosync?
* Alternatively, I could set up an additional WireGuard mesh VPN between all three nodes, but this time using the 1G NIC IPs as endpoints for the WireGuard connection between the 2 nodes. This way I could get all three nodes into the same subnet (see the sketch after this list).
* Would you recommend using the 1G NIC as Link1 or as Link0 in that case?
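For that alternative, only the PV2 peer entry on PV1 (and the mirrored one on PV2) would change; the tunnel between the two PVE hosts would then ride on the direct 1G link instead of the WAN (again just a sketch with placeholders):

Code:
# PV2 peer on PV1 when the WireGuard tunnel uses the crossover link
[Peer]
PublicKey = <PV2-public-key>
Endpoint = <PV2-1G-NIC-IP>:51820
AllowedIPs = <PV2-VPN-IP>/32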

I hope it's clear what I'd like to accomplish. :)

Thanks in advance!

Kind regards,
fr1000
 
Hey, thanks for the reply.

I don't think I have an issue with latency out of the box. All servers (node 1, node 2 and the QDev) are within a range of 1-4 ms (over WireGuard).

Here is a better explanation of my scribbled corosync config idea:

Code:
totem {
    version: 2
    secauth: on
    cluster_name: my_cluster
    transport: udpu

    # main connection (QDev, PV1, PV2)
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        ttl: 1
    }

    # backup connection (only PV1 and PV2)
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.1.0
        mcastport: 5405
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: 10.0.0.1
        ring1_addr: 10.0.1.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.2
        ring1_addr: 10.0.1.2
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.3
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: 10.0.0.3
            algorithm: ffsplit
        }
    }
}

Explanation of the Configuration

Totem Section:

  • ringnumber: 0: This is the primary connection, which uses the 10.0.0.0/24 network.
  • ringnumber: 1: This is the backup connection, which uses the 10.0.1.0/24 network.
Nodelist Section:
  • PV1:
    • ring0_addr: 10.0.0.1: IP address of PV1 in the primary connection network. (WAN over WireGuard)
    • ring1_addr: 10.0.1.1: IP address of PV1 in the backup connection network. (LAN via direct connected 1G NIC)
  • PV2:
    • ring0_addr: 10.0.0.2: IP address of PV2 in the primary connection network. (WAN over WireGuard)
    • ring1_addr: 10.0.1.2: IP address of PV2 in the backup connection network. (LAN via direct connected 1G NIC)
  • QDev:
    • ring0_addr: 10.0.0.3: IP address of QDev in the primary connection network. QDev does not have a backup connection. (WAN over WireGuard)
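Once this is in place, I'd check the rings and the quorum state with something like:

Code:
# on either PVE node
pvecm status              # cluster/quorum overview, including the QDevice vote
corosync-quorumtool -s    # votequorum view (expected vs. total votes)
corosync-cfgtool -s       # status of ring 0 and ring 1 on the local node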

It's a network/cluster design for a very small HA setup within a datacenter, so I don't have any option to build more LAN connections to make it work, and I have no chance of getting a switch for a 3-node LAN setup.

I hope that helps you understand my current headache a little bit better. ;-)

Thanks in advance!

Kind regards,
fr1000
 
If latency is not an issue, it should work, but running it over a VPN is not a recommended option: if the primary network fails, no further node is allowed to fail, because, as you are probably aware, that would lead to a complete system failure.
 
If latency is not an issue, it should work, but running it over a VPN is not a recommended option: if the primary network fails, no further node is allowed to fail, because, as you are probably aware, that would lead to a complete system failure.
Yeah, I know that this gets a little tricky when the QDev is lost. In the described case, I just want to make sure that the cluster won't break if the WAN interfaces get too busy. So you would say that the uneven node count per link within the corosync.conf (three nodes on Link0, only two on Link1) won't be a problem for corosync to handle?

If that's the case, I'll use the LAN connection without a VPN on top, in order to get the most stable link between the two PVE hosts.
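On the PVE side, I'd probably let pvecm set up the links rather than hand-editing corosync.conf, roughly like this (with the addresses from above):

Code:
# on PV1 (sketch)
pvecm create my_cluster --link0 10.0.0.1 --link1 10.0.1.1

# on PV2, joining via PV1
pvecm add 10.0.0.1 --link0 10.0.0.2 --link1 10.0.1.2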

I know that it will be risky if the QDev is lost AND one of the two hosts goes down within the same timeframe. However, with the resources available to me, I have to deliver the best HA that is feasible ;-) ... and it's way better than not using a cluster at all :D

Thanks again!
 
I don't think that this is going to be a problem, but you could just test it before using it.
 
I don't think that this is going to be a problem, but you could just test it before using it.
Unfortunately, as so often... I don't have the resources to set up a test lab in advance, and one of the two servers is already in production, so only the prod environment remains ;D ... I know, not ideal... but that's why I wanted to inform myself beforehand, before the whole thing backfires ;-)
 
