[TUTORIAL] Adding a secondary corosync network for failover - setting the priority when adding a 2nd corosync network to upgrade to a redundant ring network.

hackinthebox

SCROLL TO THE END FOR AN EXAMPLE
ADD knet_link_priority: <value> TO YOUR /etc/pve/corosync.conf FILE UNDER THE TOTEM DIRECTIVE, INSIDE EACH RESPECTIVE INTERFACE SUBDIRECTIVE

Here is the guide on creating a separate cluster network:
https://pve.proxmox.com/wiki/Separate_Cluster_Network

MY OBJECTIVE
These are some of my experiences migrating from my first test cluster to a meaner, cleaner, much more reliable one. Among other things, the 3 primary upgrades I made were A) going from a 2-node to a 3-node cluster for a proper quorum for high availability, B) local storage on separate physical volumes to support ZFS replication (pve-zsync) for "mission critical" guests that can't depend on network storage, and C) a separate corosync network, as recommended in the documentation.

MY HARDWARE
My nodes are just old Dell Optiplex 7010 SFFs. Each takes 2 SSDs and a 10-gig PCIe network card (in addition to the onboard 1GbE interface), with one slot left over for a low-profile GPU, more network interfaces, or more solid-state storage. I'm using the onboard 1GbE interface (eno1) for the corosync network 192.168.255.0/24, with 3 ports on a managed switch set up as an untagged VLAN to connect everything. I'm happy with this because it doesn't need to reach the internet; I did, however, complete the VLAN config on my switches and router so I'd have out-of-band SSH access, could send Wake-on-LAN packets if necessary, and could still reach the nodes for monitoring. The 10gig interface on each node connects to my primary LAN, which has access to shared storage.
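For context, the relevant part of /etc/network/interfaces on pve-a looks roughly like this (a sketch only; the 10gig interface name enp1s0, the vmbr0 bridge, and the gateway are assumptions, so adjust to your own hardware and addressing):
Code:
auto eno1
iface eno1 inet static
        address 192.168.255.11/24
        # dedicated corosync network, no gateway needed

auto enp1s0
iface enp1s0 inet manual
        # 10gig uplink, bridged for guest and storage traffic

auto vmbr0
iface vmbr0 inet static
        address 192.168.4.111/24
        gateway 192.168.4.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0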

MY INITIAL CONFIG

Cluster name: pve-cluster-a

10gig interface IP's & hostnames
Code:
192.168.4.111/24  pve-a
192.168.4.112/24  pve-b
192.168.4.113/24  pve-c

1Gbe corosync IP's & hostnames
Code:
192.168.255.11/24  pve-sync-a
192.168.255.12/24  pve-sync-b
192.168.255.13/24  pve-sync-c

JOINING MACHINES TO MY CLUSTER
To do this, I logged in to each node via SSH and entered the following commands:
(from 192.168.4.111)
Code:
pvecm create pve-cluster-a --link0 192.168.255.11 --nodeid 1
(from 192.168.4.112)
Code:
pvecm add 192.168.4.111 --link0 192.168.255.12 --nodeid 2
(from 192.168.4.113)
Code:
pvecm add 192.168.4.111 --link0 192.168.255.13 --nodeid 3

STATUS CHECK
This resulted in a 3-node cluster with a separate corosync network. Of course, there was additional node network configuration necessary first, but that goes beyond the scope of this post. Feel free to ping me and I'll help any newbies out. Running the following command produced this output:

Code:
root@pve-a:/etc/pve# pvecm status
Cluster information
-------------------
Name:             pve-cluster-a
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Nov 14 09:55:16 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.167
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.255.11 (local)
0x00000002          1 192.168.255.12
0x00000003          1 192.168.255.13

UPDATING HOSTS FILE WITH UNIQUE HOSTNAMES FOR EACH STATIC IP INTERFACE
I updated /etc/hosts with the following IPv4 mappings for good measure. This is likely unnecessary, since PVE stores IP addresses (rather than hostnames) in its config files for stability and performance and rarely needs to resolve them; it's more a case of the admin giving his creations names.
Code:
127.0.0.1 localhost.localdomain localhost
192.168.4.111 pve-a.mydomain.com pve-a
192.168.4.112 pve-b.mydomain.com pve-b
192.168.4.113 pve-c.mydomain.com pve-c
192.168.255.11 pve-sync-a.mydomain.com pve-sync-a
192.168.255.12 pve-sync-b.mydomain.com pve-sync-b
192.168.255.13 pve-sync-c.mydomain.com pve-sync-c

FORCE-REGENERATE FRESH CLUSTER CERTIFICATES
And then of course, run
Code:
pvecm updatecerts --force
I'm extra cautious and like a fresh pair of certificates for each node after any modification to cluster membership.

RESTRICT MIGRATION ON THE SYNC NETWORK
All of this config for a separate corosync network could be pointless if Proxmox decides to clog up your sync network with guest migrations between nodes!! It's important to tell the cluster which network it's supposed to use for these tasks. To tune this behavior, I edit the /etc/pve/datacenter.cfg file and add this line:
Code:
migration: secure,network=192.168.4.0/24
This ensures the sync network isn't saturated, and that the cluster takes full advantage of the 10gig network.

WHY IS THIS CONFIGURATION WRONG?
I was happy with this configuration for a few weeks, but I was generally anxious about the stability of the separate corosync network. Every additional critical component in a system is another potential point of failure.

And that's exactly what happened.

While configuring a firewall rule on my network to allow Wake-on-LAN packets into the 192.168.255.0/24 network, I was having trouble figuring out why my firewall was still blocking packets, even though I had added a rule exception directly from the logs. I wanted to rule out the switch's VLAN configuration as the problem, and while I was in there (like a dummy), I mistakenly disabled the management VLAN, locking myself out of the switch altogether until a reboot reverted the unsaved changes. Even SSH and Telnet were inaccessible.

FRAGILE CONFIGURATION
The moment the switch used for corosync was powered off, I instantly saw all 3 nodes in the web console change to nasty icons, since every guest, container, and node depended on that one switch. I cringed and my heart sank.

I had already spent my budget for upgrades, and I'm using the spare PCIe slot for GPUs. Adding network ports to each node for a redundant ring network for corosync just isn't in my future. I thought my only choice was to accept it as it is, but it turns out there's a great solution. I had been reading the docs intensely for several weeks in preparation for the migration, and I don't know how I missed this:

EUREKA! SURPRISED THERE ISN'T MORE EMPHASIS ON THIS!
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
Since lower priority links will not see traffic unless all higher priorities have failed, it becomes a useful strategy to specify even networks used for other tasks (VMs, storage, etc…) as low-priority links. If worst comes to worst, a higher-latency or more congested connection might be better than no connection at all.

Bingo! Update /etc/pve/corosync.conf and add a 2nd link to the system! Problem solved! If the 1GbE switch ever goes down, the corosync network can fail over to the existing 10gig network on a different switch!

Or not.

GOOD LUCK FIGURING OUT HOW, THOUGH
For this to work, each link should be assigned a priority. The higher the value, the higher the priority. The documentation explains how to initiate this very well, but nowhere in the Proxmox documentation is there information on how to add priority values to an existing configuration where none exist. When creating a single sync network, priorities are irrelevant, and the generated configuration files simply don't show this parameter. NOTE: In my particular situation, the lower ring id (ring0) should get first priority by default, but I don't want to rely on a default setting that doesn't even show up in the config for something this important.

I'm citing additional documentation here about corosync from debian.org. This was a very long search for me.
https://manpages.debian.org/bullseye/corosync/corosync.conf.5.en.html

THE SOLUTION
You need to add

Code:
knet_link_priority: <value>

to your /etc/pve/corosync.conf file under the TOTEM > INTERFACE parameters and assign your priority to each interface. Here's my finished .conf file as an example:

FULL CONFIGURATION EXAMPLE
Code:
root@pve-a:/etc/pve# cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve-a
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.255.11
    ring1_addr: 192.168.4.111
  }
  node {
    name: pve-b
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.255.12
    ring1_addr: 192.168.4.112
  }
  node {
    name: pve-c
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.255.13
    ring1_addr: 192.168.4.113
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-cluster-a
  config_version: 4
  interface {
    linknumber: 0
    knet_link_priority: 255
  }
  interface {
    linknumber: 1
    knet_link_priority: 4
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

The values you assign to the parameter are arbitrary, and only meaningful relative to the other assigned values (don't be cute and try a floating point number, though). I used 255 and 4 as my values simply because they're consistent with the 3rd octet of the network addresses they represent. If you want to swap the primary corosync ring with the secondary, simply swap the values of knet_link_priority. Any time you modify this conf file, remember to increment the value of config_version: as well, per the documentation.
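For example, to prefer the 10gig network (link 1) instead, the totem section from my config above would become the following (note config_version bumped from 4 to 5):
Code:
totem {
  cluster_name: pve-cluster-a
  config_version: 5
  interface {
    linknumber: 0
    knet_link_priority: 4
  }
  interface {
    linknumber: 1
    knet_link_priority: 255
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}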

This method was tested and worked perfectly, and near instantaneously. The log files showed the nodes became immediately aware of corosync network failures and immediately failed over to the secondary link, much faster than the HA behavior, which seems to wait and stall. When connectivity to the primary network was restored, the cluster was immediately aware and switched back in under a second.
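If you want to watch the failover happen yourself, these two commands are handy (assuming the stock corosync tools and the systemd journal):
Code:
# show the knet link status for every configured link on this node
corosync-cfgtool -s
# follow corosync's log messages while you unplug/replug the primary link
journalctl -u corosync -f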

EDIT:
I did find that the Proxmox documentation DOES mention the name of the parameter by saying
Links are used according to a priority setting. You can configure this priority by setting knet_link_priority in the corresponding interface section in corosync.conf, or, preferably, using the priority parameter when creating your cluster with pvecm:
However, I think those directions are too vague for an admin to use with confidence. If this article helped you, please leave a comment. Maybe this will show up in more internet searches and be more clear for other users.
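For what it's worth, that priority parameter looks something like this when creating a cluster from scratch (a sketch based on the admin guide; substitute your own addresses and priority values):
Code:
pvecm create pve-cluster-a --link0 192.168.255.11,priority=255 --link1 192.168.4.111,priority=4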
 
Thanks for the write-up. This is exactly what I need, with my switch recently crashing (yeah, time to replace the switch too, but a single switch will always have this issue).

I am going to possibly try wiring the two Proxmox servers directly to each other and then have the qdevice on only the 2nd network (not sure if this will work at all but worth trying ... haven't tried to read the intricacies of the corosync setup yet).
 
I was looking for a solution to a problem similar to yours, where the outage of one switch would make all nodes lose quorum. Using a secondary physical link and configuring it in corosync solves this problem. Thanks for writing this out <3
 
