SCROLL TO THE END FOR AN EXAMPLE

ADD
knet_link_priority: <value>
TO YOUR /etc/pve/corosync.conf
FILE UNDER THE TOTEM DIRECTIVE AND EACH RESPECTIVE INTERFACE SUBDIRECTIVE

Here is the guide on creating a separate cluster network:
https://pve.proxmox.com/wiki/Separate_Cluster_Network
MY OBJECTIVE
These are some of my experiences migrating from my first test cluster to a meaner, cleaner, much more reliable one. Among many things, the 3 primary upgrades I made were A) going from a 2-node to a 3-node cluster for a proper quorum and high availability, B) local storage on separate physical volumes to support pve-zsync replication for "mission critical" guests that can't depend on network storage, and C) a separate corosync network, as recommended in the documentation.
MY HARDWARE
My nodes are just old Dell OptiPlex 7010 SFFs. They each take 2x SSDs and a 10-gig PCIe network card (in addition to the onboard 1GbE interface), with an additional slot to spare for either a low-profile GPU, additional network interfaces, or additional solid-state storage. I'm using the onboard 1GbE interface (eno1) for the corosync network
192.168.255.0/24
and 3 ports on a managed switch as an untagged VLAN to connect everything. I'm happy with this because it doesn't need to get out to the internet; however, I went ahead and completed the VLAN config on my switches and router so I'd have out-of-band SSH access, could send Wake-on-LAN packets if necessary, and would still have access for monitoring. The 10-gig interface on each node connects to my primary LAN with access to shared storage.
MY INITIAL CONFIG
Cluster name:
pve-cluster-a
10-gig interface IPs & hostnames
Code:
192.168.4.111/24 pve-a
192.168.4.112/24 pve-b
192.168.4.113/24 pve-c
1GbE corosync IPs & hostnames
Code:
192.168.255.11/24 pve-sync-a
192.168.255.12/24 pve-sync-b
192.168.255.13/24 pve-sync-c
JOINING MACHINES TO MY CLUSTER
To do this, I logged in to each node via SSH and entered the following commands:
(from 192.168.4.111)
Code:
pvecm create pve-cluster-a --link0 192.168.255.11 --nodeid 1
Code:
pvecm add 192.168.4.111 --link0 192.168.255.12 --nodeid 2
Code:
pvecm add 192.168.4.111 --link0 192.168.255.13 --nodeid 3
STATUS CHECK
This resulted in a 3-node cluster with a separate corosync network. Of course, there was additional node network configuration necessary first, but that goes beyond the scope of this post. Feel free to ping me and I'll help any newbies out. Running the following command produced this output:
Code:
root@pve-a:/etc/pve# pvecm status
Cluster information
-------------------
Name: pve-cluster-a
Config Version: 4
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sun Nov 14 09:55:16 2021
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.167
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.255.11 (local)
0x00000002 1 192.168.255.12
0x00000003 1 192.168.255.13
UPDATING HOSTS FILE WITH UNIQUE HOSTNAMES FOR EACH STATIC IP INTERFACE
I updated /etc/hosts with the following IPv4 mappings for good measure:
This is likely unnecessary, as PVE stores IP addresses (rather than hostnames) in its config files for stability and performance, and won't often need to resolve them anyway. This is more of the admin giving his creations names.
Code:
127.0.0.1 localhost.localdomain localhost
192.168.4.111 pve-a.mydomain.com pve-a
192.168.4.112 pve-b.mydomain.com pve-b
192.168.4.113 pve-c.mydomain.com pve-c
192.168.255.11 pve-sync-a.mydomain.com pve-sync-a
192.168.255.12 pve-sync-b.mydomain.com pve-sync-b
192.168.255.13 pve-sync-c.mydomain.com pve-sync-c
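As a quick sanity check, you can confirm each sync hostname maps to an address on the sync network. This is a minimal sketch that parses an inline copy of the mappings above; on a real node you could point it at /etc/hosts (or just use getent hosts) instead.

```shell
# Sketch: verify each corosync hostname resolves to a 192.168.255.x
# address. Uses an inline sample copy of the hosts entries from above.
hosts_file="$(mktemp)"
cat > "$hosts_file" <<'EOF'
192.168.255.11 pve-sync-a.mydomain.com pve-sync-a
192.168.255.12 pve-sync-b.mydomain.com pve-sync-b
192.168.255.13 pve-sync-c.mydomain.com pve-sync-c
EOF
missing=0
for name in pve-sync-a pve-sync-b pve-sync-c; do
  # Pull the address whose short hostname (3rd column) matches.
  ip=$(awk -v n="$name" '$3 == n { print $1 }' "$hosts_file")
  case "$ip" in
    192.168.255.*) echo "$name -> $ip" ;;
    *) echo "$name is NOT on the sync network"; missing=1 ;;
  esac
done
rm -f "$hosts_file"
```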
FORCE-GENERATE CERTIFICATES THAT ARE RANDOMIZED WITH ENTROPY
And then of course, run
Code:
pvecm updatecerts --force
RESTRICT MIGRATION ON THE SYNC NETWORK
All of this config for a separate corosync network could be pointless if Proxmox decides to clog up your sync network with guest migrations between nodes!! It's important to tell the cluster which network it's supposed to use for these tasks. To tune this behavior, I edit the
/etc/pve/datacenter.cfg
file and add this line:
Code:
migration: secure,network=192.168.4.0/24
This ensures the sync network isn't saturated, and that the cluster takes full advantage of the 10gig network.
WHY IS THIS CONFIGURATION WRONG?
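If you want to double-check that the setting is actually in place, a one-line grep does it. Below is a sketch run against an inline sample file; on a real node you'd point cfg at /etc/pve/datacenter.cfg.

```shell
# Sketch: check that datacenter.cfg pins migration traffic to the
# 10-gig network. Runs against an inline sample copy here.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
keyboard: en-us
migration: secure,network=192.168.4.0/24
EOF
if grep -q '^migration: secure,network=192\.168\.4\.0/24$' "$cfg"; then
  result="pinned"
else
  result="not set"
fi
echo "migration network: $result"
rm -f "$cfg"
```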
I was happy with this configuration for a few weeks, but I was generally anxious about the stability of the separate corosync network. Adding additional critical components to a system creates additional points of potential failure.
And that's exactly what happened.
While configuring a firewall rule on my network to allow Wake-on-lan packets to pass into the
192.168.255.0/24
network, I was having trouble determining exactly why my firewall was still blocking packets, even though I had added a rule exception directly from the logs. I wanted to eliminate the switch's VLAN configuration as the possible problem, and while I was in there (like a dummy), I mistakenly disabled the management VLAN, locking myself out of the switch altogether until I did a reboot to revert the unsaved changes. Even SSH and Telnet were inaccessible.
FRAGILE CONFIGURATION
The moment the switch used for corosync was powered off, I instantly saw all 3 nodes in the web console change to nasty icons, as every guest, container, and node depended on it. I cringed and my heart sank.
I had already spent my budget for upgrades, and I'm using the spare PCIe slot for GPUs. Adding network ports to each node for a redundant corosync ring just isn't in my future. I thought my only choice was to accept it as-is. But I found there's a great solution. I'd been reading the docs intensely for several weeks in preparation for the migration, and I don't know how I missed this:
EUREKA! SURPRISED THERE ISN'T MORE EMPHASIS ON THIS!
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
Since lower priority links will not see traffic unless all higher priorities have failed, it becomes a useful strategy to specify even networks used for other tasks (VMs, storage, etc…) as low-priority links. If worst comes to worst, a higher-latency or more congested connection might be better than no connection at all.
Bingo! Update
/etc/pve/corosync.conf
and add a 2nd link to the system! Problem solved! If the 1GbE switch ever goes down, the corosync network can fail over to the existing 10gig network on a different switch!
Or not.
GOOD LUCK FIGURING OUT HOW, THOUGH
For this to work, each link should be assigned a priority. The higher the value, the higher the priority. The documentation explains how to set this up on a new cluster very well, but nowhere in the Proxmox documentation is there information on how to add priority values to an existing configuration where none exist! When creating a single sync network, priorities are irrelevant, and the generated configuration files simply do not show this parameter. NOTE: In my particular situation, the lower ring ID (ring0) should get first priority by default, but I don't want to rely on a default setting that doesn't even show up in the config for something this important.
Here is additional corosync documentation from debian.org; finding it was a very long search for me.
https://manpages.debian.org/bullseye/corosync/corosync.conf.5.en.html
THE SOLUTION
You need to add
Code:
knet_link_priority: <value>
to your
/etc/pve/corosync.conf
file under the TOTEM > INTERFACE parameters and assign your priority to each interface. Here's my finished .conf file as an example:
FULL CONFIGURATION EXAMPLE
Code:
root@pve-a:/etc/pve# cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve-a
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.255.11
    ring1_addr: 192.168.4.111
  }
  node {
    name: pve-b
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.255.12
    ring1_addr: 192.168.4.112
  }
  node {
    name: pve-c
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.255.13
    ring1_addr: 192.168.4.113
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-cluster-a
  config_version: 4
  interface {
    linknumber: 0
    knet_link_priority: 255
  }
  interface {
    linknumber: 1
    knet_link_priority: 4
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
The values you assign to the parameter are arbitrary, and only meaningful relative to the other assigned values. (Don't be cute and try a floating-point number, though.) I used 255 and 4 as my values simply because they are consistent with the 3rd octet of the network addresses they represent. If you wanted to swap the primary corosync ring with the secondary, simply swap the values for
knet_link_priority:
Any time you modify this conf file, remember to increment the value for the config_version:
as well, per the documentation.
This method tested out and worked perfectly, and near instantaneously. The log files showed the nodes became immediately aware of corosync network failures and immediately failed over to the secondary link. Much faster than the HA behavior, which seems to wait and stall. When restoring connectivity to the primary network, the cluster was immediately aware and switched back in less than 1.0 seconds.
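If you edit the file often, the config_version bump can be scripted instead of done by hand. This is a sketch against an inline sample; on a real node you'd run the awk over a copy of /etc/pve/corosync.conf, review the result, and then move it back into place so pmxcfs distributes it.

```shell
# Sketch: increment the config_version value in a corosync.conf-style
# file automatically. Demonstrated on an inline sample totem block.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
totem {
  cluster_name: pve-cluster-a
  config_version: 4
}
EOF
# First awk bumps the number in place (preserving indentation);
# second awk just reads the new value back out for display.
new_version=$(awk '$1 == "config_version:" { sub(/[0-9]+/, $2 + 1) } { print }' "$conf" \
  | awk '$1 == "config_version:" { print $2 }')
echo "config_version is now $new_version"
rm -f "$conf"
```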
EDIT:
I did find that the Proxmox documentation DOES mention the name of the parameter by saying:

Links are used according to a priority setting. You can configure this priority by setting knet_link_priority in the corresponding interface section in corosync.conf, or, preferably, using the priority parameter when creating your cluster with pvecm:

However, I think those directions are too vague for an admin to use with confidence. If this article helped you, please leave a comment. Maybe this will show up in more internet searches and be clearer for other users.