Which network should we use for Corosync?

Dec 28, 2019
Hi,

Currently, our Proxmox Corosync runs on a public IP. Is this best practice? A few articles mention a
'RING' when discussing Corosync.


# corosync-cfgtool -s
Printing link status.
Local node ID 5
LINK ID 0
        addr    = 117.xxx.x.x
        status:
                nodeid 1:  link enabled:1  link connected:1
                nodeid 2:  link enabled:1  link connected:1
                nodeid 3:  link enabled:1  link connected:1
                nodeid 4:  link enabled:1  link connected:1
                nodeid 5:  link enabled:1  link connected:1
                nodeid 6:  link enabled:1  link connected:1

Below is the output shown in an online article (which mentions RING):


[root@pcmk-1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.122.101
        status  = ring 0 active with no faults


Does our configuration seem risky? Why does the command, run on our Proxmox installation, not show a RING ID? And what does the term RING actually mean here?

Thanks in advance
 
Best practice is to have Corosync on a dedicated physical network that is used only for Corosync.
A 1 Gbit network is more than fast enough: Corosync doesn't need a lot of bandwidth, but it really needs low latency.

If you have Corosync running on a network with other traffic, especially anything storage-related like NFS, Ceph, Backup, iSCSI,... you can easily run into the situation that the other traffic is congesting the network. This, in turn, increases the latency for the corosync packets and in a worst-case scenario the cluster will "fall apart" until the corosync services on each node can reach the others in a timely manner again.
Should you have any HA guests active on your nodes, those nodes will fence themselves after 2 minutes of not being part of the quorate cluster. If the whole cluster fell apart, this means that each node with HA guests on it will fence itself.

Additional rings, or links (the terms mean the same in the context of Corosync), increase redundancy. Corosync will switch to another link if the main one cannot be used anymore. Corosync 3 (used since PVE 6.x) supports up to 8 links.
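
For illustration, a node entry in /etc/pve/corosync.conf with two links could look like the following sketch (the addresses, node name, and IDs are made up; note the config still uses the ring0_addr/ring1_addr keys even though they are called links nowadays):

nodelist {
  node {
    name: pve-node1
    nodeid: 1
    quorum_votes: 1
    # link 0: dedicated Corosync network
    ring0_addr: 10.10.10.1
    # link 1: fallback over another network
    ring1_addr: 192.168.1.1
  }
}

When creating a new cluster, the links can also be passed directly, e.g. pvecm create mycluster --link0 10.10.10.1 --link1 192.168.1.1.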
 
Hi,

all corosync traffic is encrypted with an authkey only known to the cluster members (it is exchanged on join), so from a security standpoint it doesn't really matter where it runs. However, public networks can be DDoSed, so from a reliability and availability standpoint it can be better to run it on a private network/LAN.

"Ring" or nowadays often also called "link" are a way of corosync to use more than one network for communicating, this allows to fallback to another if one network fails.

In general, the most important thing is that the network Corosync runs on isn't used by IO traffic, as this can disrupt Corosync easily. While Corosync doesn't use much bandwidth, it really is sensitive to latency (spikes).

See also: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_cluster_network (and the rest of that chapter)
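
As a quick sanity check (assuming a default setup, where the shared key lives at /etc/corosync/authkey on every node), you can verify that the key exists and is readable by root only:

# run on any cluster node
ls -l /etc/corosync/authkey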
 
Dear Mr. Lamprecht,

I found this thread while researching the following problem/question:

I would like to build a Proxmox cluster based on dedicated root servers from Hetzner. The nodes can be connected via Hetzner's vSwitch technology (fully virtual Layer 2 connections). Since vSwitch traffic is not encrypted by default, I would like to know whether ALL cluster traffic is already secured by Proxmox's own encryption.
As you mentioned above, corosync traffic is encrypted by default. So is it "safe" to run the Proxmox cluster network without any additional encryption (over an unsecured network)?
Unfortunately, I can find no further information on this subject (cluster network encryption).

Best regards,
mscd
 
all corosync traffic is encrypted with an authkey only known to the cluster members (it is exchanged on join), so from a security standpoint it doesn't really matter where it runs.
Hey Thomas,

Similar question to MSCD:

We're looking at the future viability of using dark fibre to connect an 8-node cluster across two cities (~378 km). We have calculated that the latency should be low enough, and we will have authority over the fibre link, with the exception that some points will be switched and relayed by the ISP. We want to know if there is written documentation about the Corosync traffic being encrypted; nothing in the Corosync documentation specifies that it is encrypted or how: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_cluster_network

Thanks, let me know if I missed something; if not, could someone on the team please update the documentation?

Tmanok
 
We're looking at the future viability of using dark fibre to connect an 8-node cluster across two cities (~378 km). [...]
You should be careful if you want to stretch one cluster between two cities. If you have a fibre cut between the two cities, you'll lose quorum in one of them.
You need a quorum node somewhere in a third city, with a different network link to each city.


I'm not sure the latency will be OK at 380 km (I think it should be around 5-6 ms?). Seems a bit high.


In my opinion, you should build two different clusters, one in each city.
 
I'm not sure the latency will be OK at 380 km (I think it should be around 5-6 ms?). Seems a bit high.
Hi Spirit,

Having a witness node in another DC is a good idea. That being said, the latency on the fibre will be about 3.8 ms (roughly 1 ms per 100 km), given it would be clear of all other traffic. The recommendation in the wiki is around 5 ms and below 10 ms, so if we can make it work, I will be happy.
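
For reference, a back-of-the-envelope check of that figure, assuming light propagates through fibre at roughly 200,000 km/s:

one-way:    378 km / 200,000 km/s ≈ 1.9 ms
round trip: 2 x 1.9 ms            ≈ 3.8 ms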

We are looking at two fibre routes, so that if one is cut, the other will be in a completely different location. The latency may vary a bit, roughly 3 ms to 5 ms depending on the route. We want instant failover, so we are looking hard at the fibre options. The key now is security.

Thanks, our current plan is four clusters with four nodes each, two clusters at each site, but cross-cluster replication does not exist as a feature yet and we want to meet SLAs.


Tmanok
 
Having a witness node in another DC is a good idea. That being said, the latency on the fibre will be about 3.8 ms (roughly 1 ms per 100 km), given it would be clear of all other traffic. [...]
I have a customer running a cluster with 3 nodes in each of 2 different DCs at 1 ms, plus an extra 7th node for quorum at 10 ms. It's working. I'm not sure about bigger clusters.

Note that for storage replication, if you want really instant failover, you'll need synchronous replication, so don't expect a lot of IOPS at 4 ms if your application doesn't do much parallelism, e.g. sequential writes (database journal, ...) (1 ms = 1000 IOPS, 4 ms = 250 IOPS).
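
That rule of thumb follows from each synchronous write having to wait at least one network round trip before it can be acknowledged, so a single writer is capped at:

max sequential write IOPS ≈ 1000 / RTT in ms
1 ms -> ~1000 IOPS
4 ms -> ~250 IOPS
8 ms -> ~125 IOPS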
 
I have a customer running a cluster with 3 nodes in each of 2 different DCs at 1 ms, plus an extra 7th node for quorum at 10 ms. It's working. I'm not sure about bigger clusters.

Note that for storage replication, if you want really instant failover, you'll need synchronous replication, so don't expect a lot of IOPS at 4 ms if your application doesn't do much parallelism, e.g. sequential writes (database journal, ...) (1 ms = 1000 IOPS, 4 ms = 250 IOPS).
The replication (PVE-based, asynchronous) will take place over VPLS links separate from the leased-fibre cluster links, but those are very good estimates for storage IOPS per latency. Unfortunately, our VPLS links are currently 8-13 ms (62.5 IOPS by your estimate); we are planning to reduce core network latency to improve this, but you've made a good point about how cautious we need to be about replication. Thank you for those insights; I honestly overlooked IOPS per ms of ping.

With, say, 20 to 40 core VMs replicating, we will want frequent replication; I don't have a good estimate right now for how many IOPS that would take. Note that DBs will replicate separately at the application layer. I would be interested to hear what you use for synchronous replication (PVE, Ceph, or SAN replication).


Back to the main point: Thomas, please have someone update those docs, or let me know whether Corosync encryption is written down in a technical document.

Thank you,


Tmanok
 
FYI, for even node counts you could add a QDevice, which doesn't participate as a full cluster node, but only as a vote arbiter, and thus doesn't have the same latency requirements.
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
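
If you go that route, the documented setup is to install corosync-qnetd on the external host and corosync-qdevice on the cluster nodes, then run the following on one cluster node (the IP is a placeholder for your arbiter host):

pvecm qdevice setup <QDEVICE-IP>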

Back to the main point: Thomas, please have someone update those docs, or let me know whether Corosync encryption is written down in a technical document.
Well, what do you need to know exactly?

Basically, we set the corosync.conf secauth option to on, which implies AES-256 encryption:
https://manpages.debian.org/bullseye/corosync/corosync.conf.5.en.html#secauth
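
For reference, the relevant totem section of a PVE-generated /etc/pve/corosync.conf looks roughly like this (trimmed; the cluster name and version numbers are placeholders):

totem {
  cluster_name: mycluster
  config_version: 3
  ip_version: ipv4-6
  # implies AES-256 encryption and authentication of all corosync traffic
  secauth: on
  version: 2
}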
 
