Are you running corosync across those nested bonds? We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces. Also make sure you have redundant corosync networks, so you always have a fallback network if the primary one fails. The failover corosync provides is usually much faster than what a bond can provide.
Generally I'd recommend against nesting bonds like that. LACP already provides redundancy for you in case of a failure. If you really want to bond the 4 ports together, why not bond all 4 of them via LACP? Do they have different bandwidth? If you want to hedge against a NIC with multiple ports failing, you can always bond ports of different NICs together via LACP.
This setup is needlessly complex imo, and could lead to some weird issues in the future.
I believe it's running on bond0 and bond1. I haven't touched it in a while, but how would you go about changing what corosync uses? I didn't realize a bonded connection for corosync caused issues, so thank you for that!
Usually you add an additional link to corosync which represents the new network configuration, then you configure the new link on all nodes and verify that it is working. Only after verifying this do you remove the old link, and corosync should switch over to the other link you configured.
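As a rough sketch of what that looks like on Proxmox VE (node names, addresses and the cluster name below are made up): you edit /etc/pve/corosync.conf, add a ring1_addr to every node entry plus a second interface block, and bump config_version so the change propagates.

    nodelist {
      node {
        name: pve1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.1.11    # existing corosync link
        ring1_addr: 10.10.2.11    # new, separate link being added
      }
      # ...add a ring1_addr to every other node entry as well...
    }

    totem {
      cluster_name: example-cluster
      config_version: 5           # must be incremented on every edit
      interface {
        linknumber: 0
      }
      interface {
        linknumber: 1             # second corosync link
      }
      version: 2
    }

Only after the new link is verified on all nodes would you remove the old one, as described above.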
So would you advise a single 10 gig connection for corosync and then a backup 10 gig connection on a separate switch? Unifi doesn't have MLAG, so I guess I'll stick to LACP connections, but what would I even use LACP connections for if corosync doesn't like bonded connections?
LACP is for bonding multiple ports on the same switch. If you want to bond across multiple switches you will need to use MLAG in addition.
A 1G connection is usually fine for corosync; it needs good latency, not bandwidth. If you have that many ports and switches, it might make sense to use them to split your networks, so you have your own storage / corosync / VM traffic networks and spikes in one network don't affect another.
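For the non-corosync networks (storage, VM traffic), an LACP bond on Proxmox/Debian lives in /etc/network/interfaces; a minimal sketch, assuming made-up interface names and addresses and a switch with a matching LACP port group:

    auto bond0
    iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1   # two ports, ideally on different NICs
        bond-mode 802.3ad               # LACP
        bond-miimon 100
        bond-xmit-hash-policy layer2+3

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.11/24
        gateway 192.168.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

Corosync itself would stay off this bond, on its own unbonded interface as recommended above.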
+1 to this (a 1G connection is usually fine for corosync, it needs good latency, not bandwidth). I had old, cheap 8-port 100M switches lying around; they are still good enough as a corosync failover.
Thank you both. So I have an RBD Ceph storage connection; this RBD connection is to a separate Ceph storage pool not managed by Proxmox. What does Proxmox use to connect to that? Based on what you guys have told me, what I want to do is the following: corosync would be on its own separate network and I'd have it use an active-backup 1G connection on 2 different switches.
That works, but not ideal. Corosync itself has a failover function: ring0, ring1, and so on up to ring8, in that order, each a direct connection with no bond etc. I don't know if it is changeable afterwards, but usually you set that before creating a cluster. See the screenshot: [screenshot: attachment 82019]
As for what Proxmox uses to connect to the storage: that depends on how you set up your bridges; it doesn't matter if you use vmbr0 or vmbr5 for that. But the bridges have to be 1:1 the same on every node, and you usually want to configure your fastest connection for that.
If that's the case, I could just do ring0 and then ring1 and just have separate connections instead of a failover bond.
No, corosync uses ring0 as primary. If ring0 is dead, it tries ring1. As said, corosync has its own failover (ring0-8 if defined, in that order). ring0 = master, ring1-8 = slave.
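If it helps, once the rings are defined you can check from any node which corosync links are up before pulling a cable to test the failover; a couple of commands that ship with a standard Proxmox/corosync install (the IP below is an example):

    # cluster membership and quorum as Proxmox sees it
    pvecm status

    # per-link status of the local corosync rings
    corosync-cfgtool -s

    # rough latency check against another node's corosync address
    ping -c 20 10.10.1.12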
For corosync I'm saying separate bridges, so it fails over using corosync instead of using an active-backup bond. The nodes in Proxmox do not have storage; they connect to a separate storage cluster that is using Ceph itself. I don't know how you can specify the network for that.
That depends on the Ceph config on the nodes, which is where those networks and the corresponding IPs are defined. Click on a node -> Ceph -> Configuration -> [global] -> cluster_network and public_network. The "client" function is the public_network.
Correct. You don't need a bridge in that case; you set the IP on the NIC port directly. In my screenshot, under the black bars are my IP addresses for that. Which IP does this storage have? That answers where you set the corresponding network (and the bridge). If you only have this one 192.168.50.0/19, you should use more networks and split it, or use VLANs.
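A minimal /etc/network/interfaces sketch of both ideas, with made-up interface names and subnets (an IP set directly on a NIC port for corosync, and a VLAN sub-interface as one way of splitting a flat network):

    # dedicated corosync link: plain NIC, no bond, no bridge
    auto enp2s0
    iface enp2s0 inet static
        address 10.10.1.11/24

    # example VLAN sub-interface (VLAN 60) for separating traffic on a trunk port
    auto enp3s0.60
    iface enp3s0.60 inet static
        address 10.10.60.11/24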
That's what I'll do, I'll add more networks and separate it. So should I have a network for corosync, one for storage, and one for VM traffic? Is this enough?
Yes, looks good. Additionally I would move the Proxmox "itself" IP (the node management IP) into another network or VLAN, but that's not a must. I like to separate and isolate stuff.
Thank you for explaining! So how do I achieve better latency and better bandwidth? Is better bandwidth just a faster switch? If it is, I don't know what makes something have good latency besides everything being on the same switch.
For all these networks you can configure a bond however you like; only corosync traffic is the exception.
Corosync only needs good latency, less bandwidth.
The Proxmox UI needs little bandwidth.
RBD wants bandwidth and good latency.
VM traffic wants bandwidth and good latency.
Yes, from slow to fast: 10M -> 100M -> 1G -> 10G -> 40G -> 100G.
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/
For better latency: spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it (corosync, Proxmox itself, RBD, VM traffic). That leaves 1000M/4 = 250M per network; they will fight each other and latency goes up.
If you run corosync alone on one NIC link, no fight happens and latency stays excellent.
The thing is, only for corosync is low latency important so your cluster can act happy <- that is a must!
For the other ones it is not so important and things would just work slower, but in general you aim for good bandwidth and low latency everywhere <- not a must!
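A quick way to see both numbers for yourself, using made-up addresses (iperf3 is in the Debian repos via apt install iperf3 and is not part of a default Proxmox install):

    # latency between two nodes on the corosync network
    ping -c 50 -q 10.10.1.12

    # bandwidth between two nodes on the storage network:
    # on the first node
    iperf3 -s
    # on the second node
    iperf3 -c 10.10.30.11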
Thank you! I'll go and start this now. Thank you so so much for everything! I'll update this after I work on it.
Update: everything worked, I gave corosync a separate switch. Thank you for the help!