[SOLVED] Is this Proxmox network configuration bad?

LunarMagic

Member
Mar 14, 2024
Hello,

I have this network configuration and I keep having quorum issues randomly. I have each LACP connection on separate switches, as shown in the comments. Is it better to just not have backup network connections, to keep things simple?

1738552059747.png
 
Are you running corosync across those nested bonds? We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces. Also make sure you have redundant corosync networks, so you always have a fallback network if the primary one fails. The failover corosync provides is usually much faster than what a bond can provide.

Generally I'd recommend against nesting bonds like that. LACP already provides redundancy for you in case of a failure. If you really want to bond the 4 ports together why not bond all 4 of them via LACP? Do they have different bandwidth? If you want to hedge against a NIC with multiple ports failing, you can always bond ports of different NICs together via LACP.

This setup is needlessly complex imo, and could lead to some weird issues in the future.
 
  • Like
Reactions: dj423
Are you running corosync across those nested bonds? We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces. Also make sure you have redundant corosync networks, so you always have a fallback network if the primary one fails. The failover corosync provides is usually much faster than what a bond can provide.

Generally I'd recommend against nesting bonds like that. LACP already provides redundancy for you in case of a failure. If you really want to bond the 4 ports together why not bond all 4 of them via LACP? Do they have different bandwidth? If you want to hedge against a NIC with multiple ports failing, you can always bond ports of different NICs together via LACP.

This setup is needlessly complex imo, and could lead to some weird issues in the future.
I believe it's running on bond0 and bond1. I haven't touched it in a while, but how would you go about changing what corosync uses? I didn't realize a bonded connection for corosync caused issues, so thank you for that!!

So I didn't do all 4 via LACP because my UniFi device can only bind 2 ports together. If I do an LACP connection in Proxmox, can I do it across 2 switches, or can LACP connections only be on 1 switch?

My whole goal was to have redundant connections for Proxmox, so that if a switch went down I was still good, and I heard LACP connections really help improve speed.
 
I believe it's running on bond0 and bond1. I haven't touched it in a while, but how would you go about changing what corosync uses? I didn't realize a bonded connection for corosync caused issues, so thank you for that!!
Usually you add an additional link to corosync which represents the new network configuration, then you configure the new link on all nodes and verify that it is working. Only after verifying this do you remove the old link, and corosync should switch over to the other link you configured.
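As a rough sketch (node names and addresses below are made up), each node entry in /etc/pve/corosync.conf ends up with an extra ringX_addr line, and config_version in the totem section must be increased whenever you edit the file:

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.11   # existing link
    ring1_addr: 192.168.20.11   # new fallback link
  }
  # ...same pattern for the other nodes...
}

totem {
  cluster_name: example-cluster
  config_version: 5             # must be incremented on every change
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  version: 2
}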

LACP is for bonding multiple ports on the same switch. If you want to bond across multiple switches you will need to use MLAG in addition.
 
Last edited:
Usually you add an additional link to corosync which represents the new network configuration, then you configure the new link on all nodes and verify that it is working. Only after verifying this do you remove the old link, and corosync should switch over to the other link you configured.

LACP is for bonding multiple ports on the same switch. If you want to bond across multiple switches you will need to use MLAG in addition.
So would you advise a single 10 gig connection for corosync and then a backup 10 gig connection on a separate switch? UniFi doesn't have MLAG, so I guess I'll stick to LACP connections, but what would I even use LACP connections for if corosync doesn't like bonded connections?

I feel like I have all of these network ports and switches for nothing if I can't make the speed even faster.
 
So would you advise a single 10 gig connection for corosync and then a backup 10 gig connection on a separate switch? UniFi doesn't have MLAG, so I guess I'll stick to LACP connections, but what would I even use LACP connections for if corosync doesn't like bonded connections?
A 1G connection is usually fine for corosync; it needs good latency, not bandwidth. If you have that many ports and switches, it might make sense to use them to split your networks, so you have separate storage / corosync / VM traffic networks and spikes in one network don't affect another.
 
  • Like
Reactions: dj423
A 1G connection is usually fine for corosync; it needs good latency, not bandwidth. If you have that many ports and switches, it might make sense to use them to split your networks, so you have separate storage / corosync / VM traffic networks and spikes in one network don't affect another.
+1 This.
I had old, cheap 8-port 100M switches lying around. They are still good enough as a corosync fallback.
Thank you both. So I have an RBD Ceph storage connection. What does Proxmox use to connect to that? Based on what you guys have told me, here is what I want to set up:

vmbr0 - Proxmox GUI connection / RBD storage connection (since the connection to the Proxmox GUI would barely use anything; if this is not the case, let me know)

vmbr1 - VM internet

vmbr2 - Corosync

This would have corosync on its own separate network, and I'd have it use an active-backup 1G connection across 2 different switches.

VM internet would use an LACP connection with a failover onto a different switch. Both would be 1G connections since they only connect out to the internet.

The Proxmox GUI / RBD connection would use an LACP connection with failover to another switch (both 10G). I know that for the VM internet I can just change the network adapter the VMs use.

For corosync, I believe you edit /etc/pve/corosync.conf and then replace the IP address of the ring with the address of the adapter you want to use.

(Please correct me if i'm wrong with the info above)

I'm not sure how to change the connection used for me to connect to the GUI, or what's used for the RBD connection.

This RBD connection is to a separate Ceph storage pool not managed by Proxmox.


Screenshot 2025-02-06 at 6.38.47 AM.png
 
This would have corosync on its own separate network, and I'd have it use an active-backup 1G connection across 2 different switches.
That works, but it is not ideal. Corosync itself has a failover function: ring0, ring1, ... up to ring7 (eight links), in that order, each a direct connection, no bond etc. I don't know if it is changeable afterwards, but usually you set that before creating the cluster. See the screenshot:
corosync.png
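If I remember the CLI correctly, the links can also be given when the cluster is created; the addresses below are placeholders, so check the pvecm man page before copying anything:

Code:
# on the first node, create the cluster with two corosync links
pvecm create my-cluster --link0 10.10.10.11 --link1 10.10.20.11

# on each additional node, join and state its own link addresses
pvecm add 10.10.10.11 --link0 10.10.10.12 --link1 10.10.20.12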

What does Proxmox use to connect to that?
This RBD connection is to a separate Ceph storage pool not managed by Proxmox.

That depends on how you set up your bridges; it doesn't matter whether you use vmbr0 or vmbr5 for that. But the bridges have to be exactly the same on every node.
And you usually want to configure your fastest connection for that.
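For illustration, a minimal sketch of what "the same bridge on every node" could look like in /etc/network/interfaces; the NIC name and addresses are placeholders, and only the address changes per node:

Code:
auto vmbr2
iface vmbr2 inet static
    address 10.10.40.11/24      # .12, .13, ... on the other nodes
    bridge-ports enp3s0f0       # the fast (e.g. 10G) uplink
    bridge-stp off
    bridge-fd 0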
 
That works, but it is not ideal. Corosync itself has a failover function: ring0, ring1, ... up to ring7 (eight links), in that order, each a direct connection, no bond etc. I don't know if it is changeable afterwards, but usually you set that before creating the cluster. See the screenshot:
View attachment 82019

That depends on how you set up your bridges; it doesn't matter whether you use vmbr0 or vmbr5 for that. But the bridges have to be exactly the same on every node.
And you usually want to configure your fastest connection for that.
If that's the case, I could just do ring0 and ring1 and have separate connections instead of an active-backup bond.

I just added the RBD connection via the storage section of the datacenter. I don't know how you can specify the network.
 
If that's the case, I could just do ring0 and ring1 and have separate connections instead of an active-backup bond.
No, corosync uses ring0 as the primary. If ring0 is dead, it tries ring1. As said, corosync has its own failover (ring0 through ring7, if defined, in that order): ring0 is the primary, ring1-7 are the fallbacks.

I don't know how you can specify the network
That depends on the Ceph config on the nodes, i.e. which networks the corresponding IPs belong to. Click on a node -> Ceph -> Configuration -> [global] -> cluster_network and public_network.
The "client" side is the public_network.
 
No, corosync uses ring0 as the primary. If ring0 is dead, it tries ring1. As said, corosync has its own failover (ring0 through ring7, if defined, in that order): ring0 is the primary, ring1-7 are the fallbacks.


That depends on the Ceph config on the nodes, i.e. which networks the corresponding IPs belong to. Click on a node -> Ceph -> Configuration -> [global] -> cluster_network and public_network.
The "client" side is the public_network.
For corosync I'm saying separate bridges, so it fails over using corosync instead of using an active-backup bond.

RBD didn't require me to install Ceph. Ceph is not installed on Proxmox. The nodes in Proxmox do not have local storage; they connect to separate storage that runs Ceph itself.
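Since the Ceph cluster is external, on the Proxmox side the RBD storage is defined with the monitor addresses (Datacenter -> Storage, which writes /etc/pve/storage.cfg), and the NIC that gets used simply follows normal routing to those monitor IPs. A hedged sketch with made-up names and addresses:

Code:
rbd: external-ceph
        content images,rootdir
        krbd 0
        monhost 10.10.40.1 10.10.40.2 10.10.40.3
        pool vm-pool
        username admin

So if the monitors sit in the subnet of the fast storage bridge, that bridge is what ends up carrying the RBD traffic.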
 
For corosync I'm saying separate bridges, so it fails over using corosync instead of using an active-backup bond.
Correct. You don't need a bridge in that case. You set the IP on the NIC port directly. In my screenshot, my IP addresses for that are under the black bars.
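For example, roughly like this in /etc/network/interfaces (NIC name and subnet are placeholders):

Code:
auto eno3
iface eno3 inet static
    address 10.10.30.11/24      # dedicated corosync link, no bridge, no bond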

The nodes in Proxmox do not have local storage; they connect to separate storage that runs Ceph itself.
Which IP does this storage have? That answers where you set the corresponding network (and the bridge).
If you only have this one 192.168.50.0/19 network, you should use more networks and split it up, or use VLANs.
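If splitting via VLANs, one way (a sketch; the VLAN ID and addresses are invented here) is a VLAN-aware bridge plus a VLAN interface for the host itself:

Code:
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr1.30
iface vmbr1.30 inet static
    address 192.168.30.11/24    # host IP on VLAN 30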
 
Last edited:
Awesome thank you! I'll try that.
Correct. You don't need a bridge in that case. You set the IP on the NIC port directly. In my screenshot, my IP addresses for that are under the black bars.


Which IP does this storage have? That answers where you set the corresponding network (and the bridge).
If you only have this one 192.168.50.0/19 network, you should use more networks and split it up, or use VLANs.
That's what I'll do, I'll add more networks and separate things. So should I have a network for:

Proxmox UI/RBD Traffic

Corosync

VM traffic

Is this enough?
 
Is this enough?
Yes, that looks good. Additionally, I would move the Proxmox host's own IP into another network or VLAN, but that's not a must. I like to separate and isolate stuff.

For all of these networks you can configure a bond however you like; only corosync traffic is the exception (see the sketch after this list).
Corosync only needs good latency, not much bandwidth.
The Proxmox UI needs little bandwidth.
RBD wants bandwidth and good latency.
VM traffic wants bandwidth and good latency.
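A rough example of such a bond feeding a bridge in /etc/network/interfaces, with placeholder NIC names and addresses; bond-mode 802.3ad is LACP and needs the matching port aggregation configured on the switch:

Code:
auto bond1
iface bond1 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

auto vmbr1
iface vmbr1 inet static
    address 10.10.10.11/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0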
 
Yes, that looks good. Additionally, I would move the Proxmox host's own IP into another network or VLAN, but that's not a must. I like to separate and isolate stuff.

For all of these networks you can configure a bond however you like; only corosync traffic is the exception.
Corosync only needs good latency, not much bandwidth.
The Proxmox UI needs little bandwidth.
RBD wants bandwidth and good latency.
VM traffic wants bandwidth and good latency.
Thank you for explaining! So how do I achieve better latency and better bandwidth? Is better bandwidth just a faster switch? If it is, I don't know what makes something have good latency besides everything being on the same switch.
 
Is better bandwidth just a faster switch?
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/

better latency
Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
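A quick way to sanity-check this afterwards; the commands exist on any Proxmox node, the peer address below is a placeholder:

Code:
# show quorum state and which corosync links are up
pvecm status
corosync-cfgtool -s

# rough latency check towards another node over the corosync network
ping -c 100 -i 0.2 10.10.30.12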
 
Last edited:
  • Like
Reactions: LunarMagic
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/


Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
Thank you! I'll go and start this now. Thank you so, so much for everything! I'll update this after I work on it.
 
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/


Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
Everything worked; I gave corosync a separate switch. Thank you for the help :)
 
  • Like
Reactions: mr44er