[SOLVED] Is this Proxmox network configuration bad?

LunarMagic

Member
Mar 14, 2024
Hello,

I have this network configuration and I keep having quorum issues randomly. I have each LACP connection on separate switches, as shown in the comments. Is it better to just not have backup network connections, to keep things simple?

1738552059747.png
 
Are you running corosync across those nested bonds? We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces. Also make sure you have redundant corosync networks, so you always have a fallback network if the primary one fails. The failover corosync provides is usually much faster than what a bond can provide.

Generally I'd recommend against nesting bonds like that. LACP already provides redundancy for you in case of a failure. If you really want to bond the 4 ports together why not bond all 4 of them via LACP? Do they have different bandwidth? If you want to hedge against a NIC with multiple ports failing, you can always bond ports of different NICs together via LACP.

This setup is needlessly complex imo, and could lead to some weird issues in the future.
 
  • Like
Reactions: dj423
Are you running corosync across those nested bonds? We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces. Also make sure you have redundant corosync networks, so you always have a fallback network if the primary one fails. The failover corosync provides is usually much faster than what a bond can provide.

Generally I'd recommend against nesting bonds like that. LACP already provides redundancy for you in case of a failure. If you really want to bond the 4 ports together why not bond all 4 of them via LACP? Do they have different bandwidth? If you want to hedge against a NIC with multiple ports failing, you can always bond ports of different NICs together via LACP.

This setup is needlessly complex imo, and could lead to some weird issues in the future.
I believe it's running on bond0 and bond1. I haven't touched it in a while, but how would you go about changing what corosync uses? I didn't realize a bonded connection for corosync caused issues, so thank you for that!!

So I didn't do all 4 via LACP because my UniFi device can only bind 2 ports together. If I do an LACP connection in Proxmox, can I do it across 2 switches, or can LACP connections only be on 1 switch?

My whole goal was to have redundant connections for Proxmox, so that if a switch went down I was still good, and I heard LACP connections really help improve speed.
 
I believe it's running on bond0 and bond1. I haven't touched it in a while, but how would you go about changing what corosync uses? I didn't realize a bonded connection for corosync caused issues, so thank you for that!!
Usually you add an additional link to corosync which represents the new network configuration, then you configure the new link on all nodes and verify that it is working. Only after verifying this do you remove the old link, and corosync should switch over to the other link you configured.
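As a rough sketch (node names and addresses below are made up), each node entry in /etc/pve/corosync.conf ends up with an extra ringX_addr line, and config_version in the totem section must be increased whenever you edit the file:

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.11   # existing link
    ring1_addr: 192.168.20.11   # new fallback link
  }
  # ...same pattern for the other nodes...
}

totem {
  cluster_name: example-cluster
  config_version: 5             # must be incremented on every change
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  version: 2
}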

LACP is for bonding multiple ports on the same switch. If you want to bond across multiple switches you will need to use MLAG in addition.
 
Last edited:
Usually you add an additional link to corosync which represents the new network configuration, then you configure the new link on all nodes and verify that it is working. Only after verifying this do you remove the old link, and corosync should switch over to the other link you configured.

LACP is for bonding multiple ports on the same switch. If you want to bond across multiple switches you will need to use MLAG in addition.
So would you advise a single 10 gig connection for corosync and then a backup 10 gig connection on a separate switch? UniFi doesn't have MLAG, so I guess I'll stick to LACP connections, but what would I even use LACP connections for if corosync doesn't like bonded connections?

I feel like I have all of these network ports and switches for nothing if I can't make the speed even faster.
 
So would you advise a single 10 gig connection for corosync and then a backup 10 gig connection on a separate switch? UniFi doesn't have MLAG, so I guess I'll stick to LACP connections, but what would I even use LACP connections for if corosync doesn't like bonded connections?
A 1G connection is usually fine for corosync; it needs good latency, not bandwidth. If you have that many ports and switches, it might make sense to use them to split your networks, so you have separate storage / corosync / VM traffic networks and spikes in one network don't affect another.
 
  • Like
Reactions: dj423
A 1G connection is usually fine for corosync; it needs good latency, not bandwidth. If you have that many ports and switches, it might make sense to use them to split your networks, so you have separate storage / corosync / VM traffic networks and spikes in one network don't affect another.
+1 This.
I had old, cheap 8-port 100M switches lying around. They are still good enough as a corosync fallback.
Thank you both. So I have an RBD Ceph storage connection. What does Proxmox use to connect to that? Based on what you guys have told me, here is what I want to set up:

vmbr0 - Proxmox GUI connection / RBD storage connection (since the connection to the Proxmox GUI would barely use anything; if this is not the case, let me know)

vmbr1 - VM internet

vmbr2 - Corosync

This would have corosync on its own separate network, and I'd have it use an active-backup 1G connection across 2 different switches.

VM internet would use an LACP connection with a failover onto a different switch. Both would be 1G connections since they only connect out to the internet.

The Proxmox GUI / RBD connection would use an LACP connection with failover to another switch (both 10G). I know that for the VM internet I can just change the network adapter the VMs use.

For corosync, I believe you edit /etc/pve/corosync.conf and then replace the IP address of the ring with the address of the adapter you want to use.

(Please correct me if i'm wrong with the info above)

I'm not sure how to change the connection used for me to connect to the GUI, or what's used for the RBD connection.

This RBD connection is to a separate Ceph storage pool not managed by Proxmox.


Screenshot 2025-02-06 at 6.38.47 AM.png
 
This would have corosync on its own separate network, and I'd have it use an active-backup 1G connection across 2 different switches.
That works, but it is not ideal. Corosync itself has a failover function: ring0, ring1, ... up to ring7 (eight links), in that order, each a direct connection, no bond etc. I don't know if it is changeable afterwards, but usually you set that before creating the cluster. See the screenshot:
corosync.png
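If I remember the CLI correctly, the links can also be given when the cluster is created; the addresses below are placeholders, so check the pvecm man page before copying anything:

Code:
# on the first node, create the cluster with two corosync links
pvecm create my-cluster --link0 10.10.10.11 --link1 10.10.20.11

# on each additional node, join and state its own link addresses
pvecm add 10.10.10.11 --link0 10.10.10.12 --link1 10.10.20.12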

What does Proxmox use to connect to that?
This RBD connection is to a separate Ceph storage pool not managed by Proxmox.

That depends on how you set up your bridges; it doesn't matter whether you use vmbr0 or vmbr5 for that. But the bridges have to be exactly the same on every node.
And you usually want to configure your fastest connection for that.
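For illustration, a minimal sketch of what "the same bridge on every node" could look like in /etc/network/interfaces; the NIC name and addresses are placeholders, and only the address changes per node:

Code:
auto vmbr2
iface vmbr2 inet static
    address 10.10.40.11/24      # .12, .13, ... on the other nodes
    bridge-ports enp3s0f0       # the fast (e.g. 10G) uplink
    bridge-stp off
    bridge-fd 0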
 
That works, but it is not ideal. Corosync itself has a failover function: ring0, ring1, ... up to ring7 (eight links), in that order, each a direct connection, no bond etc. I don't know if it is changeable afterwards, but usually you set that before creating the cluster. See the screenshot:
View attachment 82019

That depends on how you set up your bridges; it doesn't matter whether you use vmbr0 or vmbr5 for that. But the bridges have to be exactly the same on every node.
And you usually want to configure your fastest connection for that.
If that's the case, I could just do ring0 and ring1 and have separate connections instead of an active-backup bond.

I just added the RBD connection via the storage section of the datacenter. I don't know how you can specify the network.
 
If that's the case, I could just do ring0 and ring1 and have separate connections instead of an active-backup bond.
No, corosync uses ring0 as the primary. If ring0 is dead, it tries ring1. As said, corosync has its own failover (ring0 through ring7, if defined, in that order): ring0 is the primary, ring1-7 are the fallbacks.

I don't know how you can specify the network
That depends on the Ceph config on the nodes, i.e. which networks the corresponding IPs belong to. Click on a node -> Ceph -> Configuration -> [global] -> cluster_network and public_network.
The "client" side is the public_network.
 
No, corosync uses ring0 as the primary. If ring0 is dead, it tries ring1. As said, corosync has its own failover (ring0 through ring7, if defined, in that order): ring0 is the primary, ring1-7 are the fallbacks.


That depends on the Ceph config on the nodes, i.e. which networks the corresponding IPs belong to. Click on a node -> Ceph -> Configuration -> [global] -> cluster_network and public_network.
The "client" side is the public_network.
For corosync I'm saying separate bridges, so it fails over using corosync instead of using an active-backup bond.

RBD didn't require me to install Ceph. Ceph is not installed on Proxmox. The nodes in Proxmox do not have local storage; they connect to separate storage that runs Ceph itself.
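Since the Ceph cluster is external, on the Proxmox side the RBD storage is defined with the monitor addresses (Datacenter -> Storage, which writes /etc/pve/storage.cfg), and the NIC that gets used simply follows normal routing to those monitor IPs. A hedged sketch with made-up names and addresses:

Code:
rbd: external-ceph
        content images,rootdir
        krbd 0
        monhost 10.10.40.1 10.10.40.2 10.10.40.3
        pool vm-pool
        username admin

So if the monitors sit in the subnet of the fast storage bridge, that bridge is what ends up carrying the RBD traffic.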
 
For corosync I'm saying separate bridges, so it fails over using corosync instead of using an active-backup bond.
Correct. You don't need a bridge in that case. You set the IP on the NIC port directly. In my screenshot, my IP addresses for that are under the black bars.
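For example, roughly like this in /etc/network/interfaces (NIC name and subnet are placeholders):

Code:
auto eno3
iface eno3 inet static
    address 10.10.30.11/24      # dedicated corosync link, no bridge, no bond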

The nodes in Proxmox do not have local storage; they connect to separate storage that runs Ceph itself.
Which IP does this storage have? That answers where you set the corresponding network (and the bridge).
If you only have this one 192.168.50.0/19 network, you should use more networks and split it up, or use VLANs.
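If splitting via VLANs, one way (a sketch; the VLAN ID and addresses are invented here) is a VLAN-aware bridge plus a VLAN interface for the host itself:

Code:
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr1.30
iface vmbr1.30 inet static
    address 192.168.30.11/24    # host IP on VLAN 30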
 
Last edited:
Awesome thank you! I'll try that.
Correct. You don't need a bridge in that case. You set the IP on the NIC port directly. In my screenshot, my IP addresses for that are under the black bars.


Which IP does this storage have? That answers where you set the corresponding network (and the bridge).
If you only have this one 192.168.50.0/19 network, you should use more networks and split it up, or use VLANs.
That's what I'll do, I'll add more networks and separate things. So should I have a network for:

Proxmox UI/RBD Traffic

Corosync

VM traffic

Is this enough?
 
Is this enough?
Yes, that looks good. Additionally, I would move the Proxmox host's own IP into another network or VLAN, but that's not a must. I like to separate and isolate stuff.

For all of these networks you can configure a bond however you like; only corosync traffic is the exception (see the sketch after this list).
Corosync only needs good latency, not much bandwidth.
The Proxmox UI needs little bandwidth.
RBD wants bandwidth and good latency.
VM traffic wants bandwidth and good latency.
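A rough example of such a bond feeding a bridge in /etc/network/interfaces, with placeholder NIC names and addresses; bond-mode 802.3ad is LACP and needs the matching port aggregation configured on the switch:

Code:
auto bond1
iface bond1 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

auto vmbr1
iface vmbr1 inet static
    address 10.10.10.11/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0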
 
Yes, that looks good. Additionally, I would move the Proxmox host's own IP into another network or VLAN, but that's not a must. I like to separate and isolate stuff.

For all of these networks you can configure a bond however you like; only corosync traffic is the exception.
Corosync only needs good latency, not much bandwidth.
The Proxmox UI needs little bandwidth.
RBD wants bandwidth and good latency.
VM traffic wants bandwidth and good latency.
Thank you for explaining! So how do I achieve better latency and better bandwidth? Is better bandwidth just a faster switch? If it is, I don't know what makes something have good latency besides everything being on the same switch.
 
Is better bandwidth just a faster switch?
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/

better latency
Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
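A quick way to sanity-check this afterwards; the commands exist on any Proxmox node, the peer address below is a placeholder:

Code:
# show quorum state and which corosync links are up
pvecm status
corosync-cfgtool -s

# rough latency check towards another node over the corosync network
ping -c 100 -i 0.2 10.10.30.12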
 
Last edited:
  • Like
Reactions: LunarMagic
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/


Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
Thank you! I'll go and start this now. Thank you so, so much for everything! I'll update this after I work on it.
 
Yes, from slow to fast. 10M -> 100M -> 1G -> 10G -> 40G -> 100G
https://www.howtogeek.com/latency-vs-bandwidth-vs-throughput-what-is-the-difference/


Spread and separate everything as well as you can across your NIC links. Bad example: only one physical 1G/1000M connection with everything set up on it: corosync, Proxmox itself, RBD, VM traffic. 1000M / 4 = 250M left per network; they will fight against each other and latency goes up.
If you run corosync alone on one NIC link, there is no fight and latency stays excellent.

The thing is, only for corosync is low latency critical, so your cluster can stay happy. <- that is a must!
For the others it is not as important, things would just work more slowly, but in general you aim for good bandwidth and low latency everywhere. <- not a must!
Everything worked; I gave corosync a separate switch. Thank you for the help :)
 
  • Like
Reactions: mr44er