Correct way to set MTU on a Corosync cluster network (PVE 8.3)

wopner · New Member · Dec 3, 2024
Looking for some guidance on how to correctly set the MTU for a cluster with a dedicated/separate network for Corosync traffic.

I have a (rather) simple setup of 3 PVE nodes, each having 2 physical interfaces:
  • enp65s0: A Mellanox fiber NIC. Used by Corosync in a ring config (see below) and for migration/inter-node communication.
  • enp1s0: An onboard 10GbE NIC. Used for the vmbr bridge that all VM traffic goes in/out of. Working just fine, even with the problem below.

My question is how to correctly set the MTU on that Mellanox fiber card (enp65s0) to 9000. I have seen references to a "netmtu" setting that was available in corosync.conf, but was it deprecated? The current documentation doesn't mention it, so I am guessing that is the case.

I have tested jumbo frames using ip link set dev enp65s0 mtu 9000 on each node before going any further, ensured that it's set on the Mikrotik switch, etc. Everything works nicely between nodes: iperf is successful, as are pings, etc.
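For completeness, a minimal sketch of making the MTU persistent in /etc/network/interfaces (the address here is an assumption based on the 10.10.x.x cluster network mentioned below; apply with ifreload -a, since PVE uses ifupdown2):

Code:
    auto enp65s0
    iface enp65s0 inet static
        # address is illustrative; use each node's own cluster-network address
        address 10.10.0.2/24
        mtu 9000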

Edit: Updated Solution - After help from fabian, the problem was indeed the switch's MTU settings not being properly set.

If I set it on the web UI's network config screen as shown below, everything works except the web UI itself, which won't correctly communicate with the other nodes in the cluster (only with the node I'm directly connected to). Loading any page for, or interacting with, any other node fails. The specific error from the web UI is a 596, also shown below.

[Screenshot: web UI network interface configuration with MTU 9000]

Example error when trying to access anything on a node other than the one I'm connected to:
[Screenshot: web UI error 596]


I notice that:
  1. The output of corosync-cfgtool -n is still showing the default MTU after the above change, even after a full cluster reboot.
  2. A CLI qm migrate of an offline test VM also fails, with error 255. It is, however, choosing the correct cluster network (10.10.x.x), so it's strange that it fails, since the config tool reports the target node as connected and reachable.
  3. ip link show reports the expected MTU value for both the corosync interface (9000) and the vmbr adapter (1500).
Here are my relevant configs; let me know if any logs or other information would be helpful.

Thank you in advance for pointing me in the right direction!
 
corosync has two levels of MTU:
- netmtu is the logical MTU for the payloads corosync transmits; you basically only need to set it if you want to clamp the MTU at that level
- knet (the network library used by corosync) will determine the effective MTU of each link using PMTUd and derive a global MTU used at that (network) level. if corosync transmits payloads that are bigger than this global MTU, knet will fragment them (just like the kernel in turn will fragment packets that are too big, although in this setup that should only happen when the MTU is reconfigured and corosync/knet haven't noticed yet ;)). it's possible to override the automatic MTU mechanism and provide an explicit manual value (knet_mtu, see the sketch below), but that is normally not needed
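for reference, a minimal sketch of where both knobs live, in the totem section of /etc/pve/corosync.conf (the values here are illustrative only, not a recommendation; on PVE, remember to bump config_version whenever you edit this file):

Code:
    totem {
      ...
      # clamp the logical payload MTU corosync uses (rarely needed)
      netmtu: 1500
      # manual override of the knet-level MTU instead of relying on PMTUd
      # (normally not needed; shown only for completeness, value illustrative)
      knet_mtu: 8885
    }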

could you check the following:
- corosync logs for pmtud messages ("journalctl --since today -u corosync")
- ping using big messages to determine whether the MTU really works ;) (start with "ping -M do -s 8000 10.10.0.2" and then increase the 8000 until you get an error; see the sweep sketch below)
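a quick sketch of such a sweep (with a 9000-byte MTU the largest ICMP payload that fits is 8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header; address taken from the example above):

Code:
    # -M do forbids fragmentation, so an oversized frame fails immediately
    # instead of being silently fragmented along the way
    for size in 1472 4000 8000 8972; do
        ping -M do -c 2 -W 1 -s "$size" 10.10.0.2 >/dev/null 2>&1 \
            && echo "payload $size: OK" \
            || echo "payload $size: FAILED"
    done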
 
the migration issue might or might not be related.. can you manually try the SSH command, and then retry with the other IP address of that node?
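for example (the second address is a made-up management IP; substitute the node's real one):

Code:
    # test SSH over the dedicated corosync/migration network
    ssh -o BatchMode=yes root@10.10.0.2 true && echo "cluster link OK"
    # then retry over the node's other (management) address - hypothetical here
    ssh -o BatchMode=yes root@192.168.1.12 true && echo "mgmt link OK"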
 
I'm so sorry, this was caused by the L2 MTU settings on the Mikrotik 100G switch reverting after I set them... your suggestion of a ping was what exposed it: "ping -M do -s 8000 10.10.0.2" failed immediately with 100% packet loss, whereas a normal ping happily succeeded.

Another symptom of this was corosync-cfgtool -n reporting an unexpected MTU value. Just in case people run across this in the future.
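In case it helps someone, a rough sketch of checking and setting this on RouterOS (the port name is an assumption for my hardware; l2mtu needs to be at least the L3 MTU plus L2 overhead, and whether it is settable at all depends on the switch model):

Code:
    # show the current mtu/l2mtu of every port
    /interface ethernet print detail
    # hypothetical port name; raise both L2 and L3 MTU for jumbo frames
    /interface ethernet set sfp28-1 l2mtu=9092 mtu=9000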

Thank you for the detailed information on the knet vs. netmtu layers and how they handle MTU; I did not know that, and it helps me understand.

I appreciate your time, and apologize for wasting it! It speaks volumes about how you guys developed this stack that VMs & all the intra-node functions continued to work even though the corosync links were actually failing. Very graceful, and appreciated.
 
