Challenges with MTU 9000 and CEPH Deployment on Proxmox Cluster Using LACP and TP-Link Switches

blucas

Apr 14, 2024
Hi,

I'm struggling with a networking issue that seems to break a CEPH cluster deployed across multiple nodes in my Proxmox environment. CEPH has never worked since installation: nodes 2 and 3 (the 2nd and 3rd installations) both give an error 500 timeout. As far as I can tell the configuration is correct, but I'm seeing problems with MTU settings and node-to-node communication, which I suspect are what keeps CEPH from working on nodes 2 and 3.

When the MTU is left at the standard 1500, everything works fine.

This is the network config on the nodes:
Code:
auto lo
iface lo inet loopback

auto eno49
iface eno49 inet manual
        mtu 9000

auto eno50
iface eno50 inet manual
        mtu 9000

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno49 eno50
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 192.168.21.224/24
        gateway 192.168.21.254
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        mtu 9000

source /etc/network/interfaces.d/*

All switches are configured to support an MTU of 9000, and this setting is global. Despite this, ping tests with large packets between the nodes fail, which suggests that MTU 9000 is not being honored somewhere along the path.
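
For reference, the kind of test I mean is a ping with the don't-fragment flag set and a payload sized to exactly fill a 9000-byte packet. This is only a minimal sketch, assuming IPv4 and the Linux iputils `ping`; the helper names `icmp_payload` and `jumbo_ping` are my own, and the target address in the example is a placeholder for another node, not a real host from my setup.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Send 3 pings with the don't-fragment bit set (-M do), so a hop with a
# smaller MTU causes a failure instead of silent fragmentation.
# $1 = target IP (placeholder), $2 = MTU to test (defaults to 9000)
jumbo_ping() {
    ping -M do -c 3 -s "$(icmp_payload "${2:-9000}")" "$1"
}

# Example (not run automatically; 192.168.21.225 is a hypothetical peer):
# jumbo_ping 192.168.21.225 9000
```

For an MTU of 9000 the payload works out to 8972 bytes, and for 1500 it is 1472. If the 1472-byte ping passes and the 8972-byte one does not, something between the two nodes is dropping jumbo frames.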

Interestingly, the setup works fine locally (in loopback), but fails when communicating across the network, particularly affecting the installation and operation of CEPH on nodes 2 and 3. This leads me to suspect this might be the reason why CEPH is not functioning correctly on these nodes.
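
To rule out the node side, one thing that can be checked is whether every layer of the stack (both NICs, the bond, and the bridge) really reports 9000 after boot. A rough sketch, assuming the kernel exposes the value under `/sys/class/net/<interface>/mtu`; the function name `check_mtu` is just something I made up for this, and the sysfs directory is a parameter so the check can be pointed elsewhere.

```shell
#!/bin/sh
# Print the MTU of each named interface and return non-zero if any of
# them differs from the expected value (or cannot be read).
# $1 = expected MTU, $2 = sysfs net directory, remaining args = interfaces
check_mtu() {
    expected=$1
    sysdir=$2
    shift 2
    rc=0
    for ifc in "$@"; do
        mtu=$(cat "$sysdir/$ifc/mtu" 2>/dev/null) || mtu=""
        echo "$ifc: ${mtu:-missing}"
        [ "$mtu" = "$expected" ] || rc=1
    done
    return $rc
}

# Example (not run automatically), using the interfaces from my config:
# check_mtu 9000 /sys/class/net eno49 eno50 bond0 vmbr0
```

If all four interfaces print 9000 on every node, the hosts themselves are probably fine and the problem is more likely on the switch or LAG side.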

I would also like to note that the HA cluster itself is functioning well, which makes me think the issue is isolated to the network configuration or MTU handling, though I'm not sure.

I am looking for suggestions or insights on what else I can check or configure to resolve this. Has anyone faced something similar, or does anyone have experience with MTU issues in LACP configurations with TP-Link switches?
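
On the LACP side, one check I'm aware of is looking at the bonding driver's status file to see whether both slaves ended up in the same aggregator. A small sketch, assuming the usual `/proc/net/bonding/bond0` layout with `Aggregator ID:` lines; the function name `aggregator_ids` is hypothetical, and my understanding (not verified on this hardware) is that a healthy 802.3ad bond normally shows a single ID.

```shell
#!/bin/sh
# List the distinct 802.3ad aggregator IDs found in a bonding status file.
# $1 = status file (defaults to /proc/net/bonding/bond0)
aggregator_ids() {
    grep 'Aggregator ID:' "${1:-/proc/net/bonding/bond0}" \
        | awk '{print $3}' \
        | sort -u
}

# Example (not run automatically):
# aggregator_ids    # more than one line of output would suggest the
#                   # slaves did not join the same LAG on the switch
```

More than one ID would point at the LAG configuration on the switch rather than at the MTU itself, which would at least help separate the two problems.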
 