[SOLVED] New cluster - Ceph = got timeout (500)

kez

Member
Mar 26, 2023
Hey, please can someone point me in the right direction?

4 nodes, all installed from the PVE ISO, so no firewall in play.

Each node has a Ceph network:

Code:
auto bond1
iface bond1 inet static
        # node1 10.10.10.1/24, node2 10.10.10.2/24, node3 10.10.10.3/24, etc.
        address 10.10.10.1/24
        bond-slaves ens1f2np2 ens1f3np3
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        mtu 9216
#Ceph

All nodes can ping each other on 10.10.10.x
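Worth noting: plain pings use a small payload, so they can succeed even when jumbo frames are broken somewhere along the path. A quick end-to-end check (addresses from this thread; assuming Linux `ping` with its `-M do` don't-fragment flag):

```shell
# ICMP payload size = MTU - 20 (IPv4 header) - 8 (ICMP header)
# MTU 9000 -> payload 8972; MTU 1500 -> payload 1472
ping -c 3 -M do -s 8972 10.10.10.2   # only succeeds if the whole path carries MTU 9000
ping -c 3 -M do -s 1472 10.10.10.2   # baseline check at the default 1500 MTU
```

If the first ping fails with "message too long" or silently drops, something on the path (NIC, bond, or switch port) is still at a lower MTU.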

Browse to Datacenter > Node1 > Ceph > Install (default 18.2 Reef) > Configure > network 10.10.10.0/24 > got timeout (500)

I have rebuilt all 4 nodes several times but each time the same issue, even on Ceph v19.

I even followed these guides, just using my IPs: https://www.proxmox.com/en/services...environment/install-ceph-server-on-proxmox-ve and https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster

Thanks for any help.
 
Lowered the MTU from jumbo 9216 down to the default 1500 and Ceph is working. Is there a doc on this? I thought jumbo frames were preferred?
 
Indeed, the switch is set, i.e. `set interface ea2 mtu 9000`. It's an odd one for sure.
 
Since 9216 > 9000, did you try 9000? :) Or something smaller.

Yes, I did try 9000 as well.

Also tried setting it on bond1 only; on bond1 plus the two slaves ens1f2np2 and ens1f3np3; and only on the slaves ens1f2np2 and ens1f3np3, not on bond1.

Same issue each time.

Very confusing :-/
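When juggling the MTU between the bond and its slaves, it's also worth confirming what actually took effect, since the Linux bonding driver normally propagates the master's MTU down to the slaves. A sketch using the interface names from this thread:

```shell
# Print name, state and effective MTU for the bond and both slaves
for dev in bond1 ens1f2np2 ens1f3np3; do
    ip -br link show "$dev"
done
```

If a slave shows a different MTU than bond1, the config didn't apply the way it reads.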
 
Fixed:

Switch MTU 9216

Servers bond1 MTU 9000
Servers bond1 slaves ens1f2np2, ens1f3np3 MTU 9000

Got there in the end :)
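For anyone hitting the same 500 timeout: the only change from the stanza at the top of the thread is the mtu line (node1 shown; adjust the address per node). The switch ports staying at 9216 should leave headroom for Ethernet framing overhead on top of the 9000-byte IP MTU:

```
auto bond1
iface bond1 inet static
        address 10.10.10.1/24
        bond-slaves ens1f2np2 ens1f3np3
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        mtu 9000
#Ceph
```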
 