Hey all,
Running into a frustrating issue in my 3-node Proxmox + Ceph cluster.
- Storage network is bonded 10G (802.3ad), VLAN sub-interfaces, MTU set to 9000 end-to-end.
- Management network is separate and left at 1500.
- Switches and NICs are confirmed jumbo-capable.
- On all hosts, ip link shows MTU 9000, and jumbo pings (-M do -s 8972) work fine.
Despite this, Corosync/KNET keeps dropping the MTU to 8885:
[KNET ] pmtud: Global data MTU changed to: 8885
What’s worse: Ceph occasionally gets unstable, OSDs flap, and then the Proxmox watchdog triggers a reboot on one of the nodes. It’s almost like the cluster quorum is lost for a moment due to packet fragmentation or latency spikes.
We haven’t changed anything in the config recently; this just started happening.
Anything else I need to check here?
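For reference, the 8972 in the ping test is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. Here is a quick sketch of that arithmetic. It assumes IPv4 with no IP options, and the function name is made up for illustration. The last line only prints the gap between the interface MTU and the value KNET reported; I'm not claiming to know what that gap is made of.

```python
IPV4_HEADER = 20  # bytes, assuming IPv4 with no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ping -s payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# Storage network (MTU 9000) and management network (MTU 1500)
print(ping_payload_for_mtu(9000))  # 8972
print(ping_payload_for_mtu(1500))  # 1472

# Gap between the interface MTU and the data MTU KNET logged
print(9000 - 8885)  # 115
```

So the jumbo ping test does exercise a full 9000-byte packet on the wire, and the KNET figure sits 115 bytes below the interface MTU.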