Watchdog reboots on Proxmox cluster due to Ceph/Corosync MTU weirdness (drops to 8885)

Johnlwc

New Member
Oct 3, 2025
Hey all,

Running into a frustrating issue in my 3-node Proxmox + Ceph cluster.

  • Storage network is bonded 10G (802.3ad), VLAN sub-interfaces, MTU set to 9000 end-to-end.

  • Management network is separate and left at 1500.

  • Switches and NICs are confirmed jumbo-capable.

  • On all hosts, ip link shows MTU 9000, and jumbo pings (-M do -s 8972) work fine.
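For reference, the 8972 in that ping is just the 9000-byte MTU minus the 20-byte IPv4 header (no options) and the 8-byte ICMP header. A minimal sketch of that arithmetic follows; the `icmp_payload` helper and the `PEER` address are made up for illustration and are not taken from my actual hosts:

```shell
#!/usr/bin/env bash
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# link MTU minus 20-byte IPv4 header (no options) minus 8-byte ICMP header.
icmp_payload() {
    local mtu=$1
    echo $(( mtu - 20 - 8 ))
}

# PEER is a placeholder (documentation range), not a real cluster address.
PEER="${PEER:-192.0.2.10}"

# Print the don't-fragment ping command instead of running it here.
echo "ping -M do -s $(icmp_payload 9000) -c 3 ${PEER}"
```

Running the printed command against each storage peer, in both directions, is the test referred to above.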

Despite this, Corosync/KNET keeps dropping the MTU to 8885:

Bash:
[KNET  ] pmtud: Global data MTU changed to: 8885
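That value is 115 bytes below the 9000 configured on the interfaces. To see how often the value changes, something like the following could pull the reported MTUs out of the log. This is only a rough sketch: `pmtud_values` is a hypothetical helper name, and it assumes Corosync logs to the journal under the `corosync` unit:

```shell
#!/usr/bin/env bash
# Read log lines on stdin and print only the MTU value from
# KNET "Global data MTU changed to:" messages (the last field).
pmtud_values() {
    grep -F 'pmtud: Global data MTU changed to:' | awk '{print $NF}'
}

# Assumed usage on a node (not run here):
#   journalctl -u corosync --no-pager | pmtud_values | sort | uniq -c
```

A single repeated value would suggest a stable setting, while several different values would point at the path MTU actually changing.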



What’s worse: Ceph occasionally gets unstable, OSDs flap, and then the Proxmox watchdog triggers a reboot on one of the nodes. It’s almost like the cluster quorum is lost for a moment due to packet fragmentation or latency spikes.

We haven’t changed anything in the config recently; this just started happening.

Anything else I need to check here?