Watchdog reboots on Proxmox cluster due to Ceph/Corosync MTU weirdness (drops to 8885)

Johnlwc

New Member
Oct 3, 2025
Hey all,

Running into a frustrating issue in my 3-node Proxmox + Ceph cluster.

  • Storage network is bonded 10G (802.3ad), VLAN sub-interfaces, MTU set to 9000 end-to-end.

  • Management network is separate and left at 1500.

  • Switches and NICs are confirmed jumbo-capable.

  • On all hosts, ip link shows MTU 9000, and jumbo pings (-M do -s 8972) work fine.
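
The per-peer DF-bit test above can be scripted so it is easy to re-run from every node after any change; the hostnames (`pve1`..`pve3`) are placeholders for your actual nodes:

```shell
#!/bin/sh
# Sketch of the jumbo-frame check described above; peer names are hypothetical.
# 8972 = 9000 - 20 (IPv4 header) - 8 (ICMP header); -M do forbids fragmentation.
check_jumbo() {
    peer=$1
    if ping -M do -s 8972 -c 1 -W 1 "$peer" >/dev/null 2>&1; then
        echo "$peer: jumbo OK"
    else
        echo "$peer: FAILED"
    fi
}

for peer in pve1 pve2 pve3; do
    check_jumbo "$peer"
done
```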

Despite this, Corosync/KNET keeps dropping the MTU to 8885:

Bash:
[KNET  ] pmtud: Global data MTU changed to: 8885
What’s worse: Ceph occasionally gets unstable, OSDs flap, and then the Proxmox watchdog triggers a reboot on one of the nodes. It’s almost like the cluster quorum is lost for a moment due to packet fragmentation or latency spikes.

We haven’t changed anything in the config recently; this just started happening.

Anything else I need to check here?
 
Hey,
some guessing: KNET reports the data MTU, i.e. what is left after deducting all required headers and padding.
When I look at my test cluster, KNET reports 1397 as the data MTU, which makes for an overhead of 103 bytes. Maybe KNET aligns the data on a 64-byte boundary; that would explain why the data MTU in your example is 12 bytes lower than a plain 9000 - 103 = 8897.
You can work yourself through the code, it is all on GitHub: https://github.com/kronosnet/kronosnet
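
The overhead arithmetic from that guess, spelled out (the 64-byte-alignment part is speculation, so the 12-byte gap is only noted, not derived):

```shell
# Overhead seen on the 1500-MTU test cluster: headers plus padding.
overhead=$((1500 - 1397))
echo "knet overhead at MTU 1500: $overhead bytes"   # 103
# Applying the same overhead naively to a 9000-byte MTU:
echo "expected data MTU at 9000: $((9000 - overhead))"   # 8897
# KNET actually reports 8885, i.e. this many bytes lower (the suspected alignment):
echo "gap to reported 8885: $((9000 - overhead - 8885))"   # 12
```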

If the PVE node is rebooting, an analysis is only possible after you provide at least some information about the network setup, the content of corosync.conf, and of course some logs, e.g. from corosync.service before the reboot.
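
A minimal sketch of collecting exactly that information in one place before posting (the paths are the standard PVE/systemd ones; the output directory name is arbitrary):

```shell
#!/bin/sh
# Gather the requested debugging info into one directory (sketch, not a tool).
OUT=/tmp/pve-debug
mkdir -p "$OUT"
# cluster config -- guarded in case this runs on a non-PVE box
[ -f /etc/pve/corosync.conf ] && cp /etc/pve/corosync.conf "$OUT/"
# interface and MTU overview
ip -d link show > "$OUT/ip-link.txt" 2>/dev/null || true
# corosync journal from the previous boot, i.e. the one that ended in the reboot
journalctl -b -1 -u corosync > "$OUT/corosync-prev-boot.log" 2>/dev/null || true
echo "collected into $OUT"
```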
 
Thank you for your reply. After further debugging, I discovered that an OpenStack node had taken the same IP address as a Proxmox node. This caused the other two servers to compete for master status. Additionally, Corosync traffic was running over the same link as the storage network, which we have now corrected.
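
For anyone hitting the same thing: a duplicate address can be spotted either on the wire with `arping -D <addr>` (duplicate address detection) or, offline, by comparing the addresses each host claims. A sketch of the offline comparison, with hypothetical node names and addresses:

```shell
#!/bin/sh
# Print any address that more than one host claims.
# Input: one "host address" pair per line (e.g. collected via `ip -4 -o addr`).
find_dup_ips() {
    awk '{print $2}' | sort | uniq -d
}

printf '%s\n' \
  'pve1       10.10.10.11' \
  'pve2       10.10.10.12' \
  'openstack1 10.10.10.11' | find_dup_ips   # -> 10.10.10.11
```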
 