Corosync problem after switching to jumbo frames (MTU 9000)

gz_jax

New Member
Mar 18, 2020
The MTU of our current PVE server cluster is the default 1500, and we want to change the whole cluster to MTU 9000. In the test environment we migrated the virtual machines off each node one by one and changed the MTU; the cluster and Ceph stayed healthy throughout.

In the production environment, however, a node is kicked out of the Corosync cluster after the change, even though the network still communicates normally, including the Ceph cluster, and the node rejoins the cluster once we set the MTU back to 1500. The test and production environments differ in hardware and node count but are otherwise basically the same. For example, the test environment uses an RJ45 switch and PVE management and Ceph share one bond, while production uses fiber and PVE management and Ceph each have their own bond. How can we troubleshoot this problem?
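One thing worth checking first is whether jumbo frames actually pass end to end on the production path, since every switch port in between must also allow MTU 9000. A do-not-fragment ping sized for a 9000-byte MTU (9000 minus 28 bytes of IP and ICMP headers) makes this visible; the interface name and address below are only placeholders for this example:

Code:
# confirm the interface really runs with MTU 9000
ip link show bond0 | grep mtu

# send an unfragmentable packet that only fits through a jumbo-frame path
ping -M do -s 8972 -c 3 10.10.10.2

If this ping fails while a normal ping works, some device on the path is still limited to MTU 1500 and drops the large frames.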
 
Hi,

We do not recommend using the same network for Ceph and Corosync.
If you do this in combination with HA, it can interrupt your services.
The problem is that with too much traffic on the Ceph network, Corosync cannot get the low latency it needs.
That is a problem for the quorum that a stable working cluster depends on.
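If a dedicated network for Corosync is available, it can be added as an extra link in /etc/pve/corosync.conf. The excerpt below is only a sketch; the node names and the 10.10.20.x addresses stand for a hypothetical dedicated Corosync network, not your actual setup:

Code:
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1   # existing shared network (placeholder address)
    ring1_addr: 10.10.20.1   # dedicated corosync network (placeholder address)
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }
}

Remember to increase config_version in the totem section when editing /etc/pve/corosync.conf. With Corosync 3 (kronosnet) the additional link is picked up automatically, and you can check the state of the links afterwards with corosync-cfgtool -s.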