Hello all,
I have recently been seeing corosync report link flapping, but the flaps appear to be spurious. Neither the corresponding switch nor the kernels of the boxes (3-box cluster) report any link problem. I use 10G fiber for the main links and 1G copper for the backup links. The flapping is reported on the copper links.
Is this some kind of timing problem with corosync?
Example log:
Apr 01 17:13:13 pm-248 corosync[2090]: [KNET ] rx: host: 2 link: 1 is up
Apr 01 17:13:13 pm-248 corosync[2090]: [KNET ] link: Resetting MTU for link 1 because host 2 joined
Apr 01 17:13:13 pm-248 corosync[2090]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 01 17:13:13 pm-248 corosync[2090]: [KNET ] pmtud: Global data MTU changed to: 1397
Apr 01 17:15:26 pm-248 corosync[2090]: [KNET ] link: host: 2 link: 1 is down
Apr 01 17:15:26 pm-248 corosync[2090]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 01 17:15:28 pm-248 corosync[2090]: [KNET ] rx: host: 2 link: 1 is up
Apr 01 17:15:28 pm-248 corosync[2090]: [KNET ] link: Resetting MTU for link 1 because host 2 joined
Apr 01 17:15:28 pm-248 corosync[2090]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 01 17:15:28 pm-248 corosync[2090]: [KNET ] pmtud: Global data MTU changed to: 1397
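To see how long each reported flap lasts, the down/up pairs can be pulled out of the log with a small script. This is only a sketch: the regular expression is fitted to the excerpt above, `find_flaps` is a hypothetical helper name, and the year is an assumption because the syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches only the knet link state lines ("... is up" / "... is down")
# in the format shown in the excerpt above.
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ corosync\[\d+\]: \[KNET\s*\] "
    r"(?:rx|link): host: (?P<host>\d+) link: (?P<link>\d+) is (?P<state>up|down)$"
)


def find_flaps(lines, year=2000):
    """Return (host, link, down_time, up_time, seconds_down) per down/up pair."""
    down_since = {}
    flaps = []
    for line in lines:
        m = LINK_EVENT.match(line.strip())
        if not m:
            continue  # MTU and "best link" messages are ignored
        # The year is assumed; syslog timestamps omit it.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (int(m["host"]), int(m["link"]))
        if m["state"] == "down":
            down_since[key] = ts
        elif key in down_since:
            start = down_since.pop(key)
            flaps.append((*key, start, ts, (ts - start).total_seconds()))
    return flaps


sample = [
    "Apr 01 17:13:13 pm-248 corosync[2090]: [KNET ] rx: host: 2 link: 1 is up",
    "Apr 01 17:15:26 pm-248 corosync[2090]: [KNET ] link: host: 2 link: 1 is down",
    "Apr 01 17:15:26 pm-248 corosync[2090]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)",
    "Apr 01 17:15:28 pm-248 corosync[2090]: [KNET ] rx: host: 2 link: 1 is up",
]

for host, link, start, end, secs in find_flaps(sample):
    print(f"host {host} link {link}: down for {secs:.0f}s")
```

On the excerpt above this reports a single flap of 2 seconds on link 1 to host 2.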
