Corosync 100% CPU load [solved]

SellerOfSmiles

New Member
Feb 29, 2024
Hi. After installing PVE 8.0.3 on a new node and adding it to my cluster (the other nodes run PVE 7.*-***), Corosync is using 100% CPU in one thread.

Why is it doing this? :) Or maybe I'm doing something wrong?

Code:
# apt list corosync
Listing... Done
corosync/now 3.1.7-pve3 amd64 [installed,local]

Code:
# journalctl -u corosync -f -n 30
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 2 has no active links
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 0)
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 has no active links
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 has no active links
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 has no active links
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [KNET  ] link: Resetting MTU for link 0 because host 4 joined
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [QUORUM] Sync members[1]: 4
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [QUORUM] Sync joined[1]: 4
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [TOTEM ] A new membership (4.4f32) was formed. Members joined: 4
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [QUORUM] Members[1]: 4
Feb 29 11:54:07 S-VIRT04 corosync[433373]:   [MAIN  ] Completed service synchronization, ready to provide service.
Feb 29 11:54:07 S-VIRT04 systemd[1]: Started corosync.service - Corosync Cluster Engine.
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] rx: host: 3 link: 0 is up
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] rx: host: 1 link: 0 is up
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [QUORUM] Sync members[4]: 1 2 3 4
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [QUORUM] Sync joined[3]: 1 2 3
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [TOTEM ] A new membership (1.4f36) was formed. Members joined: 1 2 3
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [QUORUM] This node is within the primary component and will provide service.
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [QUORUM] Members[4]: 1 2 3 4
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [MAIN  ] Completed service synchronization, ready to provide service.
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 453 to 65397
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 453 to 65397
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] pmtud: Global data MTU changed to: 65397

Code:
# htop

    0[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||95.0%] Tasks: 58, 39 thr, 131 kthr; 3 running
    1[||||||                                                                                5.0%] Load average: 2.13 2.06 1.95
    2[|||||                                                                                 4.1%] Uptime: 2 days, 03:51:21
    3[|||||                                                                                 4.5%]
  Mem[||||||||||||                                                                   2.39G/31.1G]
  Swp[                                                                                  0K/8.00G]

  [Main] [I/O]
    PID USER       PRI  NI  VIRT   RES   SHR S  CPU%▽MEM%   TIME+  Command
 433373 root        RT   0  676M  165M 53084 S  96.3  0.5  2h01:17 /usr/sbin/corosync -f
 433380 root        RT   0  676M  165M 53084 R  95.3  0.5  2h00:12 /usr/sbin/corosync -f
   1220 root        20   0 2935M 1168M 19840 S   6.6  3.7  2h44:52 /usr/bin/kvm -id 112...
 
Can you post your /etc/network/interfaces file please?
 
Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address x.x.x.19/24
        gateway x.x.x.254
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
 
Where does this MTU come from?


Code:
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 453 to 65397
Feb 29 11:54:16 S-VIRT04 corosync[433373]:   [KNET  ] pmtud: Global data MTU changed to: 65397

This looks wrong; it should look more like this (depending on whether you use jumbo frames on one of your links):

Code:
Feb 28 13:29:50 PMX4 corosync[2498]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Feb 28 13:29:50 PMX4 corosync[2498]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 1 from 469 to 8885
Feb 28 13:29:50 PMX4 corosync[2498]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Feb 28 13:29:50 PMX4 corosync[2498]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 8885
Feb 28 13:29:50 PMX4 corosync[2498]:   [KNET  ] pmtud: Global data MTU changed to: 1397
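
To compare your own node against the output above, a small helper like the following might be handy. It is only a sketch: `last_global_mtu` is a made-up name, and it assumes the journal lines have the same "Global data MTU changed to" wording as in the logs quoted in this thread.

```shell
# Print the most recent "Global data MTU" value that knet logged.
# Reads journal text on stdin; prints nothing if no such line is present.
last_global_mtu() {
    sed -n 's/.*pmtud: Global data MTU changed to: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage on a live node (not run here):
#   journalctl -u corosync --no-pager | last_global_mtu
# Then compare with the MTU of the interface Corosync uses, e.g.:
#   ip link show vmbr0
```

On a link with a standard 1500-byte MTU you would expect a value around 1397, as in the example above, so something like 65397 stands out immediately.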
 
I don't understand why it says "Global data MTU changed to: 65397". It's still like that.
But when I set netmtu: 1400 in corosync.conf and restarted Corosync on all nodes, CPU utilization decreased.

o_O
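
For anyone finding this later, the change was roughly the following. This is a sketch of the relevant part only, not my full file; on Proxmox the cluster-wide file is /etc/pve/corosync.conf, and config_version has to be increased whenever it is edited.

```
totem {
  # existing settings (cluster_name, config_version, interface, ...) stay as they are
  netmtu: 1400
}
```

After that I restarted Corosync on every node (systemctl restart corosync). I can't say whether this is the proper fix or just a workaround, so check the corosync.conf man page for your version before copying it.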
 
