Cluster in weird state (nodes with grey question mark)

BenDDD

Member
Nov 28, 2019
Hello everyone,

I am experiencing a strange situation with my cluster. 22 out of 24 nodes seem to communicate correctly via corosync, but they appear with a grey question mark in the WebUI:

Code:
pvecm status
user config - ignore invalid group member 'mathieu-adm'
Cluster information
-------------------
Name:             galaxie
Config Version:   73
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Sep 13 02:57:46 2020
Quorum provider:  corosync_votequorum
Nodes:            22
Node ID:          0x00000001
Ring ID:          1.182e9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      22
Quorum:           13 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 147.215.130.101 (local)
0x00000002          1 147.215.130.102
0x00000003          1 147.215.130.103
0x00000004          1 147.215.130.104
0x00000005          1 147.215.130.105
0x00000006          1 147.215.130.106
0x00000007          1 147.215.130.107
0x00000008          1 147.215.130.108
0x00000009          1 147.215.130.109
0x0000000a          1 147.215.130.110
0x0000000b          1 147.215.130.111
0x0000000c          1 147.215.130.112
0x0000000d          1 147.215.130.113
0x0000000e          1 147.215.130.114
0x0000000f          1 147.215.130.115
0x00000010          1 147.215.130.116
0x00000011          1 147.215.130.117
0x00000012          1 147.215.130.118
0x00000013          1 147.215.130.119
0x00000014          1 147.215.130.120
0x00000015          1 147.215.130.121
0x00000016          1 147.215.130.122

cluster.png

And as you can see, the two other nodes do not appear in the corosync membership list and are shown with a white cross on a red background in the WebUI.
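
As a sanity check on the numbers in the `pvecm status` output, here is a minimal sketch of the majority arithmetic. It assumes the default votequorum rule (quorum = floor(expected votes / 2) + 1) with one vote per node. `quorum_threshold` is a name made up for this illustration, not a corosync or Proxmox function.

```python
# Sketch only: reproduces the "Quorum: 13" value from the pvecm status output,
# assuming the default majority rule and one vote per node.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1

expected_votes = 24  # "Expected votes" in the output
total_votes = 22     # "Total votes" in the output

threshold = quorum_threshold(expected_votes)
quorate = total_votes >= threshold

print(f"quorum={threshold} quorate={quorate}")
```

So with 22 of 24 votes present the partition is well above the threshold, which matches the "Quorate: Yes" line even though two nodes are missing.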

Some information that may be useful:

proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

Sep 13 02:59:14 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 02:59:27 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
Sep 13 02:59:51 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 03:00:12 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
Sep 13 03:00:44 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
Sep 13 03:01:19 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 14452 ms
Sep 13 03:02:19 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
Sep 13 03:02:43 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
Sep 13 03:03:46 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 03:04:14 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 03:04:26 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3286 ms
Sep 13 03:04:54 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 03:05:16 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3283 ms
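
To get an overview of these warnings rather than reading them line by line, something like the following could be used. It is only a sketch: the regular expression assumes the exact message wording shown above, and `token_delays` is a hypothetical helper, not part of corosync. The sample lines are copied from the log excerpt.

```python
import re
from statistics import median

# Matches the "[TOTEM ] Token has not been received in N ms" warnings.
TOKEN_RE = re.compile(r"\[TOTEM\s*\] Token has not been received in (\d+) ms")

def token_delays(lines):
    """Return the reported delay in ms for every matching log line."""
    delays = []
    for line in lines:
        match = TOKEN_RE.search(line)
        if match:
            delays.append(int(match.group(1)))
    return delays

# Three lines taken from the galaxie1 excerpt above.
sample = """\
Sep 13 02:59:14 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3261 ms
Sep 13 03:01:19 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 14452 ms
Sep 13 03:04:26 galaxie1 corosync[10908]: [TOTEM ] Token has not been received in 3286 ms
""".splitlines()

delays = token_delays(sample)
print("warnings:", len(delays))
print("min/median/max ms:", min(delays), median(delays), max(delays))
```

The same function could be fed the full output of `journalctl -u corosync` on each node to see whether the long delays (such as the 14452 ms one) occur on all nodes at the same time.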

root@galaxie23:~# journalctl -r -u corosync
-- Logs begin at Sun 2020-09-13 01:39:48 CEST, end at Sun 2020-09-13 03:12:27 CEST. --
Sep 13 03:12:27 galaxie23 corosync[13119]: [MAIN ] Completed service synchronization, ready to provid
Sep 13 03:12:27 galaxie23 corosync[13119]: [QUORUM] Members[7]: 9 10 12 13 16 23 25
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 0 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 1 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [TOTEM ] A new membership (9.184f0) was formed. Members joi
Sep 13 03:12:27 galaxie23 corosync[13119]: [MAIN ] Completed service synchronization, ready to provid
Sep 13 03:12:27 galaxie23 corosync[13119]: [QUORUM] Members[1]: 23
Sep 13 03:12:27 galaxie23 corosync[13119]: [CPG ] downlist left_list: 0 received
Sep 13 03:12:27 galaxie23 corosync[13119]: [TOTEM ] A new membership (17.184ec) was formed. Members
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: Global data MTU changed to: 1397
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 10 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 19 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 20 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 21 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 22 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 25 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 11 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 9 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 12 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 13 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 14 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 15 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 16 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 17 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] pmtud: PMTUD link change for host: 18 link: 0 from
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 13 03:12:05 galaxie23 corosync[13119]: [KNET ] host: host: 19 (passive) best link: 0 (pri: 1)
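
To compare which members each node reports, the `[QUORUM] Members[...]` lines can be parsed and set against the membership list from `pvecm status`. This is a rough sketch under several assumptions: `parse_members` is a hypothetical helper, the `pvecm status` output is taken to come from the node with ID 1, and the two snapshots were captured about 15 minutes apart, so the comparison is only indicative.

```python
import re

# Matches e.g. "[QUORUM] Members[7]: 9 10 12 13 16 23 25".
MEMBERS_RE = re.compile(r"\[QUORUM\s*\] Members\[(\d+)\]: ([\d ]+)")

def parse_members(line):
    """Return the set of node IDs in a corosync Members line, or None."""
    match = MEMBERS_RE.search(line)
    if match is None:
        return None
    return {int(node_id) for node_id in match.group(2).split()}

# Line copied from the galaxie23 journal above.
line = ("Sep 13 03:12:27 galaxie23 corosync[13119]: "
        "[QUORUM] Members[7]: 9 10 12 13 16 23 25")
seen_by_node23 = parse_members(line)

# Node IDs 0x01..0x16 from the pvecm status membership list (1 to 22).
seen_by_node1 = set(range(1, 23))

print("in both views:", sorted(seen_by_node23 & seen_by_node1))
print("only seen by galaxie23:", sorted(seen_by_node23 - seen_by_node1))
```

Under those assumptions, nodes 23 and 25 show up only in the galaxie23 view, while nodes 9, 10, 12, 13 and 16 are listed in both.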

Thank you in advance for your help.
 
Something else I just noticed: a corosync process is running at 100% CPU:

htop.png
 
