Cluster breaks daily

cpzengel

Hi guys,

Every day my remotely connected machine causes the cluster to break.

Code:
Sep  9 03:39:14 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
Sep  9 03:39:16 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
Sep  9 03:39:17 pve9 corosync[14541]:   [TOTEM ] Token has not been received in 297304 ms
Sep  9 03:39:17 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
Sep  9 03:39:19 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
Sep  9 03:39:20 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
Sep  9 03:39:20 pve9 corosync[14541]:   [TOTEM ] Token has not been received in 300934 ms

Restarting corosync on every node fixes the problem.
Even if there is a network interruption, why doesn't the cluster recover on its own?

Both sides are bridged by Sophos RED (Remote Ethernet Device).
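
To see how long the node has been without a token, the "Token has not been received" values can be pulled out of the log. This is only a rough sketch in Python: the function name `token_gaps_ms` and the regular expression are my own assumptions based on the lines pasted above, not part of corosync or Proxmox.

```python
import re

# Assumed pattern, derived from the TOTEM lines in the excerpt above.
TOKEN_RE = re.compile(r"\[TOTEM\s*\]\s+Token has not been received in (\d+) ms")


def token_gaps_ms(lines):
    """Return the token gaps (in milliseconds) reported in the given log lines."""
    gaps = []
    for line in lines:
        match = TOKEN_RE.search(line)
        if match:
            gaps.append(int(match.group(1)))
    return gaps


# Sample lines copied from the excerpt above.
sample = [
    "Sep  9 03:39:17 pve9 corosync[14541]:   [TOTEM ] Token has not been received in 297304 ms",
    "Sep  9 03:39:19 pve9 corosync[14541]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.",
    "Sep  9 03:39:20 pve9 corosync[14541]:   [TOTEM ] Token has not been received in 300934 ms",
]

gaps = token_gaps_ms(sample)
for gap in gaps:
    # Convert to seconds for readability.
    print(f"no token for {gap / 1000:.1f} s")
```

For the two TOTEM lines above this reports gaps of roughly 297 s and 301 s, so by 03:39 the node had already gone about five minutes without receiving a token.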

Cheers

Chriz
 
Sorry, I forgot to include the package versions:
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
 
Code:
Sep  9 20:29:47 pve9 corosync[2408]:   [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.

It broke again and did not recover automatically.
This node was updated earlier today.
I am currently updating the other two nodes to

pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
 
And for the second time today I have had to restart corosync on all nodes.
Any ideas?

Code:
Sep 13 17:13:18 pve1 corosync[45044]: [QUORUM] Members[2]: 1 2
Sep 13 17:13:18 pve1 corosync[45044]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:20 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540484) was formed. Members
Sep 13 17:13:20 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:20 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:20 pve1 corosync[45044]: [QUORUM] Members[2]: 1 2
Sep 13 17:13:20 pve1 corosync[45044]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:20 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540488) was formed. Members
Sep 13 17:13:20 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:20 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:20 pve1 corosync[45044]: [QUORUM] Members[2]: 1 2
Sep 13 17:13:20 pve1 corosync[45044]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:22 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540492) was formed. Members
Sep 13 17:13:22 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:22 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:22 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540496) was formed. Members
Sep 13 17:13:22 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:22 pve1 corosync[45044]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:22 pve1 corosync[45044]: [QUORUM] Members[2]: 1 2
Sep 13 17:13:22 pve1 corosync[45044]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:24 pve1 corosync[45044]: [MAIN ] Node was shut down by a signal
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Unloading all Corosync service engines.
Sep 13 17:13:24 pve1 corosync[45044]: [QB ] withdrawing server sockets
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync vote quorum service v1.0
Sep 13 17:13:24 pve1 corosync[45044]: [QB ] withdrawing server sockets
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync configuration map access
Sep 13 17:13:24 pve1 corosync[45044]: [QB ] withdrawing server sockets
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync configuration service
Sep 13 17:13:24 pve1 corosync[45044]: [QB ] withdrawing server sockets
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Sep 13 17:13:24 pve1 corosync[45044]: [QB ] withdrawing server sockets
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync profile loading service
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync resource monitoring service
Sep 13 17:13:24 pve1 corosync[45044]: [SERV ] Service engine unloaded: corosync watchdog service
Sep 13 17:13:25 pve1 corosync[45044]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
Sep 13 17:13:25 pve1 corosync[45044]: [KNET ] host: host: 2 has no active links
Sep 13 17:13:25 pve1 corosync[45044]: [MAIN ] Corosync Cluster Engine exiting normally
Sep 13 17:13:25 pve1 systemd[1]: corosync.service: Succeeded.
Sep 13 17:13:25 pve1 corosync[55722]: [MAIN ] Corosync Cluster Engine 3.0.2-dirty starting up
Sep 13 17:13:25 pve1 corosync[55722]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Sep 13 17:13:25 pve1 corosync[55722]: [TOTEM ] Initializing transport (Kronosnet).
Sep 13 17:13:25 pve1 corosync[55722]: [TOTEM ] kronosnet crypto initialized: aes256/sha256
Sep 13 17:13:25 pve1 corosync[55722]: [TOTEM ] totemknet initialized
Sep 13 17:13:25 pve1 corosync[55722]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 13 17:13:26 pve1 corosync[55722]: [QB ] server name: cmap
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 13 17:13:26 pve1 corosync[55722]: [QB ] server name: cfg
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 13 17:13:26 pve1 corosync[55722]: [QB ] server name: cpg
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 13 17:13:26 pve1 corosync[55722]: [WD ] Watchdog not enabled by configuration
Sep 13 17:13:26 pve1 corosync[55722]: [WD ] resource load_15min missing a recovery key.
Sep 13 17:13:26 pve1 corosync[55722]: [WD ] resource memory_used missing a recovery key.
Sep 13 17:13:26 pve1 corosync[55722]: [WD ] no resources configured.
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 13 17:13:26 pve1 corosync[55722]: [QUORUM] Using quorum provider corosync_votequorum
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 13 17:13:26 pve1 corosync[55722]: [QB ] server name: votequorum
Sep 13 17:13:26 pve1 corosync[55722]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 13 17:13:26 pve1 corosync[55722]: [QB ] server name: quorum
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 1 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 13 17:13:26 pve1 corosync[55722]: [TOTEM ] A new membership (1:1540500) was formed. Members joined: 1
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 2 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 has no active links
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 13 17:13:26 pve1 corosync[55722]: [QUORUM] Members[1]: 1
Sep 13 17:13:26 pve1 corosync[55722]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:26 pve1 corosync[55722]: [KNET ] host: host: 3 has no active links
Sep 13 17:13:27 pve1 corosync[55722]: [KNET ] rx: host: 3 link: 0 is up
Sep 13 17:13:27 pve1 corosync[55722]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 13 17:13:27 pve1 corosync[55722]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Sep 13 17:13:27 pve1 corosync[55722]: [KNET ] pmtud: Global data MTU changed to: 1397
Sep 13 17:13:28 pve1 corosync[55722]: [KNET ] rx: host: 2 link: 0 is up
Sep 13 17:13:28 pve1 corosync[55722]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 13 17:13:28 pve1 corosync[55722]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Sep 13 17:13:29 pve1 corosync[55722]: [TOTEM ] A new membership (1:1540512) was formed. Members joined: 2
Sep 13 17:13:29 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:29 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:29 pve1 corosync[55722]: [QUORUM] This node is within the primary component and will provide service.
Sep 13 17:13:29 pve1 corosync[55722]: [QUORUM] Members[2]: 1 2
Sep 13 17:13:29 pve1 corosync[55722]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:13:31 pve1 corosync[55722]: [TOTEM ] A new membership (1:1540516) was formed. Members joined: 3
Sep 13 17:13:31 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:31 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:31 pve1 corosync[55722]: [CPG ] downlist left_list: 0 received
Sep 13 17:13:31 pve1 corosync[55722]: [QUORUM] Members[3]: 1 2 3
Sep 13 17:13:31 pve1 corosync[55722]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 13 17:14:11 pve1 corosync[55722]: [TOTEM ] Token has not been received in 1237 ms
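
Before the restart, the log above shows a new membership being formed several times within a couple of seconds. A quick way to make that flapping visible is to count the "A new membership" lines per timestamp. The following is only a sketch in Python: the helper name `membership_changes` and the regular expression are assumptions of mine, derived from the line format shown above.

```python
import re
from collections import Counter

# Assumed pattern for lines like:
#   Sep 13 17:13:20 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540484) was formed. Members
MEMBERSHIP_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+A new membership \((\d+):(\d+)\) was formed"
)


def membership_changes(lines):
    """Return (timestamp, ring sequence number) for every membership change."""
    events = []
    for line in lines:
        match = MEMBERSHIP_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(3))))
    return events


# Sample lines copied from the log above.
sample = [
    "Sep 13 17:13:20 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540484) was formed. Members",
    "Sep 13 17:13:20 pve1 corosync[45044]: [QUORUM] Members[2]: 1 2",
    "Sep 13 17:13:20 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540488) was formed. Members",
    "Sep 13 17:13:22 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540492) was formed. Members",
    "Sep 13 17:13:22 pve1 corosync[45044]: [TOTEM ] A new membership (1:1540496) was formed. Members",
]

events = membership_changes(sample)
per_second = Counter(timestamp for timestamp, _ in events)
for timestamp, count in per_second.items():
    print(f"{timestamp}: {count} membership changes")
```

More than one membership change in the same second, as in these sample lines, suggests the nodes keep re-forming the ring rather than settling on a stable membership.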
 
