Corosync stops working - PVE 6.0.4

phillip.prudencio

New Member
Aug 30, 2019
My cluster has 3 nodes running PVE version 6.0.4.
I have to restart corosync every day to get it working again; the VMs keep running, though.
Screenshots attached.
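The daily restart is just the standard service restart (a rough sketch, assuming the default systemd units on PVE 6):

systemctl restart corosync   # on the affected node
pvecm status                 # check that quorum comes back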
 

Attachments

  • erro corosync.jpg (227.3 KB)
  • erro dmesg.jpg (249.6 KB)
Please save the complete output of journalctl -u corosync in a file and attach it here.
Also your corosync config (/etc/pve/corosync.conf) and your /etc/network/interfaces. If there are any public IPs in there, you might want to mask them.
 
The corosync log lines are cut off; please redirect the output to a file (journalctl -u corosync > corosync.log).
Is there VM traffic on the same network as corosync?
What's your pve version? (pveversion -v)
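Something along these lines should collect everything into attachable files (just a sketch; mask any public IPs before uploading):

journalctl -u corosync > corosync.log
pveversion -v > pveversion.txt
cp /etc/pve/corosync.conf corosync.conf.txt
cp /etc/network/interfaces interfaces.txt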
 
I attached the files as a .zip.
PVEVERSION:
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1


COROSYNC.LOG:
-- Logs begin at Sun 2019-09-22 08:55:50 -03, end at Wed 2019-09-25 08:07:08 -03. --
Sep 22 08:55:56 pve3 systemd[1]: Starting Corosync Cluster Engine...
Sep 22 08:55:56 pve3 corosync[1856]: [MAIN ] Corosync Cluster Engine 3.0.2-dirty starting up
Sep 22 08:55:56 pve3 corosync[1856]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Sep 22 08:55:56 pve3 corosync[1856]: [TOTEM ] Initializing transport (Kronosnet).
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] kronosnet crypto initialized: aes256/sha256
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] totemknet initialized
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cmap
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cfg
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cpg
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] Watchdog not enabled by configuration
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] resource load_15min missing a recovery key.
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] resource memory_used missing a recovery key.
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] no resources configured.
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 22 08:55:57 pve3 corosync[1856]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: votequorum
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: quorum
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 3 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] A new membership (3:204184) was formed. Members joined: 3
Sep 22 08:55:57 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:55:57 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:55:57 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:55:57 pve3 systemd[1]: Started Corosync Cluster Engine.
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] rx: host: 2 link: 0 is up
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] rx: host: 1 link: 0 is up
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:59 pve3 corosync[1856]: [TOTEM ] A new membership (1:204188) was formed. Members joined: 1 2
Sep 22 08:56:01 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] A new membership (3:204268) was formed. Members left: 1 2
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:05 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:05 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:05 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] A new membership (1:204272) was formed. Members joined: 1 2
Sep 22 08:56:06 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:10 pve3 corosync[1856]: [TOTEM ] A new membership (3:204356) was formed. Members left: 1 2
Sep 22 08:56:10 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:10 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:10 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:10 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:12 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 470 to 1366
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: Global data MTU changed to: 1366
Sep 22 08:56:14 pve3 corosync[1856]: [TOTEM ] A new membership (3:204364) was formed. Members
Sep 22 08:56:14 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:14 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:14 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:17 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:19 pve3 corosync[1856]: [TOTEM ] A new membership (3:204372) was formed. Members
Sep 22 08:56:19 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:19 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:19 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:19 pve3 corosync[1856]: [TOTEM ] A new membership (1:204376) was formed. Members joined: 1 2
Sep 22 08:56:21 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:25 pve3 corosync[1856]: [TOTEM ] A new membership (3:204460) was formed. Members left: 1 2
Sep 22 08:56:25 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:25 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:25 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:25 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
 
COROSYNC.CONF:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.74.200.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.74.200.3
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.74.200.5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pvecluster
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

INTERFACE:

# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

allow-vmbr1 eno2
iface eno2 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1

auto enp10s0f0
iface enp10s0f0 inet static
    address 10.69.0.7
    netmask 28
#STG_LAN

iface enp10s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.74.200.5
    netmask 255.255.255.0
    gateway 10.74.200.254
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports eno2
#vm_traffic
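As an aside: ring0 (10.74.200.5) sits on vmbr0 here, so corosync shares that bridge with whatever else goes over eno1. The knet link state and membership can be checked with the standard tools, for example:

corosync-cfgtool -s   # per-link knet status as corosync sees it
pvecm status          # quorum/membership from the PVE side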

I attached file.zip in the last post.
 
Hi, I updated/upgraded to PVE version 6.0.7.
I think it's working now.
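For anyone else reading: the upgrade itself was just the usual apt run on each node (repository setup may differ):

apt update
apt dist-upgrade   # pulls in the newer corosync/libknet packages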
Great :).

Note that there is still one unfixed corosync crash bug, but it's much harder to reproduce (only one or two users have hit it).
If you really want to be 100% safe, keep HA disabled until it's fixed.
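If you go that route, a rough sketch with the standard ha-manager CLI (vm:100 is just an example ID):

ha-manager status          # list HA-managed resources
ha-manager remove vm:100   # remove each resource you want out of HA
# with no resources configured, the LRM goes idle and the watchdog is released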
 
