Corosync stops working - PVE 6.0.4

phillip.prudencio

New Member
Aug 30, 2019
My cluster has 3 nodes running PVE version 6.0.4.
I need to restart corosync every day to get it working again.
However, the VMs do not stop working.
Screenshots are attached.
 

Attachments

  • erro corosync.jpg (227.3 KB)
  • erro dmesg.jpg (249.6 KB)
Please save the complete output of journalctl -u corosync in a file and attach it here.
Also your corosync config (/etc/pve/corosync.conf) and your /etc/network/interfaces. If there are any public IPs in there, you might want to mask them.
 
The corosync log lines are cut off; please redirect the output to a file (journalctl -u corosync > corosync.log).
Is there VM traffic on the same network as corosync?
What's your pve version? (pveversion -v)
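
A minimal sketch of gathering everything requested above in one pass. The function name collect_diag and the output directory name are made up for illustration; the commands and file paths are the ones already mentioned in this thread. Anything missing on the machine is skipped with a warning.

```shell
#!/bin/sh
# Collect the diagnostics requested in this thread into one directory.
# "collect_diag" and the default directory name are illustrative only.
collect_diag() {
    out_dir="${1:-corosync-diag}"
    mkdir -p "$out_dir" || return 1

    # Full corosync journal; redirecting to a file avoids cut-off lines.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -u corosync > "$out_dir/corosync.log" 2>&1 \
            || echo "warning: journalctl reported an error" >&2
    else
        echo "warning: journalctl not found, skipping" >&2
    fi

    # Installed package versions.
    if command -v pveversion >/dev/null 2>&1; then
        pveversion -v > "$out_dir/pveversion.txt" 2>&1 \
            || echo "warning: pveversion reported an error" >&2
    else
        echo "warning: pveversion not found, skipping" >&2
    fi

    # Cluster and network configuration (mask public IPs before posting).
    for f in /etc/pve/corosync.conf /etc/network/interfaces; do
        if [ -r "$f" ]; then
            cp "$f" "$out_dir/" || echo "warning: could not copy $f" >&2
        else
            echo "warning: $f not readable, skipping" >&2
        fi
    done
    return 0
}

# Usage: collect_diag corosync-diag
```

Run it as root on each node, then attach the resulting directory as an archive.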
 
I attached the files as a .zip.
PVEVERSION:
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1


COROSYNC.LOG:
-- Logs begin at Sun 2019-09-22 08:55:50 -03, end at Wed 2019-09-25 08:07:08 -03. --
Sep 22 08:55:56 pve3 systemd[1]: Starting Corosync Cluster Engine...
Sep 22 08:55:56 pve3 corosync[1856]: [MAIN ] Corosync Cluster Engine 3.0.2-dirty starting up
Sep 22 08:55:56 pve3 corosync[1856]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Sep 22 08:55:56 pve3 corosync[1856]: [TOTEM ] Initializing transport (Kronosnet).
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] kronosnet crypto initialized: aes256/sha256
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] totemknet initialized
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cmap
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cfg
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: cpg
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] Watchdog not enabled by configuration
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] resource load_15min missing a recovery key.
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] resource memory_used missing a recovery key.
Sep 22 08:55:57 pve3 corosync[1856]: [WD ] no resources configured.
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 22 08:55:57 pve3 corosync[1856]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: votequorum
Sep 22 08:55:57 pve3 corosync[1856]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 22 08:55:57 pve3 corosync[1856]: [QB ] server name: quorum
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 1 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 2 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Sep 22 08:55:57 pve3 corosync[1856]: [KNET ] host: host: 3 has no active links
Sep 22 08:55:57 pve3 corosync[1856]: [TOTEM ] A new membership (3:204184) was formed. Members joined: 3
Sep 22 08:55:57 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:55:57 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:55:57 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:55:57 pve3 systemd[1]: Started Corosync Cluster Engine.
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] rx: host: 2 link: 0 is up
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] rx: host: 1 link: 0 is up
Sep 22 08:55:59 pve3 corosync[1856]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 22 08:55:59 pve3 corosync[1856]: [TOTEM ] A new membership (1:204188) was formed. Members joined: 1 2
Sep 22 08:56:01 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] A new membership (3:204268) was formed. Members left: 1 2
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:05 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:05 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:05 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:05 pve3 corosync[1856]: [TOTEM ] A new membership (1:204272) was formed. Members joined: 1 2
Sep 22 08:56:06 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:10 pve3 corosync[1856]: [TOTEM ] A new membership (3:204356) was formed. Members left: 1 2
Sep 22 08:56:10 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:10 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:10 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:10 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:12 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 470 to 1366
Sep 22 08:56:14 pve3 corosync[1856]: [KNET ] pmtud: Global data MTU changed to: 1366
Sep 22 08:56:14 pve3 corosync[1856]: [TOTEM ] A new membership (3:204364) was formed. Members
Sep 22 08:56:14 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:14 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:14 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:17 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:19 pve3 corosync[1856]: [TOTEM ] A new membership (3:204372) was formed. Members
Sep 22 08:56:19 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:19 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:19 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 08:56:19 pve3 corosync[1856]: [TOTEM ] A new membership (1:204376) was formed. Members joined: 1 2
Sep 22 08:56:21 pve3 corosync[1856]: [TOTEM ] FAILED TO RECEIVE
Sep 22 08:56:25 pve3 corosync[1856]: [TOTEM ] A new membership (3:204460) was formed. Members left: 1 2
Sep 22 08:56:25 pve3 corosync[1856]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Sep 22 08:56:25 pve3 corosync[1856]: [CPG ] downlist left_list: 0 received
Sep 22 08:56:25 pve3 corosync[1856]: [QUORUM] Members[1]: 3
Sep 22 08:56:25 pve3 corosync[1856]: [MAIN ] Completed service synchronization, ready to provide service.
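
The pattern in the log above (node 3 forms a membership with nodes 1 and 2, logs FAILED TO RECEIVE, then drops them again a few seconds later) can be counted from the saved file. A minimal sketch, assuming the log was saved with journalctl -u corosync > corosync.log; the function name summarize_corosync_log is hypothetical, and the three patterns are simply the message strings visible in this excerpt.

```shell
#!/bin/sh
# Summarise membership instability in a saved corosync log.
# "summarize_corosync_log" is an illustrative name, not a corosync tool.
summarize_corosync_log() {
    log="$1"
    # grep -c prints the number of matching lines (0 if none); "|| true"
    # keeps the function going when grep exits non-zero on zero matches.
    failed=$(grep -c 'FAILED TO RECEIVE' "$log" || true)
    formed=$(grep -c 'A new membership' "$log" || true)
    left=$(grep -c 'Members left:' "$log" || true)
    echo "failed_to_receive=$failed new_memberships=$formed members_left=$left"
}

# Usage: summarize_corosync_log corosync.log
```

Many new memberships in a short time window, paired with FAILED TO RECEIVE lines, match the flapping seen here; comparing the counts before and after an upgrade gives a quick check on whether the behaviour changed.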
 
COROSYNC.CONF:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.74.200.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.74.200.3
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.74.200.5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pvecluster
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

INTERFACE:

# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

allow-vmbr1 eno2
iface eno2 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1

auto enp10s0f0
iface enp10s0f0 inet static
        address 10.69.0.7
        netmask 28
#STG_LAN

iface enp10s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.74.200.5
        netmask 255.255.255.0
        gateway 10.74.200.254
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports eno2
#vm_traffic
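
To answer whether corosync shares a network with other traffic, it helps to see which interface holds each ring0_addr. A rough sketch, assuming both files have the simple layout shown above; ring0_addrs and iface_for_addr are hypothetical helper names, not Proxmox tools.

```shell
#!/bin/sh
# Print every ring0_addr value from a corosync.conf-style file.
ring0_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "$1"
}

# Print the name of the iface stanza in an interfaces-style file ($1)
# whose "address" line equals the given IP ($2). A "/prefix" suffix on
# the address, if present, is ignored.
iface_for_addr() {
    awk -v want="$2" '
        $1 == "iface"   { name = $2 }
        $1 == "address" { split($2, a, "/"); if (a[1] == want) print name }
    ' "$1"
}

# Usage:
#   ring0_addrs /etc/pve/corosync.conf
#   iface_for_addr /etc/network/interfaces 10.74.200.5
```

With the files posted here, the pve3 address 10.74.200.5 sits on vmbr0 (the bridge on eno1), while vmbr1 is the bridge labelled vm_traffic.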

I attached file.zip in my last post.
 
Hi, I updated/upgraded to PVE version 6.0.7.
I think it's working now.
Great :).

Note that there is still one unfixed corosync crash bug, but it is much harder to reproduce (only 1 or 2 users have hit it).
If you really want to be 100% safe, keep HA disabled until it's fixed.