Corosync service won't start after update

Donner

Member
Aug 22, 2021
Hi, something strange is going on with corosync after the last update of my 4-node cluster: one of the nodes won't start corosync.service.
[Attached screenshot: 1656626643345.png]

So the VMs won't start because there is no quorum, the other nodes are inaccessible from the web GUI when I log in through that node, and so on.
systemctl restart corosync.service has no effect, but if I execute corosync in a shell, everything starts working!
wtf?
 
Hi,

can you please provide us with the output of:

Bash:
pveversion -v
grep '' /etc/apt/sources.list &&  grep '' /etc/apt/sources.list.d/*
cat /var/log/apt/history.log
 
Code:
root@ibm:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.2-5 (running version: 7.2-5/12f1e639)
pve-kernel-5.15: 7.2-5
pve-kernel-helper: 7.2-5
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-5
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Code:
root@ibm:~# grep '' /etc/apt/sources.list &&  grep '' /etc/apt/sources.list.d/*
deb http://ftp.ru.debian.org/debian bullseye main contrib

deb http://ftp.ru.debian.org/debian bullseye-updates main contrib

# security updates
deb http://security.debian.org bullseye-security main contrib

deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription

# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise

Code:
root@ibm:~# cat /var/log/apt/history.log

Start-Date: 2022-07-01  01:04:45
Commandline: apt-get dist-upgrade
Install: pve-kernel-5.15.35-3-pve:amd64 (5.15.35-6, automatic), pve-kernel-5.15:amd64 (7.2-5, automatic)
Upgrade: dpkg:amd64 (1.20.9, 1.20.10), cifs-utils:amd64 (2:6.11-3.1, 2:6.11-3.1+deb11u1), libcups2:amd64 (2.3.3op2-3+deb11u1, 2.3.3op2-3+deb11u2), pve-docs:amd64 (7.1-2, 7.2-2), proxmox-widget-toolkit:amd64 (3.4-10, 3.5.1), pve-firmware:amd64 (3.4-1, 3.4-2), tzdata:amd64 (2021a-1+deb11u3, 2021a-1+deb11u4), pve-qemu-kvm:amd64 (6.2.0-5, 6.2.0-10), libpve-cluster-api-perl:amd64 (7.1-3, 7.2-1), pve-lxc-syscalld:amd64 (1.1.0-1, 1.1.1-1), libproxmox-backup-qemu0:amd64 (1.2.0-1, 1.3.1-1), libpve-storage-perl:amd64 (7.1-2, 7.2-5), libxml2:amd64 (2.9.10+dfsg-6.7+deb11u1, 2.9.10+dfsg-6.7+deb11u2), pve-cluster:amd64 (7.1-3, 7.2-1), libldap-2.4-2:amd64 (2.4.57+dfsg-3, 2.4.57+dfsg-3+deb11u1), libproxmox-rs-perl:amd64 (0.1.0, 0.1.1), proxmox-ve:amd64 (7.1-2, 7.2-1), proxmox-backup-file-restore:amd64 (2.1.6-1, 2.2.3-1), qemu-server:amd64 (7.1-5, 7.2-3), libpve-access-control:amd64 (7.1-7, 7.2-2), pve-container:amd64 (4.1-5, 4.2-1), pve-i18n:amd64 (2.6-3, 2.7-2), rsyslog:amd64 (8.2102.0-2, 8.2102.0-2+deb11u1), proxmox-backup-client:amd64 (2.1.6-1, 2.2.3-1), libpve-http-server-perl:amd64 (4.1-1, 4.1-2), libssl1.1:amd64 (1.1.1n-0+deb11u1, 1.1.1n-0+deb11u3), pve-manager:amd64 (7.1-12, 7.2-5), libpve-common-perl:amd64 (7.1-5, 7.2-2), libnozzle1:amd64 (1.22-pve2, 1.24-pve1), libknet1:amd64 (1.22-pve2, 1.24-pve1), pve-kernel-helper:amd64 (7.2-2, 7.2-5), openssl:amd64 (1.1.1n-0+deb11u1, 1.1.1n-0+deb11u3), libpve-cluster-perl:amd64 (7.1-3, 7.2-1)
End-Date: 2022-07-01  01:06:54
 
I tried updating another node and everything works fine there, even with the old kernel (win22 bug).
Here is the syslog from when corosync tries to start as a service:


Code:
Jul 04 01:11:34 ibm pve-firewall[10730]: starting server
Jul 04 01:11:34 ibm systemd[1]: Started Proxmox VE firewall.
Jul 04 01:11:34 ibm pvestatd[10732]: starting server
Jul 04 01:11:34 ibm systemd[1]: Started PVE Status Daemon.
Jul 04 01:11:34 ibm kernel: bpfilter: Loaded bpfilter_umh pid 10735
Jul 04 01:11:34 ibm unknow: Started bpfilter
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-Hms9TZ/qb-request-quorum-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-RloObY/qb-request-cmap-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-yL0pu0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-DOgCk0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-8ecJPY/qb-request-quorum-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-Jb4X9V/qb-request-cmap-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-TFXIg0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-JGKC7Y/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 12
Jul 04 01:11:46 ibm kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Jul 04 01:11:47 ibm kernel: FS-Cache: Loaded
Jul 04 01:11:47 ibm kernel: FS-Cache: Netfs 'nfs' registered for caching
Jul 04 01:11:47 ibm kernel: NFS: Registering the id_resolver key type
Jul 04 01:11:47 ibm kernel: Key type id_resolver registered
Jul 04 01:11:47 ibm kernel: Key type id_legacy registered
Jul 04 01:11:47 ibm systemd[1]: Reached target Host and Network Name Lookups.
Jul 04 01:11:48 ibm systemd[1]: Starting Preprocess NFS configuration...
Jul 04 01:11:48 ibm systemd[1]: nfs-config.service: Succeeded.
Jul 04 01:11:48 ibm systemd[1]: Finished Preprocess NFS configuration.
Jul 04 01:11:48 ibm systemd[1]: Starting Notify NFS peers of a restart...
Jul 04 01:11:48 ibm systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
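
The pmxcfs errors in this excerpt repeat every few seconds, which makes it hard to scan. Here is a minimal sketch for condensing such a log, assuming it has been saved to a file. The function name summarize_pmxcfs_crit is made up for illustration and is not part of any Proxmox tooling. It counts the pmxcfs "crit" lines per subsystem and error code.

```shell
# Count pmxcfs "crit" lines per subsystem and error code.
# $1: path to a log file in the syslog format shown above.
# summarize_pmxcfs_crit is a hypothetical helper name.
summarize_pmxcfs_crit() {
    # Keep only pmxcfs crit lines, strip timestamp/host/pid,
    # then count identical messages and list the most frequent first.
    grep -E 'pmxcfs\[[0-9]+\]: \[[a-z]+\] crit: ' "$1" \
        | sed -E 's/.*pmxcfs\[[0-9]+\]: (\[[a-z]+\] crit: .*)$/\1/' \
        | sort | uniq -c | sort -rn
}
```

It could be called as `summarize_pmxcfs_crit /var/log/syslog` on a system that writes that file. The path is an assumption and may differ on your setup.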

But if I just start corosync from a shell, everything starts working:


Jul 04 01:15:15 ibm login[17450]: ROOT LOGIN on '/dev/pts/0'
Jul 04 01:15:19 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 2
Jul 04 01:15:20 ibm corosync[17505]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
Jul 04 01:15:20 ibm corosync[17505]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jul 04 01:15:20 ibm corosync[17506]: [TOTEM ] Initializing transport (Kronosnet).
Jul 04 01:15:20 ibm corosync[17506]: [TOTEM ] totemknet initialized
Jul 04 01:15:20 ibm corosync[17506]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync configuration map access [0]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cmap
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync configuration service [1]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cfg
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cpg
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync profile loading service [4]
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Jul 04 01:15:21 ibm corosync[17506]: [WD ] Watchdog not enabled by configuration
Jul 04 01:15:21 ibm corosync[17506]: [WD ] resource load_15min missing a recovery key.
Jul 04 01:15:21 ibm corosync[17506]: [WD ] resource memory_used missing a recovery key.
Jul 04 01:15:21 ibm corosync[17506]: [WD ] no resources configured.
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync watchdog service [7]
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Using quorum provider corosync_votequorum
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: votequorum
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: quorum
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] Configuring link 0
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] Configured link number 0: local addr: 10.0.15.1, port=5405
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Sync members[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Sync joined[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] A new membership (1.20a) was formed. Members joined: 1
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 1 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Members[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 2 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 4 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 3 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: Global data MTU changed to: 1397
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Sync members[4]: 1 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Sync joined[3]: 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [TOTEM ] A new membership (1.215) was formed. Members joined: 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] This node is within the primary component and will provide service.
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Members[4]: 1 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: update cluster info (cluster name Neftpk, version = 4)
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: node has quorum
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: members: 1/10415, 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: starting data syncronisation
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: received sync request (epoch 1/10415/00000001)
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: members: 1/10415, 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: starting data syncronisation
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: received sync request (epoch 1/10415/00000001)
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: received all states
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: leader is 2/2142
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: synced members: 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: waiting for updates from leader
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: update complete - trying to commit (got 4 inode updates)
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: all data is up to date
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: received all states
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: all data is up to date
Jul 04 01:15:25 ibm pvesh[15651]: got quorum
Jul 04 01:15:25 ibm pvesh[15651]: Starting VM 102
 
Figured it out. That was my fault. I'm using "zfs autobackup" and forgot to clear the mountpoints, so the root fs was mounted multiple times. After clearing the mountpoints, everything started working.
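
For anyone hitting the same thing, here is a rough sketch of how such duplicate mounts could be spotted. It assumes a Linux mounts table in the /proc/mounts format. The function name dup_mount_targets is made up for illustration. It only lists mount targets that appear more than once and does not change anything.

```shell
# List mount targets that appear more than once in a mounts table.
# $1: file in /proc/mounts format (defaults to /proc/self/mounts).
# dup_mount_targets is a hypothetical helper name.
dup_mount_targets() {
    # Field 2 of each line is the mount target; count how often each occurs
    # and print "<count> <target>" for every target seen more than once.
    awk '{ count[$2]++ }
         END { for (t in count) if (count[t] > 1) print count[t], t }' \
        "${1:-/proc/self/mounts}" | sort -k2
}

# On a ZFS system, something like the following should then show which
# datasets claim the same mountpoint (standard zfs properties, output not shown):
#   zfs list -o name,mountpoint,canmount,mounted
```

A target listed here is not always an error, since some stacked mounts are intentional. Still, "/" showing up more than once after a backup run is a strong hint that received datasets kept their original mountpoint.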
 
