Corosync service won't start after update

Donner

Member
Aug 22, 2021
Hi, something strange is happening with corosync after the last update of my 4-node cluster: one of the nodes won't start corosync.service.
1656626643345.png

So VMs won't start because there is no quorum, the other nodes are inaccessible from the web GUI if I log in through that node, etc.
systemctl restart corosync.service has no effect, but if I execute corosync directly in a shell, everything starts working!
wtf?
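
When a unit refuses to start but the same binary runs fine by hand, systemd's own record of the start attempt is usually the first thing worth collecting. A minimal sketch, assuming a standard systemd host; the helper name `unit_diag` is hypothetical, while `systemctl status`, `systemctl show -p` and `journalctl -b -u` are stock systemd commands:

```shell
# Hypothetical helper: print what systemd knows about a unit's last start attempt.
unit_diag() {
    local unit="${1:-corosync.service}"
    # Human-readable state plus the most recent log lines for the unit.
    systemctl --no-pager status "$unit"
    # Machine-readable state; Result records why the unit stopped or failed.
    systemctl show -p ActiveState -p SubState -p Result "$unit"
    # Everything the unit has logged since the current boot.
    journalctl -b -u "$unit" --no-pager
}
```

Running `unit_diag` with no argument inspects corosync.service; any other unit name can be passed as the first argument.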
 
Hi,

can you please provide us with the output of:

Bash:
pveversion -v
grep '' /etc/apt/sources.list &&  grep '' /etc/apt/sources.list.d/*
cat /var/log/apt/history.log
 
Code:
root@ibm:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.2-5 (running version: 7.2-5/12f1e639)
pve-kernel-5.15: 7.2-5
pve-kernel-helper: 7.2-5
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-5
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Code:
root@ibm:~# grep '' /etc/apt/sources.list &&  grep '' /etc/apt/sources.list.d/*
deb http://ftp.ru.debian.org/debian bullseye main contrib

deb http://ftp.ru.debian.org/debian bullseye-updates main contrib

# security updates
deb http://security.debian.org bullseye-security main contrib

deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription

# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise

Code:
root@ibm:~# cat /var/log/apt/history.log

Start-Date: 2022-07-01  01:04:45
Commandline: apt-get dist-upgrade
Install: pve-kernel-5.15.35-3-pve:amd64 (5.15.35-6, automatic), pve-kernel-5.15:amd64 (7.2-5, automatic)
Upgrade: dpkg:amd64 (1.20.9, 1.20.10), cifs-utils:amd64 (2:6.11-3.1, 2:6.11-3.1+deb11u1), libcups2:amd64 (2.3.3op2-3+deb11u1, 2.3.3op2-3+deb11u2), pve-docs:amd64 (7.1-2, 7.2-2), proxmox-widget-toolkit:amd64 (3.4-10, 3.5.1), pve-firmware:amd64 (3.4-1, 3.4-2), tzdata:amd64 (2021a-1+deb11u3, 2021a-1+deb11u4), pve-qemu-kvm:amd64 (6.2.0-5, 6.2.0-10), libpve-cluster-api-perl:amd64 (7.1-3, 7.2-1), pve-lxc-syscalld:amd64 (1.1.0-1, 1.1.1-1), libproxmox-backup-qemu0:amd64 (1.2.0-1, 1.3.1-1), libpve-storage-perl:amd64 (7.1-2, 7.2-5), libxml2:amd64 (2.9.10+dfsg-6.7+deb11u1, 2.9.10+dfsg-6.7+deb11u2), pve-cluster:amd64 (7.1-3, 7.2-1), libldap-2.4-2:amd64 (2.4.57+dfsg-3, 2.4.57+dfsg-3+deb11u1), libproxmox-rs-perl:amd64 (0.1.0, 0.1.1), proxmox-ve:amd64 (7.1-2, 7.2-1), proxmox-backup-file-restore:amd64 (2.1.6-1, 2.2.3-1), qemu-server:amd64 (7.1-5, 7.2-3), libpve-access-control:amd64 (7.1-7, 7.2-2), pve-container:amd64 (4.1-5, 4.2-1), pve-i18n:amd64 (2.6-3, 2.7-2), rsyslog:amd64 (8.2102.0-2, 8.2102.0-2+deb11u1), proxmox-backup-client:amd64 (2.1.6-1, 2.2.3-1), libpve-http-server-perl:amd64 (4.1-1, 4.1-2), libssl1.1:amd64 (1.1.1n-0+deb11u1, 1.1.1n-0+deb11u3), pve-manager:amd64 (7.1-12, 7.2-5), libpve-common-perl:amd64 (7.1-5, 7.2-2), libnozzle1:amd64 (1.22-pve2, 1.24-pve1), libknet1:amd64 (1.22-pve2, 1.24-pve1), pve-kernel-helper:amd64 (7.2-2, 7.2-5), openssl:amd64 (1.1.1n-0+deb11u1, 1.1.1n-0+deb11u3), libpve-cluster-perl:amd64 (7.1-3, 7.2-1)
End-Date: 2022-07-01  01:06:54
 
I tried updating another node and everything works fine there, even with the old kernel (win22 bug).
Here is the syslog from when corosync tries to start as a service:


Code:
Jul 04 01:11:34 ibm pve-firewall[10730]: starting server
Jul 04 01:11:34 ibm systemd[1]: Started Proxmox VE firewall.
Jul 04 01:11:34 ibm pvestatd[10732]: starting server
Jul 04 01:11:34 ibm systemd[1]: Started PVE Status Daemon.
Jul 04 01:11:34 ibm kernel: bpfilter: Loaded bpfilter_umh pid 10735
Jul 04 01:11:34 ibm unknow: Started bpfilter
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-Hms9TZ/qb-request-quorum-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-RloObY/qb-request-cmap-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-yL0pu0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 12
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-DOgCk0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:37 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-8ecJPY/qb-request-quorum-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-Jb4X9V/qb-request-cmap-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-TFXIg0/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 12
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't open file /dev/shm/qb-10643-10415-28-JGKC7Y/qb-request-cpg-header: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: couldn't create file for mmap
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: qb_rb_open:REQUEST: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [libqb] error: connection failed: No such file or directory (2)
Jul 04 01:11:43 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 12
Jul 04 01:11:46 ibm kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Jul 04 01:11:47 ibm kernel: FS-Cache: Loaded
Jul 04 01:11:47 ibm kernel: FS-Cache: Netfs 'nfs' registered for caching
Jul 04 01:11:47 ibm kernel: NFS: Registering the id_resolver key type
Jul 04 01:11:47 ibm kernel: Key type id_resolver registered
Jul 04 01:11:47 ibm kernel: Key type id_legacy registered
Jul 04 01:11:47 ibm systemd[1]: Reached target Host and Network Name Lookups.
Jul 04 01:11:48 ibm systemd[1]: Starting Preprocess NFS configuration...
Jul 04 01:11:48 ibm systemd[1]: nfs-config.service: Succeeded.
Jul 04 01:11:48 ibm systemd[1]: Finished Preprocess NFS configuration.
Jul 04 01:11:48 ibm systemd[1]: Starting Notify NFS peers of a restart...
Jul 04 01:11:48 ibm systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
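
The libqb errors in this log all point at `qb-*` entries under /dev/shm that pmxcfs cannot open. A rough sketch for checking which such entries are actually visible from the current shell; the helper name `qb_ipc_entries` is hypothetical, and the only thing it assumes is the `qb-*` naming seen in the log lines:

```shell
# Hypothetical helper: list libqb IPC entries (qb-*) directly under a directory.
# Defaults to /dev/shm, the path that appears in the pmxcfs errors.
qb_ipc_entries() {
    find "${1:-/dev/shm}" -mindepth 1 -maxdepth 1 -name 'qb-*'
}
```

If a path named in the errors is missing from this listing, the process creating the entries and the process opening them may not be looking at the same /dev/shm.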

But if I just start corosync from the shell, everything starts working:

Code:
Jul 04 01:15:15 ibm login[17450]: ROOT LOGIN on '/dev/pts/0'
Jul 04 01:15:19 ibm pmxcfs[10415]: [quorum] crit: quorum_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [confdb] crit: cmap_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [dcdb] crit: cpg_initialize failed: 2
Jul 04 01:15:19 ibm pmxcfs[10415]: [status] crit: cpg_initialize failed: 2
Jul 04 01:15:20 ibm corosync[17505]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
Jul 04 01:15:20 ibm corosync[17505]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jul 04 01:15:20 ibm corosync[17506]: [TOTEM ] Initializing transport (Kronosnet).
Jul 04 01:15:20 ibm corosync[17506]: [TOTEM ] totemknet initialized
Jul 04 01:15:20 ibm corosync[17506]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync configuration map access [0]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cmap
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync configuration service [1]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cfg
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: cpg
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync profile loading service [4]
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Jul 04 01:15:21 ibm corosync[17506]: [WD ] Watchdog not enabled by configuration
Jul 04 01:15:21 ibm corosync[17506]: [WD ] resource load_15min missing a recovery key.
Jul 04 01:15:21 ibm corosync[17506]: [WD ] resource memory_used missing a recovery key.
Jul 04 01:15:21 ibm corosync[17506]: [WD ] no resources configured.
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync watchdog service [7]
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Using quorum provider corosync_votequorum
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: votequorum
Jul 04 01:15:21 ibm corosync[17506]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jul 04 01:15:21 ibm corosync[17506]: [QB ] server name: quorum
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] Configuring link 0
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] Configured link number 0: local addr: 10.0.15.1, port=5405
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 3 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 4 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Sync members[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Sync joined[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [TOTEM ] A new membership (1.20a) was formed. Members joined: 1
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 2 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
Jul 04 01:15:21 ibm corosync[17506]: [KNET ] host: host: 1 has no active links
Jul 04 01:15:21 ibm corosync[17506]: [QUORUM] Members[1]: 1
Jul 04 01:15:21 ibm corosync[17506]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 2 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 4 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] rx: host: 3 link: 0 is up
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Jul 04 01:15:23 ibm corosync[17506]: [KNET ] pmtud: Global data MTU changed to: 1397
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Sync members[4]: 1 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Sync joined[3]: 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [TOTEM ] A new membership (1.215) was formed. Members joined: 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] This node is within the primary component and will provide service.
Jul 04 01:15:24 ibm corosync[17506]: [QUORUM] Members[4]: 1 2 3 4
Jul 04 01:15:24 ibm corosync[17506]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: update cluster info (cluster name Neftpk, version = 4)
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: node has quorum
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: members: 1/10415, 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: starting data syncronisation
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: received sync request (epoch 1/10415/00000001)
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: members: 1/10415, 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: starting data syncronisation
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: received sync request (epoch 1/10415/00000001)
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: received all states
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: leader is 2/2142
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: synced members: 2/2142, 3/4052, 4/2852
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: waiting for updates from leader
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: update complete - trying to commit (got 4 inode updates)
Jul 04 01:15:25 ibm pmxcfs[10415]: [dcdb] notice: all data is up to date
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: received all states
Jul 04 01:15:25 ibm pmxcfs[10415]: [status] notice: all data is up to date
Jul 04 01:15:25 ibm pvesh[15651]: got quorum
Jul 04 01:15:25 ibm pvesh[15651]: Starting VM 102
 
Figured it out. That's my fault. I'm using "zfs autobackup" and forgot to clear the mountpoints, so the root fs was mounted multiple times. After clearing the mountpoints, everything started working.
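
For anyone hitting the same symptom, a quick way to spot a path that is mounted more than once is to count mount points in the kernel's mount table. A minimal sketch; the helper name `dup_mounts` is hypothetical, and it only assumes the usual /proc/self/mounts layout where the second field is the mount point:

```shell
# Hypothetical helper: report mount points that occur more than once in a
# mounts table. Reads /proc/self/mounts unless another file is given.
dup_mounts() {
    # Field 2 of each line is the mount point; count occurrences of each
    # and print "<count> <mount point>" for the duplicates.
    awk '{ seen[$2]++ }
         END { for (m in seen) if (seen[m] > 1) print seen[m], m }' \
        "${1:-/proc/self/mounts}" | sort -k2
}
```

A line such as `2 /` in the output would match the situation described here. On ZFS, `zfs list -o name,mountpoint,canmount,mounted` then shows which datasets claim that path; which dataset was at fault in this case is not stated above.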