Whole cluster randomly rebooted twice (maybe corosync?)

misterS

Member
May 10, 2019
Germany
schlieper.tech
Dear all,

I am currently dealing with a problem in a hyperconverged (Ceph) cluster where the whole cluster reboots seemingly at random. All seven nodes reset at the same time. I suspect that corosync is unable to communicate properly. The problem only appeared after the latest upgrade. The cluster boots normally after the hard reset. Any help would be highly appreciated.

proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

root@PX-LI-04:~# cat /var/log/syslog | grep corosync
Jan 28 08:13:48 PX-LI-04 corosync[1831]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Jan 28 08:13:48 PX-LI-04 corosync[1831]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 5 link: 0 is down
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 has no active links
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] rx: host: 5 link: 0 is up
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Jan 28 08:13:53 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:13:57 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 3442 ms
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [TOTEM ] A new membership (1.46c9) was formed. Members
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 28 08:14:07 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 153 ms
Jan 28 08:14:12 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:14:19 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 155 ms
Jan 28 08:14:19 PX-LI-04 corosync[1831]: [TOTEM ] A new membership (1.46d5) was formed. Members
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:14:32 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:14:32 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:40 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 155 ms
Jan 28 08:14:54 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 3 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 has no active links
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 2176 ms
Jan 28 08:14:59 PX-LI-04 corosync[1831]: [TOTEM ] A processor failed, forming new configuration.
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] rx: host: 3 link: 0 is up
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:05 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:06 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:07 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:07 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:18:01 PX-LI-04 corosync[1817]: [MAIN ] Corosync Cluster Engine 3.0.4 starting up
Jan 28 08:18:01 PX-LI-04 corosync[1817]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Jan 28 08:18:01 PX-LI-04 corosync[1817]: [TOTEM ] Initializing transport (Kronosnet).
Jan 28 08:18:02 PX-LI-04 corosync[1817]: [TOTEM ] kronosnet crypto initialized: aes256/sha256
Jan 28 08:18:02 PX-LI-04 corosync[1817]: [TOTEM ] totemknet initialized
Jan 28 08:18:02 PX-LI-04 corosync[1817]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jan 28 08:18:02 PX-LI-04 corosync[1817]: [SERV ] Service engine loaded: corosync configuration map access [0]
Jan 28 08:18:02 PX-LI-04 corosync[1817]: [QB ] server name: cmap
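A grep dump like the one above is easier to read once the events are counted. The following is only a rough sketch: it parses a handful of lines copied from the excerpt, and the regular expressions are my own guess at the message format based on those lines, not an official corosync log grammar. The names `summarize`, `LINK_DOWN` and `TOKEN_LATE` are made up for this example.

```python
import re
from collections import Counter

# A few lines copied from the excerpt above; in practice you would read
# the whole syslog file instead of this inline sample.
SAMPLE = """\
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 5 link: 0 is down
Jan 28 08:13:57 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 3442 ms
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 3 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 2176 ms
"""

# Patterns inferred from the lines above (assumption, not a formal spec).
LINK_DOWN = re.compile(r"\[KNET\s*\] link: host: (\d+) link: (\d+) is down")
TOKEN_LATE = re.compile(r"\[TOTEM\s*\] Token has not been received in (\d+) ms")


def summarize(text):
    """Count link-down events per host and collect token delays in ms."""
    downs = Counter()
    delays = []
    for line in text.splitlines():
        m = LINK_DOWN.search(line)
        if m:
            downs[int(m.group(1))] += 1
            continue
        m = TOKEN_LATE.search(line)
        if m:
            delays.append(int(m.group(1)))
    return downs, delays


downs, delays = summarize(SAMPLE)
print("link-down events per host:", dict(downs))
print("longest token delay (ms):", max(delays))
```

On these sample lines, host 6 accounts for three of the five link-down events, which would make its network path a reasonable first thing to check.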
 

misterS

root@PX-LI-04:~# cat /var/log/syslog
Jan 28 08:08:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:08:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:09:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:09:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:09:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:09:52 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 7b67
Jan 28 08:10:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:10:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:10:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:11:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:11:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:11:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:12:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:12:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:12:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:12:08 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:10 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:12 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:35 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:37 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:45 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:12:52 PX-LI-04 pmxcfs[1673]: [status] notice: received log
Jan 28 08:13:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:13:01 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:13:01 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:13:45 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:13:47 PX-LI-04 corosync[1831]: [TOTEM ] A new membership (1.46c1) was formed. Members
Jan 28 08:13:48 PX-LI-04 corosync[1831]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Jan 28 08:13:48 PX-LI-04 corosync[1831]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] link: host: 5 link: 0 is down
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Jan 28 08:13:50 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 has no active links
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] rx: host: 5 link: 0 is up
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:13:52 PX-LI-04 corosync[1831]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Jan 28 08:13:53 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:13:57 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 3442 ms
Jan 28 08:14:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [TOTEM ] A new membership (1.46c9) was formed. Members
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Jan 28 08:14:01 PX-LI-04 corosync[1831]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 28 08:14:01 PX-LI-04 pvesr[3261574]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 28 08:14:02 PX-LI-04 pvesr[3261574]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 28 08:14:03 PX-LI-04 systemd[1]: pvesr.service: Succeeded.
Jan 28 08:14:03 PX-LI-04 systemd[1]: Started Proxmox VE replication runner.
Jan 28 08:14:07 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 153 ms
Jan 28 08:14:12 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:14:19 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 155 ms
Jan 28 08:14:19 PX-LI-04 corosync[1831]: [TOTEM ] A new membership (1.46d5) was formed. Members
Jan 28 08:14:22 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 10
Jan 28 08:14:23 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 20
Jan 28 08:14:24 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 30
Jan 28 08:14:25 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 40
Jan 28 08:14:26 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 50
Jan 28 08:14:27 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 60
Jan 28 08:14:28 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 70
Jan 28 08:14:29 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 80
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:29 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:14:30 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 90
Jan 28 08:14:31 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 100
Jan 28 08:14:31 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retried 100 times
Jan 28 08:14:31 PX-LI-04 pmxcfs[1673]: [status] crit: cpg_send_message failed: 6
Jan 28 08:14:32 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:14:32 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:32 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 10
Jan 28 08:14:33 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 20
Jan 28 08:14:34 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 30
Jan 28 08:14:35 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 40
Jan 28 08:14:36 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 50
Jan 28 08:14:37 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 60
Jan 28 08:14:38 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 70
Jan 28 08:14:39 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 80
Jan 28 08:14:40 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 155 ms
Jan 28 08:14:40 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 90
Jan 28 08:14:41 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 100
Jan 28 08:14:41 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retried 100 times
Jan 28 08:14:41 PX-LI-04 pmxcfs[1673]: [status] crit: cpg_send_message failed: 6
Jan 28 08:14:41 PX-LI-04 pve-firewall[2039]: firewall update time (15.494 seconds)
Jan 28 08:14:42 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 10
Jan 28 08:14:43 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 20
Jan 28 08:14:44 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 30
Jan 28 08:14:45 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 40
Jan 28 08:14:46 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 50
Jan 28 08:14:47 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 60
Jan 28 08:14:48 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 70
Jan 28 08:14:49 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 80
Jan 28 08:14:50 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 90
Jan 28 08:14:51 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 100
Jan 28 08:14:51 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retried 100 times
Jan 28 08:14:51 PX-LI-04 pmxcfs[1673]: [status] crit: cpg_send_message failed: 6
Jan 28 08:14:52 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 10
Jan 28 08:14:53 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 20
Jan 28 08:14:54 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 30
Jan 28 08:14:54 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 154 ms
Jan 28 08:14:55 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 40
Jan 28 08:14:56 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 50
Jan 28 08:14:57 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 60
Jan 28 08:14:58 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 70
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 3 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] link: host: 6 link: 0 is down
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 has no active links
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 has no active links
Jan 28 08:14:58 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 2176 ms
Jan 28 08:14:59 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 80
Jan 28 08:14:59 PX-LI-04 corosync[1831]: [TOTEM ] A processor failed, forming new configuration.
Jan 28 08:15:00 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 90
Jan 28 08:15:00 PX-LI-04 systemd[1]: Starting Proxmox VE replication runner...
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] rx: host: 3 link: 0 is up
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] rx: host: 6 link: 0 is up
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 28 08:15:01 PX-LI-04 corosync[1831]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 28 08:15:01 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 100
Jan 28 08:15:01 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retried 100 times
Jan 28 08:15:01 PX-LI-04 pmxcfs[1673]: [status] crit: cpg_send_message failed: 6
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:02 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 10
Jan 28 08:15:02 PX-LI-04 watchdog-mux[1266]: client watchdog expired - disable watchdog updates
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:03 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 20
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 30
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:04 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:05 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:05 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 40
Jan 28 08:15:06 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:06 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 50
Jan 28 08:15:07 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:15:07 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retry 60
Jan 28 08:15:07 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:17:55 PX-LI-04 systemd-modules-load[872]: Inserted module 'iscsi_tcp'
Jan 28 08:17:55 PX-LI-04 systemd-modules-load[872]: Inserted module 'ib_iser'
Jan 28 08:17:55 PX-LI-04 systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 28 08:17:55 PX-LI-04 lvm[877]: 1 logical volume(s) in volume group "ceph-bc540b89-09bd-4002-99f8-8006ce9ba002" monitored
Jan 28 08:17:55 PX-LI-04 lvm[877]: 1 logical volume(s) in volume group "ceph-3715e10d-554f-4536-8a7d-fe3c40da912c" monitored
Jan 28 08:17:55 PX-LI-04 systemd[1]: Started Create System Users.
Jan 28 08:17:55 PX-LI-04 kernel: [ 0.000000] Linux version 5.4.78-2-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) ()
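To see where the reset sits in the full syslog above, one option is to look for the largest pause between consecutive lines and to measure how long after the first `cpg_send_message failed` the watchdog client expired. This is a minimal sketch over four lines copied from the log. The helper names (`stamp`, `largest_gap`) are hypothetical, and it assumes classic syslog timestamps without a year.

```python
from datetime import datetime

# Lines copied from the syslog above.
SAMPLE = """\
Jan 28 08:14:31 PX-LI-04 pmxcfs[1673]: [status] crit: cpg_send_message failed: 6
Jan 28 08:15:02 PX-LI-04 watchdog-mux[1266]: client watchdog expired - disable watchdog updates
Jan 28 08:15:07 PX-LI-04 corosync[1831]: [TOTEM ] Retransmit List: 1b 1c
Jan 28 08:17:55 PX-LI-04 systemd-modules-load[872]: Inserted module 'iscsi_tcp'
"""


def stamp(line):
    # Classic syslog timestamps carry no year; strptime falls back to 1900,
    # which is harmless when only differences within one day are needed.
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause."""
    return max(
        ((stamp(b) - stamp(a)).total_seconds(), a, b)
        for a, b in zip(lines, lines[1:])
    )


lines = SAMPLE.splitlines()
fail_line = next(l for l in lines if "cpg_send_message failed" in l)
wd_line = next(l for l in lines if "client watchdog expired" in l)

fail_to_watchdog = (stamp(wd_line) - stamp(fail_line)).total_seconds()
gap, before, after = largest_gap(lines)

print("first cpg failure -> watchdog expiry (s):", fail_to_watchdog)
print("longest pause in the log (s):", gap)
print("last line before the pause:", before)
```

The longest pause here falls between the last `Retransmit List` line and the first boot message, which matches a hard reset rather than a clean shutdown.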
 

Dominic

Proxmox Staff Member
Mar 18, 2019
misterS said: "I am suspecting corosync to not be able to communicate properly."
Looking at your logs:
Code:
Jan 28 08:13:57 PX-LI-04 corosync[1831]: [TOTEM ] Token has not been received in 3442 ms
(...)
Jan 28 08:14:51 PX-LI-04 pmxcfs[1673]: [status] notice: cpg_send_message retried 100 times
I think so, too. A node reboot is expected (and intended) behaviour if quorum is lost.
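For reference, the quorum arithmetic can be sketched as follows. It assumes the usual majority rule with one vote per node. The thread does not show corosync.conf, so the actual vote counts are an assumption, and the function names are made up for the example. As far as I know, the reboot itself is carried out through the watchdog on nodes with active HA, which would fit the `watchdog-mux` line in the syslog.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority of the cluster."""
    return total_votes // 2 + 1


def is_quorate(total_votes: int, reachable_votes: int) -> bool:
    """True if the partition seeing `reachable_votes` holds a majority."""
    return reachable_votes >= quorum_threshold(total_votes)


TOTAL = 7  # seven nodes per the original post; one vote each is assumed

for reachable in range(TOTAL, 0, -1):
    state = "quorate" if is_quorate(TOTAL, reachable) else "NOT quorate"
    print(f"{reachable} of {TOTAL} votes reachable: {state}")
```

With seven votes, a node has to see at least four (itself included) to stay quorate. If the corosync network is unstable for every node at once, all of them can fall below that line together, which is one way a whole cluster might reset at the same moment.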
 
