Cluster crashed / cpg_send_message retried 100 times, one node is red

TheMrg

Well-Known Member
Aug 1, 2019
We lost one server of a 6-node cluster. After rebooting the node:

root@cluster24:~# pvecm status
Cluster information
-------------------
Name: cluster
Config Version: 29
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Tue Nov 23 01:00:48 2021
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 1.26317
Quorate: No

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 1
Quorum: 4 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.1.24 (local)
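Since the rebooted node only sees itself (Nodes: 1, Quorate: No), a first step worth trying is to check whether its corosync/knet links to the other nodes are actually up. A hedged diagnostic sketch, to be run on the rebooted node:

```shell
# Show the state of the knet links from this node to every other node
corosync-cfgtool -s

# corosync's own view of quorum and membership (should match pvecm status)
corosync-quorumtool -s

# Verify the cluster stack is running and look at recent corosync errors
systemctl status corosync pve-cluster
journalctl -u corosync -b --no-pager | tail -n 50
```

If `corosync-cfgtool -s` shows the links as down while the network itself works (ping, MTU), a corosync version or configuration mismatch between the nodes is a likely suspect.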


Nov 23 01:00:48 cluster24 pvesr[10680]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 23 01:00:48 cluster24 pve-firewall[2868]: firewall update time (10.040 seconds)
Nov 23 01:00:49 cluster24 pvesr[10680]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 23 01:00:49 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 10
Nov 23 01:00:50 cluster24 pvesr[10680]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 23 01:00:50 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 20
Nov 23 01:00:51 cluster24 pvesr[10680]: trying to acquire cfs lock 'file-replication_cfg' .
......
Nov 23 01:02:30 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 20
Nov 23 01:02:31 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 30
Nov 23 01:02:32 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 40
Nov 23 01:02:33 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 50
Nov 23 01:02:34 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 60
Nov 23 01:02:35 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 70
Nov 23 01:02:36 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 80
Nov 23 01:02:37 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 90
Nov 23 01:02:38 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 100
Nov 23 01:02:38 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retried 100 times
Nov 23 01:02:38 cluster24 pmxcfs[2576]: [status] crit: cpg_send_message failed: 6
Nov 23 01:02:39 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 10
Nov 23 01:02:40 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 20
Nov 23 01:02:41 cluster24 pmxcfs[2576]: [status] notice: cpg_send_message retry 30
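For reference: error 6 from cpg_send_message is CS_ERR_TRY_AGAIN — corosync cannot deliver the message because the node has no working CPG membership, so pmxcfs keeps retrying. Assuming no HA fencing risk on this node, a common first remedy is restarting the cluster stack (a sketch, not guaranteed to fix an underlying network or version problem):

```shell
# Restart the cluster stack on the affected node.
# Caution: with active HA resources, make sure the node cannot
# fence itself or its guests before restarting corosync.
systemctl restart corosync
systemctl restart pve-cluster

# Then re-check membership
pvecm status
```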
################################################################################################
From one of the other cluster nodes:

root@cluster26:~# pvecm status
Cluster information
-------------------
Name: cluster
Config Version: 29
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Tue Nov 23 02:03:38 2021
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 1.26316
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 5
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.26 (local)
0x00000003 1 192.168.1.20
0x00000007 1 192.168.1.22
0x00000008 1 192.168.1.23
0x0000000a 1 192.168.1.25

Nov 23 01:57:47 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retried 100 times
Nov 23 01:57:47 cluster26 pmxcfs[10532]: [status] crit: cpg_send_message failed: 6
Nov 23 01:57:47 cluster26 pve-ha-lrm[2048]: unable to write lrm status file - unable to open file '/etc/pve/nodes/cluster26/lrm_s
Nov 23 01:57:47 cluster26 pvestatd[1987]: status update time (230.376 seconds)
Nov 23 01:57:47 cluster26 pve-firewall[1988]: firewall update time (10.011 seconds)
Nov 23 01:57:48 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 10
Nov 23 01:57:48 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 10
Nov 23 01:57:49 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 20
Nov 23 01:57:49 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 20
Nov 23 01:57:50 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 30
Nov 23 01:57:50 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 30
Nov 23 01:57:51 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 40
Nov 23 01:57:51 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 40
Nov 23 01:57:52 cluster26 pve-ha-lrm[2048]: loop take too long (40 seconds)
Nov 23 01:57:52 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 50
Nov 23 01:57:52 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 50
Nov 23 01:57:53 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 60
Nov 23 01:57:53 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 60
Nov 23 01:57:54 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 70
Nov 23 01:57:54 cluster26 pmxcfs[10532]: [status] notice: cpg_send_message retry 70
Nov 23 01:57:55 cluster26 pmxcfs[10532]: [dcdb] notice: cpg_send_message retry 8
 
Nobody? We set this node to standalone to keep the VMs up. But it seems this node is corrupt and we cannot get it back into the cluster.
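(For anyone following along: the "standalone" workaround presumably refers to the usual expected-votes override, which makes /etc/pve writable again on an isolated node — a sketch, use with care:)

```shell
# Temporarily tell votequorum that a single vote suffices for quorum.
# Only do this while the node is deliberately isolated; if it later
# rejoins the real cluster, conflicting writes to /etc/pve can result.
pvecm expected 1
```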
 
Which versions are you running? Please post the output of pveversion -v from all nodes.
 
cluster24:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.78-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-2
pve-kernel-helper: 6.3-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-1
libpve-common-perl: 6.3-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1


cluster26:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Please update to the latest 6.4 version.
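Note that the nodes above are also not on identical versions (cluster24 runs pve-manager 6.3-2, cluster26 runs 6.3-3). A minimal upgrade sketch, assuming a PVE 6.x package repository is already configured on each node:

```shell
# Run on every node, one node at a time
apt update
apt full-upgrade

# Afterwards, confirm all nodes report matching versions,
# in particular corosync and pve-cluster
pveversion -v
```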
 
