Hello, after unexplained problems in our network, our Proxmox cluster lost quorum and is no longer working.
I've checked several threads but haven't found a solution.
We have 9 nodes in the cluster, and they all show different results in pvecm. Most show only themselves:
Code:
root@vu203adm:~# pvecm status
Cluster information
-------------------
Name: lightcluster
Config Version: 19
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sat Apr 2 11:15:02 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000004
Ring ID: 4.81c8
Quorate: No
Votequorum information
----------------------
Expected votes: 9
Highest expected: 9
Total votes: 1
Quorum: 5 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000004 1 10.100.141.203 (local)
root@vu204adm:~# pvecm status
Cluster information
-------------------
Name: lightcluster
Config Version: 19
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sat Apr 2 11:45:01 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000007
Ring ID: 7.823c
Quorate: No
Votequorum information
----------------------
Expected votes: 9
Highest expected: 9
Total votes: 1
Quorum: 5 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000007 1 10.100.141.204 (local)
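For reference, the "Quorum: 5" line in these outputs follows from the majority rule: more than half of the expected votes must be present. With 9 expected votes and only 1 total vote per node, every node stays blocked. A minimal sketch of that arithmetic, assuming the default votequorum settings and one vote per node (the function name `quorum_needed` is my own, not a Proxmox or corosync command):

```shell
#!/bin/sh
# Hypothetical helper: votes needed for a simple majority,
# i.e. floor(expected / 2) + 1.
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Our cluster has 9 expected votes.
quorum_needed 9
```

So at least 5 of the 9 nodes have to see each other in the same membership before activity is unblocked.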
Some show other nodes, but those nodes don't show them back:
Code:
root@vu175adm:~# pvecm status
Cluster information
-------------------
Name: lightcluster
Config Version: 19
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sat Apr 2 11:34:07 2022
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000006
Ring ID: 1.794c
Quorate: No
Votequorum information
----------------------
Expected votes: 9
Highest expected: 9
Total votes: 1
Quorum: 5 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.100.140.176
0x00000004 1 10.100.141.203
0x00000005 1 10.100.140.174
0x00000006 1 10.100.140.175 (local)
On one node I stopped pve-cluster, and now it won't start:
Code:
root@vu205adm:~# pvecm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
root@vu205adm:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: activating (start) since Sat 2022-04-02 11:42:34 MSK; 56s ago
Cntrl PID: 92740 (pmxcfs)
Tasks: 3 (limit: 17203)
Memory: 3.2M
CGroup: /system.slice/pve-cluster.service
├─92740 /usr/bin/pmxcfs
└─92745 /usr/bin/pmxcfs
апр 02 11:43:22 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 470
апр 02 11:43:23 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 480
апр 02 11:43:24 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 490
апр 02 11:43:25 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 500
апр 02 11:43:26 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 510
апр 02 11:43:27 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 520
апр 02 11:43:28 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 530
апр 02 11:43:29 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 540
апр 02 11:43:30 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 550
апр 02 11:43:31 vu205adm pmxcfs[92745]: [dcdb] notice: cpg_join retry 560
I'm totally lost and need your help.
My only idea is to turn off all nodes and start them one by one, since the nodes can reach each other over the network (checked with ping):
Code:
root@vu204adm:~# ping 10.100.141.203
PING 10.100.141.203 (10.100.141.203) 56(84) bytes of data.
64 bytes from 10.100.141.203: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 10.100.141.203: icmp_seq=2 ttl=64 time=0.071 ms
64 bytes from 10.100.141.203: icmp_seq=3 ttl=64 time=0.066 ms
64 bytes from 10.100.141.203: icmp_seq=4 ttl=64 time=0.108 ms
^C
--- 10.100.141.203 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 66ms
rtt min/avg/max/mdev = 0.052/0.074/0.108/0.021 ms