Hi.
Today we saw strange behavior in our cluster.
One node had trouble with its network (as we can see in the log files; nobody knows why) and the whole cluster crashed.
We can't understand why this happened. The log files show that the cluster still had quorum with 13 nodes (there are 14 in total) while node 09 ran into a problem. OK, we are fine with that. But why did all nodes reboot when 13 nodes still formed a quorum?
I have some config and log files.
I'll try to post them below.
If anyone has an idea what caused this, please let me know.
PVE version:
Code:
root@host14:~# pveversion
pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)
PVE Cluster status:
Code:
root@host14:~# pvecm status
Cluster information
-------------------
Name: Cluster-PVE
Config Version: 27
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Mon Oct 24 17:30:32 2022
Quorum provider: corosync_votequorum
Nodes: 14
Node ID: 0x0000000e
Ring ID: 1.256
Quorate: Yes
Votequorum information
----------------------
Expected votes: 14
Highest expected: 14
Total votes: 14
Quorum: 8
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.28.30.20
0x00000002 1 172.28.30.21
0x00000003 1 172.28.30.22
0x00000004 1 172.28.30.23
0x00000005 1 172.28.30.24
0x00000006 1 172.28.30.25
0x00000007 1 172.28.30.26
0x00000008 1 172.28.30.27
0x00000009 1 172.28.30.28
0x0000000a 1 172.28.30.29
0x0000000b 1 172.28.30.30
0x0000000c 1 172.28.30.31
0x0000000d 1 172.28.30.32
0x0000000e 1 172.28.30.33 (local)
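As a sanity check on the numbers above, here is a minimal Python sketch. It assumes votequorum uses the usual simple-majority rule (quorum = floor(expected votes / 2) + 1) and that every node carries one vote, as in the output above. The function names are only illustrative and not part of any PVE or corosync tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple-majority threshold: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


expected = 14  # "Expected votes" from pvecm status

print(quorum_threshold(expected))   # 8, matches "Quorum: 8"
print(is_quorate(13, expected))     # True: 13 of 14 nodes should still be quorate
print(is_quorate(7, expected))      # False: exactly half is not a majority
```

By this arithmetic, losing only node 09 should leave the remaining 13 nodes quorate, which is why the reboot of all nodes is so surprising to us.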
PVE nodes:
Code:
root@host14:~# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 host01
2 1 host02
3 1 host03
4 1 host04
5 1 host05
6 1 host06
7 1 host07
8 1 host08
9 1 host09
10 1 host10
11 1 host11
12 1 host12
13 1 host13
14 1 host14 (local)
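The status output lists node IDs in hexadecimal with IP addresses, while the node list uses decimal IDs with names. To match the two when reading the logs, a small Python sketch like the following may help. It assumes the naming pattern seen in the list above (ID 9 is host09, ID 14 is host14); the helper name and the prefix parameter are hypothetical.

```python
def node_name(hex_id: str, prefix: str = "host") -> str:
    """Turn a hex node ID such as '0x0000000e' into a name such as 'host14'.

    Assumes names are the prefix followed by the two-digit decimal node ID.
    """
    return f"{prefix}{int(hex_id, 16):02d}"


print(node_name("0x00000009"))  # host09
print(node_name("0x0000000e"))  # host14
```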
Corosync config:
Code:
root@host14:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: host04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: host04-ha
  }
  node {
    name: host10
    nodeid: 10
    quorum_votes: 1
    ring0_addr: host10-ha
  }
  node {
    name: host09
    nodeid: 9
    quorum_votes: 1
    ring0_addr: host09-ha
  }
  node {
    name: host08
    nodeid: 8
    quorum_votes: 1
    ring0_addr: host08-ha
  }
  node {
    name: host07
    nodeid: 7
    quorum_votes: 1
    ring0_addr: host07-ha
  }
  node {
    name: host11
    nodeid: 11
    quorum_votes: 1
    ring0_addr: host11-ha
  }
  node {
    name: host13
    nodeid: 13
    quorum_votes: 1
    ring0_addr: host13-ha
  }
  node {
    name: host03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: host03-ha
  }
  node {
    name: host02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: host02-ha
  }
  node {
    name: host12
    nodeid: 12
    quorum_votes: 1
    ring0_addr: host12-ha
  }
  node {
    name: host14
    nodeid: 14
    quorum_votes: 1
    ring0_addr: host14-ha
  }
  node {
    name: host06
    nodeid: 6
    quorum_votes: 1
    ring0_addr: host06-ha
  }
  node {
    name: host05
    nodeid: 5
    quorum_votes: 1
    ring0_addr: host05-ha
  }
  node {
    name: host01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: host01-ha
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: Cluster-PVE
  config_version: 27
  interface {
    bindnetaddr: 172.28.30.20
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
Attachments:
host01_log.txt (2.9 KB)
host02_log.txt (2.2 KB)
host03_log.txt (3.3 KB)
host04_log.txt (5.2 KB)
host05_log.txt (1.6 KB)
host06_log.txt (2.5 KB)
host07_log.txt (2.5 KB)
host08_log.txt (2 KB)
host09_log.txt (11.4 KB)
host10_log.txt (4.3 KB)