I'm having a cluster issue after adding a new node. One of the existing nodes, which was powered off at the time of the join, now causes problems whenever it is powered on. pve-cluster and corosync both seem happy on all nodes, but cluster behavior goes completely erratic as soon as prox-prod01 is powered on, even though it behaved completely normally yesterday. We are testing ZFS vs hardware RAID, so this particular node was completely reimaged.
When I started digging into it, I found that the corosync.conf files on the healthy nodes and on the problem child are mismatched, and the log files indicate that prox-prod01 thinks it's on an island. Dare I try to edit the corosync.conf on prox-prod01 manually?
When I try to connect via the GUI, I get an unfamiliar error:
Code:
Error hostname lookup 'prox-prod01' failed - failed to get address info for: prox-prod01: Name or service not known (500)
Corosync.conf from a healthy node:
Code:
cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: prox-lab01
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.2.0.20
  }
  node {
    name: prox-prod01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.2.0.10
  }
  node {
    name: prox-prod02
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 10.2.0.11
  }
  node {
    name: prox-prod03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.2.0.12
  }
  node {
    name: prox-raspi01
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.2.0.30
  }
  node {
    name: prox-raspi02
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 10.2.0.31
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: Castle
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
Unhealthy node (note the two missing nodes, prox-prod02 and prox-raspi02, and config_version 5 instead of 7):
Code:
cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: prox-lab01
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.2.0.20
  }
  node {
    name: prox-prod01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.2.0.10
  }
  node {
    name: prox-prod03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.2.0.12
  }
  node {
    name: prox-raspi01
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.2.0.30
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: Castle
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
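For anyone who wants to check the difference without eyeballing the two files, here is a rough sketch of how the node lists and config versions could be compared. It assumes each node's corosync.conf has been copied somewhere local; the function names and the file names healthy.conf and unhealthy.conf are made up for illustration, and the parsing relies only on the "name:" and "config_version:" keys shown in the configs above.

```shell
# Hypothetical helpers for comparing two copies of corosync.conf.

# Print the node names found in a corosync.conf, sorted.
# Only lines whose first field is exactly "name:" match,
# so "cluster_name:" in the totem section is skipped.
list_nodes() {
  awk '$1 == "name:" { print $2 }' "$1" | sort
}

# Print the config_version value from a corosync.conf.
config_version() {
  awk '$1 == "config_version:" { print $2 }' "$1"
}

# Print node names present in the first file but absent from the second.
missing_nodes() {
  _mn_a=$(mktemp)
  _mn_b=$(mktemp)
  list_nodes "$1" > "$_mn_a"
  list_nodes "$2" > "$_mn_b"
  comm -23 "$_mn_a" "$_mn_b"   # lines unique to the first (sorted) list
  rm -f "$_mn_a" "$_mn_b"
}
```

With the two files saved under those made-up names, `missing_nodes healthy.conf unhealthy.conf` lists the nodes the stale copy lacks, and `config_version` on each file shows which copy is older.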