We have a 3-node Proxmox 6 cluster: 2 Dell servers and a small whitebox PC for quorum and backups. On Saturday the cluster stopped working for no apparent reason until we rebooted both Dell servers and ran "pvecm expected 1" on one of them. I tried to dig through the logs but couldn't find the cause. Quorum sometimes seemed to form between 2 or even all 3 servers, but it would constantly break again. We have had no problems before or since, but we would like to prevent this from happening again.
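For reference, the recovery we did boils down to the standard pvecm commands (sketch only; these have to be run on a Proxmox VE node, output omitted):

```shell
# Run on one of the Dell nodes during recovery.
pvecm status       # inspect current membership and quorum state
pvecm expected 1   # temporarily lower expected votes so this single
                   # node regains quorum and /etc/pve becomes writable
```

Note that with HA enabled, a node that loses quorum while running HA resources gets fenced (rebooted) by the watchdog, so forcing expected votes like this is really a last resort.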
Also, HA is enabled for a single VM, and replication is set up for 5 or 6 VMs between the Dell servers.
Here is my corosync conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve00
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.200.220
    ring1_addr: 10.1.200.220
  }
  node {
    name: pve01
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.200.221
    ring1_addr: 10.1.200.221
  }
  node {
    name: pve02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.200.222
    ring1_addr: 10.1.200.222
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox-cluster
  config_version: 3
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
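Since the config has two links (ring0 on 192.168.200.x, ring1 on 10.1.200.x) with link_mode: passive, corosync only fails over once a link is marked faulty, so a flapping ring0 could plausibly break membership repeatedly. A sketch of how we now check link health on each node (requires running on a cluster node):

```shell
# Per-link status as seen from this node: both links should show
# "connected" to every other node.
corosync-cfgtool -s

# Current quorum state, expected votes, and member list.
corosync-quorumtool -s
```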
I have attached the logs from the 3 servers. The problem started at 1 AM and was fixed around 11 AM, when someone rebooted both Dell servers and ran the expected-votes command.
EDIT: I had to remove a few thousand lines from the pve00 logs since the file was too large, so I cut a chunk from the middle.
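When digging through journals like these, filtering for just the corosync membership and quorum events cuts out most of the noise. A minimal self-contained sketch (the sample lines below are fabricated to mimic the log format):

```shell
# Fabricated sample lines in the style of corosync syslog output.
cat > sample.log <<'EOF'
Jan 01 01:00:01 pve00 corosync[1234]:   [TOTEM ] A new membership (1.2a) was formed. Members joined: 2
Jan 01 01:00:05 pve00 corosync[1234]:   [QUORUM] Members[2]: 1 2
Jan 01 01:00:09 pve00 pmxcfs[1300]: [status] notice: received log
Jan 01 01:02:11 pve00 corosync[1234]:   [TOTEM ] A new membership (1.2b) was formed. Members left: 2
EOF

# Count only the membership/quorum events (brackets escaped so they
# are matched literally, not treated as a character class).
grep -cE '\[TOTEM \]|\[QUORUM\]' sample.log   # prints 3
```

On the real nodes the equivalent would be something like `journalctl -u corosync -u pve-cluster --since "01:00" --until "11:00"` piped into the same grep.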