Hello,
Our 3-node Proxmox cluster suddenly failed: all nodes are disconnected from each other. All VMs are still running, but the UI shows every node in the cluster as disconnected.
This happened with no OS upgrades and no network changes (at least none made by me). Nobody touched anything.
The ISP is OVH.
I ran a (short) omping test; here are the results:
Code:
192.168.1.2 : unicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.068/0.151/0.236/0.037
192.168.1.2 : multicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.074/0.207/0.319/0.058
192.168.1.3 : unicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.100/0.128/0.228/0.034
192.168.1.3 : multicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.108/0.188/0.282/0.046
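To compare several omping runs without reading them by eye, the summary lines can be parsed and checked for packet loss. This is only a minimal sketch, assuming the summary line format shown above; the names `parse_omping` and `lossy` are my own and are not part of omping.

```python
import re

# Matches omping summary lines such as:
#   192.168.1.2 : multicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = ...
# The pattern is an assumption based on the output pasted above.
SUMMARY = re.compile(
    r"^(?P<host>\S+)\s*:\s*(?P<kind>unicast|multicast),\s*"
    r"xmt/rcv/%loss\s*=\s*(?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
)


def parse_omping(text):
    """Return (host, kind, sent, received, loss_percent) for each summary line."""
    results = []
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if m:
            results.append(
                (m["host"], m["kind"], int(m["xmt"]), int(m["rcv"]), int(m["loss"]))
            )
    return results


def lossy(results):
    """Keep only the entries that report any packet loss."""
    return [r for r in results if r[4] > 0]


sample = """\
192.168.1.2 : unicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.068/0.151/0.236/0.037
192.168.1.2 : multicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.074/0.207/0.319/0.058
192.168.1.3 : unicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.100/0.128/0.228/0.034
192.168.1.3 : multicast, xmt/rcv/%loss = 35/35/0%, min/avg/max/std-dev = 0.108/0.188/0.282/0.046
"""

results = parse_omping(sample)
print(lossy(results))  # an empty list means no unicast or multicast loss was reported
```

On the results above, this reports no loss for either peer. Note that the run was short, so it would not catch multicast that stops working after a few minutes.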
I can fully ping and resolve the hostnames:
Code:
root@pmx1-lim:~# ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.068 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.067 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.071 ms
root@pmx1-lim:~# ping pmx2-lim
PING pmx2-lim (192.168.1.2) 56(84) bytes of data.
64 bytes from pmx2-lim (192.168.1.2): icmp_seq=1 ttl=64 time=0.087 ms
64 bytes from pmx2-lim (192.168.1.2): icmp_seq=2 ttl=64 time=0.080 ms
64 bytes from pmx2-lim (192.168.1.2): icmp_seq=3 ttl=64 time=0.112 ms
64 bytes from pmx2-lim (192.168.1.2): icmp_seq=4 ttl=64 time=0.112 ms
Here's the corosync config:
Code:
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: pmx1-lim
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.1
  }
  node {
    name: pmx2-lim
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.2
  }
  node {
    name: pmx3-lim
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.3
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: proxmox
  config_version: 3
  interface {
    bindnetaddr: 192.168.1.1
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
Here's /etc/hosts:
Code:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 .localdomain
PUBLIC-IP pmx1-3114549
192.168.1.1 pmx1-lim
192.168.1.2 pmx2-lim
192.168.1.3 pmx3-lim
# The following lines are desirable for IPv6 capable hosts
#(added automatically by netbase upgrade)
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
I've upgraded one of the nodes to the latest version and rebooted it, but still no luck (worse, it's now "waiting for quorum").
Any hints?