Whole cluster down after one node reboot

encore

Hi,

I have a small PVE cluster with 3 nodes, each running ~200 LXC containers.
When I reboot one of these nodes for any reason, all nodes in the cluster get a grey ? or a red X next to their names.

In syslog I see something like:
Code:
May 16 11:39:39 captive001-72001-bl03 corosync[1802]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
May 16 11:39:40 captive001-72001-bl03 corosync[1802]: warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
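That corosync message usually points to a network or multicast problem on the cluster link rather than to Proxmox itself. As a first check (a sketch, assuming omping is installed on all nodes and that node1/node2/node3 stand in for the real hostnames), the Proxmox cluster documentation suggests testing multicast between all nodes:
Code:
# ~10 second burst test; every node should report 0% loss
omping -c 10000 -i 0.001 -F -q node1 node2 node3
# ~10 minute test to catch IGMP snooping querier problems on the switch
omping -c 600 -i 1 -q node1 node2 node3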

When the rebooted node comes back online, my nodes do not turn green again but stay in that ? or X state. Only once I have rebooted ALL nodes do they show the green status again.

Why is that happening?
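The grey ? / red X icons come from pvestatd, which gets its data through the pve-cluster (pmxcfs) service, so once the corosync link is healthy again the icons often recover after restarting those services instead of rebooting every node. A minimal sketch (assuming the nodes can already reach each other again):
Code:
systemctl restart corosync pve-cluster
systemctl restart pvestatd
pvecm status   # should list all nodes and report the cluster as quorate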
 

Can you please describe your network setup and post your /etc/pve/corosync.conf?
 
Same problem here!
Freshly installed Proxmox setup. I have only created a cluster and added the node as described in the official wiki.

When I reboot one node, a few moments later the other one goes down as well.

My setup consists of two Proxmox VE 5.2 nodes.

Code:
root@ed1spu0002:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ed1spu0001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.67.43
  }
  node {
    name: ed1spu0002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.67.41
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ed1cpu0001
  config_version: 2
  interface {
    bindnetaddr: 192.168.67.43
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
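With two nodes and one vote each, the cluster needs both votes for quorum, so as soon as either node reboots the remaining one loses quorum and /etc/pve becomes read-only; if HA resources are configured, the surviving node may even self-fence via the watchdog, which would match the "other node follows" symptom. Corosync's votequorum has a two_node mode for exactly this situation (see votequorum(5)). The snippet below is only an illustrative sketch, not taken from this thread, and on Proxmox any such change belongs in /etc/pve/corosync.conf with config_version bumped:
Code:
quorum {
  provider: corosync_votequorum
  two_node: 1   # automatically enables wait_for_all, see votequorum(5)
}
As a purely temporary measure, running pvecm expected 1 on the surviving node also restores quorum with a single vote.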