when we reboot one node, the whole cluster is rebooted

iruindegi

Renowned Member
Aug 26, 2016
Hi,

We have a big problem (for us, at least). For a short while now, whenever we reboot a single node of our cluster, all the nodes reboot! Why is that happening, and how can we fix it?

This is our setup:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-28-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-17
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-28-pve: 4.15.18-56
pve-kernel-4.15.18-26-pve: 4.15.18-54
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.13-pve1~bpo9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

=> pvecm status

Code:
root@pve1:~#  pvecm status
Quorum information
------------------
Date:             Mon Jun  1 08:47:01 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/1844
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.94.0.41 (local)
0x00000002          1 10.94.0.42
0x00000004          1 10.94.0.43
root@pve1:~#
 
Did you at some point have more than 3 hosts? Your quorum information says "Expected votes: 4" while all your hosts have 1 vote each, so I would think you had 4 hosts at some point, or perhaps that you once assigned a single host 2 votes and never reduced the expected number of votes required for quorum.
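If it helps, here is a quick way to check whether a removed node is still configured (just a sketch; run on any cluster node):

Code:
# runtime view of the quorum, same data as pvecm status
corosync-quorumtool -s

# node list as Proxmox sees it
pvecm nodes

# a removed node that still appears in the nodelist here keeps the
# expected-vote count at 4
grep -A4 'node {' /etc/pve/corosync.conf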
 
Yes, we had 4 nodes. I just removed the unused node from the cluster... but is that the reason why the whole cluster reboots when we reboot a single node?
 
For whatever reason (maybe you didn't remove the node the correct way) the cluster still thinks there should be 4 nodes voting, so when one reboots, only 2 of the expected 4 votes remain: the nodes lose quorum and cannot make decisions any longer. They will try to recover, but may eventually give up and reboot in order to resolve certain issues. I don't know the exact circumstances under which a reboot happens, but I have experienced it when I had network problems and hadn't set up multiple corosync rings yet.

You should adjust your cluster to expect 3 votes; then things will work as expected again. (edit) Of course there may be other problems as well.

See https://pve.proxmox.com/wiki/Cluster_Manager
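As a sketch of the fix (assuming the stale entry is still present; the node name "pve4" is only an example, use the name of the node you actually removed):

Code:
# remove the stale node entry cleanly, if it is still listed
pvecm delnode pve4

# lower the expected vote count of the running cluster to the 3 real nodes
# (runtime change; the permanent value comes from the nodelist in
# /etc/pve/corosync.conf)
pvecm expected 3

# verify: "Expected votes" should now show 3
pvecm status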
 
