when we reboot one node, the whole cluster is rebooted

iruindegi

Renowned Member
Aug 26, 2016
Hi,

We have a big problem (for us, at least). For a short while now, whenever we reboot a single node of our cluster, all the nodes reboot! Why is that happening, and how can we fix it?

This is our setup:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-28-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-17
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-28-pve: 4.15.18-56
pve-kernel-4.15.18-26-pve: 4.15.18-54
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.13-pve1~bpo9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

=> pvecm status

Code:
root@pve1:~#  pvecm status
Quorum information
------------------
Date:             Mon Jun  1 08:47:01 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/1844
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.94.0.41 (local)
0x00000002          1 10.94.0.42
0x00000004          1 10.94.0.43
root@pve1:~#
 
Did you at some point have more than 3 hosts? Your quorum information says "Expected votes: 4" while all your hosts have 1 vote each, so I would think you had 4 hosts at some point, or perhaps that you once assigned a single host 2 votes and never reduced the expected number of votes required for quorum.
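If it helps, here is a quick way to check whether a removed node is still configured (just a sketch; run on any cluster node):

Code:
# runtime view of the quorum, same data as pvecm status
corosync-quorumtool -s

# node list as Proxmox sees it
pvecm nodes

# a removed node that still appears in the nodelist here keeps the
# expected-vote count at 4
grep -A4 'node {' /etc/pve/corosync.conf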
 
Yes, we had 4 nodes. I just removed the unused node from the cluster... but is that the reason why the whole cluster reboots when we reboot a single node?
 
For whatever reason (maybe you didn't remove the node the correct way) the cluster still thinks there should be 4 nodes voting, so when one reboots, only 2 of the expected 4 votes remain: the nodes lose quorum and cannot make decisions any longer. They will try to recover, but may eventually give up and reboot in order to resolve certain issues. I don't know the exact circumstances under which a reboot happens, but I have experienced it when I had network problems and hadn't set up multiple corosync rings yet.

You should adjust your cluster to expect 3 votes; then things will work as expected again. (edit) Of course there may be other problems as well.

See https://pve.proxmox.com/wiki/Cluster_Manager
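As a sketch of the fix (assuming the stale entry is still present; the node name "pve4" is only an example, use the name of the node you actually removed):

Code:
# remove the stale node entry cleanly, if it is still listed
pvecm delnode pve4

# lower the expected vote count of the running cluster to the 3 real nodes
# (runtime change; the permanent value comes from the nodelist in
# /etc/pve/corosync.conf)
pvecm expected 3

# verify: "Expected votes" should now show 3
pvecm status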
 
