PVE 4.x: rebooting one node triggers a reboot of the whole cluster

thz · Member · Jul 31, 2015 · Terra

Hello,

After I reboot one node, all nodes in the cluster suddenly reboot. I get the same result whether I reboot from an SSH session or with the Reboot button in the web interface.

The system is Debian Jessie:

Code:
pveversion --verbose
proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-37
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-32
qemu-server: 4.0-55
pve-firmware: 1.1-7
libpve-common-perl: 4.0-48
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-40
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-5
pve-container: 1.0-44
pve-firewall: 2.0-17
pve-ha-manager: 1.0-21
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: 4.0.20-1

Why do all nodes reboot if I reboot only one node?
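Before digging into HA or fencing, it may be worth confirming that every node runs the same versions of the cluster stack (for example `corosync-pve`, `pve-cluster` and `pve-ha-manager` from the listing above). The helpers below are a hypothetical sketch, not Proxmox tools: `pkg_version` extracts one package's version from `pveversion --verbose` output, and `cluster_versions` collects it from each node, assuming root SSH access between the nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version of one package from
# `pveversion --verbose` output read on stdin.
# Lines look like "pve-cluster: 4.0-32"; everything after "<pkg>: " is printed.
pkg_version() {
  awk -v p="$1" 'index($0, p ": ") == 1 { print substr($0, length(p) + 3) }'
}

# Hypothetical helper: print the given package's version on each node.
# Assumes root SSH access to every node named on the command line.
cluster_versions() {
  local pkg="$1" node
  shift
  for node in "$@"; do
    printf '%s: %s\n' "$node" \
      "$(ssh "root@$node" pveversion --verbose | pkg_version "$pkg")"
  done
}
```

Usage, with placeholder host names: `cluster_versions pve-ha-manager node1 node2 node3`. Any line whose version differs from the others points at a node that was not upgraded together with the rest.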
 
I found this page: http://www.smilecouple.org/2014/09/24/proxmox-ceph-and-linux-helps/

Code:
High Availability, HA: reset rgmanager
Get rgmanager to start, check status and rejoin fence group

fence_tool join
fence_tool ls
After an HA event, you need to re-enable rgmanager to allow management of the VMs from one computer to another.
/etc/init.d/rgmanager start

If you are going to reboot the Proxmox server for kernel updates, first stop rgmanager to prevent a fencing event and a power cut-off from the APC PDU.

/etc/init.d/rgmanager stop

But this applies to PVE versions earlier than 4.x. Could it help?
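For reference, the quoted pre-4.x steps can be arranged as a small shell sketch. The commands are the ones from the quoted page and assume an rgmanager-based cluster (PVE before 4.x); the function names `prepare_reboot` and `rejoin_after_ha_event` and the `DRY_RUN` switch are hypothetical additions. By default the script only prints the commands, so nothing is stopped or rebooted by accident.

```shell
#!/usr/bin/env bash
# Sketch of the pre-4.x (rgmanager-based) steps quoted above.
# Commands are only printed by default; set DRY_RUN=0 to actually run them.

run() {
  if [ "${DRY_RUN:-1}" != 0 ]; then
    printf '%s\n' "$*"   # dry run: show the command instead of executing it
  else
    "$@"
  fi
}

# Before a planned reboot (e.g. for a kernel update): stop rgmanager first so
# the node is not fenced (power cut via the PDU) while it is down.
prepare_reboot() {
  run /etc/init.d/rgmanager stop
  run reboot
}

# After an HA event: rejoin the fence group, list its status, and restart
# rgmanager so the VMs can be managed between nodes again.
rejoin_after_ha_event() {
  run fence_tool join
  run fence_tool ls
  run /etc/init.d/rgmanager start
}
```

Calling `prepare_reboot` as-is prints the two commands; `DRY_RUN=0 prepare_reboot` would execute them on a node that still uses rgmanager.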
 
I've disabled the IPMI watchdog on all nodes:
Code:
> ipmitool mc watchdog off
> ipmitool mc watchdog get
Watchdog Timer Use:  SMS/OS (0x04)
Watchdog Timer Is:  Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:  0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:  300 sec
Present Countdown:  300 sec
Then I rebooted node 2, and nodes 1 and 3 did not reboot. I think that is a possible solution. :)
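Doing this by hand on every node is easy to get wrong, so here is a minimal sketch that runs the same two `ipmitool` commands on each node and checks the "Watchdog Timer Is:" line of the output. It assumes root SSH access to every node and `ipmitool` installed on each; `watchdog_is_stopped` and `disable_watchdog_on_nodes` are hypothetical names, and the host names are passed as arguments.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if `ipmitool mc watchdog get` output read from stdin
# reports that the BMC watchdog timer is stopped; fails (exit 1) otherwise.
watchdog_is_stopped() {
  grep -Eq '^Watchdog Timer Is:[[:space:]]+Stopped'
}

# Turn the BMC watchdog off on each given node and verify the result.
# Assumes root SSH access to every node and ipmitool installed on each.
disable_watchdog_on_nodes() {
  local node rc=0
  for node in "$@"; do
    if ! ssh "root@$node" ipmitool mc watchdog off; then
      echo "$node: 'ipmitool mc watchdog off' failed" >&2
      rc=1
      continue
    fi
    if ssh "root@$node" ipmitool mc watchdog get | watchdog_is_stopped; then
      echo "$node: watchdog stopped"
    else
      echo "$node: watchdog not reported as stopped" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage, with placeholder host names: `disable_watchdog_on_nodes node1 node2 node3`. The function returns non-zero if any node could not be switched off or does not report the timer as stopped.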
 