PVE 4.x - node reboot induced cluster reboot

thz

Hello,

After I reboot one node, all nodes in the cluster suddenly reboot. I have tried rebooting both from an SSH session and with the reboot button in the web interface, and I get the same issue either way.

The system is Debian Jessie:

Code:
pveversion  --verbose
proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-37
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-32
qemu-server: 4.0-55
pve-firmware: 1.1-7
libpve-common-perl: 4.0-48
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-40
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-5
pve-container: 1.0-44
pve-firewall: 2.0-17
pve-ha-manager: 1.0-21
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: 4.0.20-1

Why do all nodes reboot if I reboot only one node?
 
I have found this page: http://www.smilecouple.org/2014/09/24/proxmox-ceph-and-linux-helps/

Code:
High Availability, HA: reset rgmanager
Get rgmanager to start, check status and rejoin fence group

fence_tool join
fence_tool ls
after an HA event, you need to re-enable the rgmanager to allow management of the VMs from one computer to another.
/etc/init.d/rgmanager start

If you are going to reboot the proxmox server for kernel updates, first stop the rgmanager to prevent a fencing event and power cut off from the APC PDU

/etc/init.d/rgmanager stop

But this is for PVE versions older than 4.x. Could it help?
 
I've disabled the IPMI watchdog on all nodes:
Code:
> ipmitool mc watchdog off
> ipmitool mc watchdog get
Watchdog Timer Use:  SMS/OS (0x04)
Watchdog Timer Is:  Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:  0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:  300 sec
Present Countdown:  300 sec
and then rebooted node 2. Nodes 1 and 3 did not reboot. I think that is a possible solution. :)
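
For anyone who wants to confirm the watchdog state on every node before rebooting, here is a minimal sketch. It only parses the "Watchdog Timer Is:" line from the `ipmitool mc watchdog get` output shown above. The function names `watchdog_state` and `watchdog_is_stopped` and the node names in the usage comment are made up for illustration, and I am assuming an armed timer reports something other than "Stopped".

```shell
#!/bin/sh
# Hypothetical helpers (not part of ipmitool or Proxmox VE).

# Read `ipmitool mc watchdog get` output on stdin and print the value of
# the "Watchdog Timer Is:" field, with surrounding whitespace removed.
watchdog_state() {
    sed -n -e 's/[[:space:]]*$//' -e 's/^Watchdog Timer Is:[[:space:]]*//p'
}

# Return 0 if the timer is reported as "Stopped", non-zero otherwise.
watchdog_is_stopped() {
    [ "$(watchdog_state)" = "Stopped" ]
}

# Usage on one node:
#   ipmitool mc watchdog get | watchdog_is_stopped && echo "watchdog stopped"
#
# Usage across nodes (node1..node3 are placeholder host names):
#   for n in node1 node2 node3; do
#       ssh "root@$n" ipmitool mc watchdog get | watchdog_is_stopped \
#           || echo "$n: watchdog not reported as stopped"
#   done
```

This only reports what the BMC says; it does not change anything, so it should be safe to run before deciding whether to reboot a node.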
 
Same problem here, thank you for the tips. Disabling IPMI seems to solve it (HP DL560).
 