HP ProLiant reboots

debi@n

Nov 12, 2015
Málaga, Spain
Hi, we have a cluster with 4 nodes (2 HP, 2 other machines). They had been up for 16 days, but then they were rebooted by the Proxmox watchdog (only the HP machines). Could we disable this feature, or does anyone have an idea how we can solve this?

Thanks, and sorry for my English!
 
Hi, thanks for the reply, dietmar.
Code:
In the log:
# client watchdog expired - disable watchdog updates
And then the machine rebooted automatically.
 
The watchdog is now disabled by default, but only in Proxmox 4.1.
Have you installed the latest updates?
 
Yes
Code:
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
and
Code:
root@node1:/home/# service watchdog-mux status
● watchdog-mux.service - Proxmox VE watchdog multiplexer
  Loaded: loaded (/lib/systemd/system/watchdog-mux.service; static)
  Active: active (running) since Thu 2016-01-07 12:56:02 CET; 18h ago
 Main PID: 1044 (watchdog-mux)
  CGroup: /system.slice/watchdog-mux.service
  └─1044 /usr/sbin/watchdog-mux

Jan 07 12:56:02 node1 watchdog-mux[1044]: Watchdog driver 'Software Watchdo...0
Hint: Some lines were ellipsized, use -l to show in full.
 
You can try to disable the watchdog if you don't use HA (systemctl disable watchdog-mux), then reboot.
(I'm not sure about the impact, because other HA services like pve-ha-crm.service and pve-ha-lrm.service also depend on it, so you may need to disable them too.)

Before doing that, I just wonder: do you still get reboots without smh?
 
Another possibility would be to test the "hpwdt" watchdog module:

/etc/default/pve-ha-manager
WATCHDOG_MODULE=hpwdt
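If you want to make that change from a shell, here is a minimal sketch, assuming a POSIX sh with GNU sed (as on a Debian-based Proxmox node). The helper name `set_watchdog_module` is hypothetical. It only rewrites an active `WATCHDOG_MODULE=` line or appends one, leaving commented lines alone. A reboot is presumably still needed for the new module to be picked up.

```shell
#!/bin/sh
# Hypothetical helper: set WATCHDOG_MODULE=<module> in a pve-ha-manager
# defaults file, replacing an existing (uncommented) line or appending one.
set_watchdog_module() {
    file=$1
    module=$2
    if grep -q '^WATCHDOG_MODULE=' "$file" 2>/dev/null; then
        # Rewrite the existing assignment in place (GNU sed -i).
        sed -i "s/^WATCHDOG_MODULE=.*/WATCHDOG_MODULE=$module/" "$file"
    else
        # No active assignment yet: append one at the end of the file.
        printf 'WATCHDOG_MODULE=%s\n' "$module" >> "$file"
    fi
}

# Usage (as root):
#   set_watchdog_module /etc/default/pve-ha-manager hpwdt
```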


Also, please disable the NMI watchdog.

Edit /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

then run

# update-grub

and reboot.
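The GRUB edit can also be scripted. This is a minimal sketch, assuming GNU sed and a single double-quoted `GRUB_CMDLINE_LINUX_DEFAULT=` line in /etc/default/grub. The helper name is hypothetical, and `update-grub` plus a reboot are still required afterwards. After the reboot, /proc/cmdline should contain `nmi_watchdog=0` and /proc/sys/kernel/nmi_watchdog should read 0.

```shell
#!/bin/sh
# Hypothetical helper: append nmi_watchdog=0 to GRUB_CMDLINE_LINUX_DEFAULT
# unless some nmi_watchdog= setting is already present on that line.
disable_nmi_watchdog_in_grub() {
    file=$1
    sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/nmi_watchdog=/!s/"$/ nmi_watchdog=0"/;}' "$file"
}

# Usage (as root), then regenerate the GRUB config and reboot:
#   disable_nmi_watchdog_in_grub /etc/default/grub
#   update-grub
# After the reboot, check: cat /proc/cmdline; cat /proc/sys/kernel/nmi_watchdog
```

The sed expression leaves the line untouched if any `nmi_watchdog=` value is already there, so running it twice does not duplicate the parameter.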
 
I'm working with HA, that's the problem :(
 
On Proxmox 4.0 we had a kernel panic with the hpwdt module. =S
And if it's an Intel CPU, you can try the "iTCO_wdt" watchdog module:

/etc/default/pve-ha-manager
WATCHDOG_MODULE=iTCO_wdt
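For deciding between the suggestions, here is a minimal sketch of picking a candidate module from the CPU vendor. The function name `suggest_watchdog_module` and the fallback to softdog are assumptions for illustration. iTCO_wdt actually drives the TCO timer in Intel chipsets, so an Intel vendor string is only a heuristic, and whichever module it prints still has to be tested on the real hardware.

```shell
#!/bin/sh
# Hypothetical helper: print a candidate WATCHDOG_MODULE based on the
# vendor_id in a cpuinfo-style file (default /proc/cpuinfo).
# Heuristic only: iTCO_wdt needs an Intel chipset TCO timer, not just an Intel CPU.
suggest_watchdog_module() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -q '^vendor_id[[:space:]]*:[[:space:]]*GenuineIntel' "$cpuinfo"; then
        echo iTCO_wdt
    else
        # Assumed fallback: the software watchdog.
        echo softdog
    fi
}
```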
OK, we will test this.
Hmm, this is the main problem in our reboot issue.
OK, thank you for your interest, spirit. We will check this, thanks! :D
 
The NMI watchdog was enabled with the hpwdt module.
Thanks
The NMI watchdog should be disabled if a hardware watchdog is used:
https://www.kernel.org/doc/Documentation/watchdog/hpwdt.txt
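To confirm that combination on a running node, a minimal read-only check could compare the configured module with the current NMI watchdog state. The function name is hypothetical. It assumes, in line with the softdog setup described in this thread, that an unset WATCHDOG_MODULE means softdog, which is not a hardware watchdog.

```shell
#!/bin/sh
# Hypothetical check: warn when a hardware watchdog module is configured
# while the NMI watchdog is still enabled.
# $1: pve-ha-manager defaults file, $2: nmi_watchdog sysctl file.
check_nmi_vs_hw_watchdog() {
    conf=${1:-/etc/default/pve-ha-manager}
    nmi=${2:-/proc/sys/kernel/nmi_watchdog}
    # Last active WATCHDOG_MODULE= line; empty is assumed to mean softdog.
    module=$(sed -n 's/^WATCHDOG_MODULE=//p' "$conf" 2>/dev/null | tail -n 1)
    module=${module:-softdog}
    nmi_state=$(cat "$nmi" 2>/dev/null || echo unknown)
    if [ "$module" != softdog ] && [ "$nmi_state" != 0 ]; then
        echo "WARN: $module configured but nmi_watchdog=$nmi_state"
        return 1
    fi
    echo "OK: module=$module nmi_watchdog=$nmi_state"
}
```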

Also, one problem with Proxmox 4.0 was that multiple watchdog modules could be loaded and conflict.
That's resolved in Proxmox 4.1, where all watchdog modules are blacklisted by default and need to be specified manually in /etc/default/pve-ha-manager.
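To see which blacklist entries a node actually has for a given module, here is a minimal sketch. It assumes the entries live in the standard modprobe.d directories; the exact file name Proxmox 4.1 ships is not given here, and the helper name is hypothetical. It only reads `*.conf` files, since modprobe ignores other files in those directories.

```shell
#!/bin/sh
# Hypothetical helper: print modprobe.d lines that blacklist a module.
# Usage: find_blacklist_entries MODULE [DIR...]
# Default dirs: /etc/modprobe.d /lib/modprobe.d. Returns 0 if any entry is found.
find_blacklist_entries() {
    module=$1
    shift
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /lib/modprobe.d
    found=1
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Match "blacklist <module>" exactly, not a module with a longer name.
        if grep -H -E "^[[:space:]]*blacklist[[:space:]]+$module([[:space:]]|\$)" \
            "$dir"/*.conf 2>/dev/null; then
            found=0
        fi
    done
    return "$found"
}

# Usage: find_blacklist_entries hpwdt
```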

So it would be great to test hpwdt first with Proxmox 4.1.

(Sorry, I don't have an HP server to test with, so you are the beta tester ;)
 
To be honest, I've had bad experiences with HA on HP.
hpwdt is in blacklist.conf and I'm using softdog on all machines, but only the HP ones reboot :S (roughly every 16 days). I will test whether it happens again.
Thanks for the help! :D
 
