watchdog

  1. S

    Half of the hosts in the cluster automatically restart due to abnormality

    I especially want to know what protection mechanism in the PVE cluster causes a host to restart automatically. Environment: there are 13 hosts in the cluster (node1-13). Version: pve-manager/6.4-4/337d6701 (running kernel: 5.4.106-1-pve). Web environment: there are two switches, A and B...
  2. I

    Proxmox VE 8.1.4 - watchdog: BUG: soft lockup - CPU#X stuck for Xs

    Hello there. Since I connected my nodes to a cluster, I have noticed this error on some Linux VMs. I couldn't find any working solution for it. I suppose it has something to do with ZFS, because on the one node where ZFS is not in use these VMs work without any issues. Do you have any...
  3. P

    [SOLVED] pve-ha-lrm and watchdog-mux services fail to start

    Running PVE 8.0.4 ipmi_watchdog configured After disabling maintenance mode via ha-manager crm-command node-maintenance disable node3, ha-manager status shows: lrm node3 (old timestamp - dead?, [date & time]) ... service vm:XXXX (node3, freeze) systemctl status watchdog-mux pve-ha-lrm shows...
  4. D

    Virtual Watchdog in Windows guest

    So I have enabled a virtual watchdog device in Windows (model=i6300esb, action=reset), and this device shows up in Windows Device Manager. Does Windows Server (2016 and later) support virtual watchdogs? Are there extra Windows drivers I must install? Or a watchdog daemon/client like in Linux...
  5. A

    Watchdog for standalone proxmox node

    I have a single-node "storage server" that has started randomly crashing. While I don't have the time to replace it (and it's not a business-critical node), I would like the watchdog to trap and reboot, but here I run into a dilemma. I can't install the Debian watchdog package because it conflicts...
  6. C

    [TUTORIAL] Hardware watchdog at a per-VM level

    From my testing of Proxmox, one frustration I had was that unlike my previous Xen environment, Proxmox does not detect if a VM has panicked/crashed/frozen and as such won't reboot the VM, potentially ending up in hours of downtime until the issue is realised and resolved. After a bit of digging...
  7. M

    Proxmox watchdog - how to increase countdown time

    Hello, I have a problem with the watchdog countdown time resetting. I have enabled the watchdog by using: https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#Dell_IDrac_.28module_.22ipmi_watchdog.22.29 I got: WATCHDOG_MODULE=ipmi_watchdog The default settings are: Watchdog Timer Use...
  8. H

    Expected behavior from watchdog-mux with a networking outage? (HA, Corosync, and Softdog fencing)

    What’s the expected behavior here? I have a 3-node cluster with dedicated physical corosync network, and a 2nd faster network for storage and networking. The corosync network is configured to failover to the fast network if interrupted. High availability is configured on guests with shared...
  9. L

    Python watchdog and proxmox gui

    Hey all, I am trying to create a watchdog that listens to Proxmox firewall files and alerts me when a firewall rule is modified. I am using the Python watchdog package. When I modify the file directly from the shell, my watchdog notices and alerts me. When editing the same firewall file...
  10. A

    [SOLVED] Watchdog rebooted server at random moment - how to debug?

    I use Supermicro server with PVE 6.2. I did the watchdog setup this way: /etc/default/pve-ha-manager: WATCHDOG_MODULE=ipmi_watchdog /etc/modprobe.d/ipmi_watchdog.conf: options ipmi_watchdog action=power_cycle panic_wdt_timeout=10 /etc/default/grub GRUB_CMDLINE_LINUX_DEFAULT="quiet...
  11. J

    erroneous VM setting caused a system fail

    Describing a problem I encountered. Noteworthy: hugepages were previously configured with the VM running and were subsequently removed. Proxmox did not catch this, IMHO. Booting a VM whose CPU configuration had the pdpe1gb flag enabled appears to have resulted in initially...
  12. S

    [SOLVED] Dedicated ceph storage nodes with HA stack disabled

    Hi all, I'm testing a new setup where I have 3x storage boxes (with multiple drives) and 6x compute boxes (with two drives + more RAM), all part of the same cluster. I installed Ceph on the 3x storage nodes and added all their free drives as OSDs. Ceph is up and running and I can use mounted...
  13. P

    all nodes got rebooted and there is no log - Cluster disaster

    Hi folks, I added a new node to my cluster today, then realized the new node's network configuration might have an issue: it cannot communicate with the CEPH IP ranges. I restarted the network service using "systemctl restart networking"; after this, the disaster happened and I...
  14. C

    [SOLVED] Howto setup watchdog?

    Hi, I'm running a PVE cluster on 6 nodes. In total 2 different server models are used, but all are from Lenovo. In the server configuration I can define 3 types of server timeouts: OS Watchdog, Loader Watchdog, Enable Power Off Delay. I read here that by default all hardware watchdog modules are...
  15. Z

    Proxmox Host and Clients Randomly Freeze. Have to restart the system

    Hello all, I'm having some issues with my Proxmox host. Sometimes it will randomly lock up and freeze everything. The console still appears but won't accept any keystrokes; SSH and the GUI also don't work. I am passing through an Nvidia GT710 to a Windows 10 client. I thought this was the...
  16. W

    i6300esb watchdog in Windows (help needed)

    I'm trying to force a windows guest reset on BSOD using the qemu watchdog functionality. After adding a virtual watchdog device /etc/pve/local/qemu-server/101.conf watchdog: model=i6300esb,action=reset to Windows VM it appears in the device manager as Intel(R) 6300ESB Watchdog timer - 25A...
  17. C

    Hardware watchdog (ipmi_watchdog) on Proxmox 5

    Dear colleagues, I moved to Proxmox 5 in a dev environment and was wondering how to setup the hardware watchdog. On the same hardware running Proxmox 4, a kernel module ipmi_watchdog has been loaded. Now I can only find the following modules. lsmod |grep ipmi ipmi_ssif 24576 0...
  18. I

    Upgraded to 5.0 headaches

    So I just upgraded to 5.0-23 tonight. When I try to boot the system, I now receive this error: [28.072000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1] The machine will keep spitting out variations of that error (only thing that changes is the timing. Instead of 23s it...
  19. E

    [SOLVED] Watchdog fence for physical nodes

    In Proxmox 3.x I set up fencing using APC PDUs. I did not have any HA VMs set up, but if one of the Proxmox nodes locked up or crashed, the node would be fenced. Is it possible to replicate this behavior in 4.x and 5.x? I'm fine with the watchdog as the method of fencing; I just don't see a way to make...
  20. F

    Watchdog raises iDRAC alert

    Hi. I have Proxmox 4.4 installed in a new Dell R630 with watchdog configured as explained in the Proxmox wiki which is just as the 6th comment in this other thread suggests: https://forum.proxmox.com/threads/configure-hardware-watchdog-ipmi-fencing.32989/ At the end of that comment it says: "at...
