watchdog

  1. K

    watchdog detected hard lockup on cpu XX

    Hi, can anyone help? One of the 4 nodes in an HPE Apollo r2600 Gen10 system won't boot the PVE ISO and fails with the error: watchdog detected hard lockup on CPU XX, where XX is a number that changes each time I try. All nodes have the same hardware and BIOS version (latest). I can install other...
  2. L

    VE Cluster with 5 servers - issue

    Hi, we have a VE cluster with 5 servers. All servers are: Supermicro Server CSE-819U 2x 14-Core Xeon E5-2690 v4 2.6GHz 128GB 9361-8i. prox1 to prox5 have the same net config, 192.168.1.150-154 (adminnet). There are 2-3 VMs running on each server with local zfs...
  3. C

    Watchdog for VMs/LXCs and Proxmox itself

    Hi folks, I'm pretty much a newbie when it comes to Proxmox. I have since set up a Proxmox server and have several VMs and LXCs running on it. Now I would like to set up a watchdog that monitors the individual VMs and LXCs and restarts them if necessary. I...
  4. E

    [TUTORIAL] [High Availability] Watchdog reboots

    First of all, you can recognise watchdog induced reboots of your node from the end of last boot's log containing entries such as: watchdog-mux: Client watchdog expired - disable watchdog updates kernel: watchdog: watchdog0: watchdog did not stop! You should probably start with reading the...
  5. L

    e1000e if.6 eno1: NETDEV WATCHDOG: CPU: transit queue timed out

    Hello, I'm running into yet another problem! I have a Windows 11 VM on Proxmox, which is purely for downloading. Every time I'm downloading and the download runs at around 30 mbps, I get the following messages: After this I lose the RDP connection for a few seconds and Unifi tells me the device is...
  6. A

    Frequent Watchdog reboots

    I am relatively new to Proxmox and have a cluster running with 3 nodes. Everything is currently working fine: the cluster is up and HA is running. The issue I currently face is that if the cluster link goes down for, let's say, more than 10s, the watchdog kicks in and reboots the server. This causes...
  7. A

    [SOLVED] Apt repository dependency problem (watchdog)

    I got some random kernel hangs lately when testing software in LXC containers, which made me want to enable the built-in Intel watchdog in the EFI. For this I need a software keeper, which is provided by the watchdog package. But when I try to install the package from the default Debian Bookworm...
  8. N

    Error watchdog

    Hi everyone, I'm new. I bought a used server and decided to go with Proxmox. The installation went well, but when I restart the server I often, though not always, get this watchdog error. To solve it I have to go into the BIOS and do a save and reset, but I'd...
  9. tweans

    After upgrade to the latest version: "watchdog: BUG: soft lockup - CPU#X"

    Hello everyone, I updated my Proxmox to the latest version today. Now I have massive problems. The box does reboot, and the VMs are also partly (briefly) reachable, but then nothing works anymore. When I log in via SSH, I get masses of the following messages: Message...
  10. S

    Half of the hosts in the cluster automatically restart due to abnormality

    I especially want to know what protection mechanism the PVE cluster has to allow the host to automatically restart. Environment: There are 13 hosts in the cluster: node1-13. Version: pve-manager/6.4-4/337d6701 (running kernel: 5.4.106-1-pve). Web environment: There are two switches A and B...
  11. I

    Proxmox VE 8.1.4 - watchdog: BUG: soft lockup - CPU#X stuck for Xs

    Hello there. Since I joined my nodes into a cluster, I have noticed this error on some Linux VMs: I couldn't find any working solution for this. I suppose this has something to do with ZFS, as on one of my nodes where ZFS is not in use these VMs work without any issues. Do you have any...
  12. P

    [SOLVED] pve-ha-lrm and watchdog-mux services fail to start

    Running PVE 8.0.4, ipmi_watchdog configured. After disabling maintenance mode via ha-manager crm-command node-maintenance disable node3, ha-manager status shows: lrm node3 (old timestamp - dead?, [date & time]) ... service vm:XXXX (node3, freeze). systemctl status watchdog-mux pve-ha-lrm shows...
  13. D

    Virtual Watchdog in Windows guest

    So I have enabled a virtual watchdog device in Windows (model=i6300esb, action=reset), and the device shows up in Windows Device Manager. Does Windows Server (2016 and later) support virtual watchdogs? Are there extra Windows drivers I must install? Or a watchdog daemon/client like in Linux...
  14. A

    Watchdog for standalone proxmox node

    I have a single-node "storage server" that has started randomly crashing. While I don't have the time to replace it (and it's not a business-critical node), I would like the watchdog to trap and reboot, but here I run into a dilemma. I can't install the Debian watchdog package because it conflicts...
  15. C

    [TUTORIAL] Hardware watchdog at a per-VM level

    From my testing of Proxmox, one frustration I had was that unlike my previous Xen environment, Proxmox does not detect if a VM has panicked/crashed/frozen and as such won't reboot the VM, potentially ending up in hours of downtime until the issue is realised and resolved. After a bit of digging...
  16. M

    Proxmox watchdog - how to increase countdown time

    Hello, I have a problem with the watchdog countdown time resetting. I have enabled the watchdog by using: https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#Dell_IDrac_.28module_.22ipmi_watchdog.22.29 I got: WATCHDOG_MODULE=ipmi_watchdog The default settings are: Watchdog Timer Use...
  17. H

    Expected behavior from watchdog-mux with a networking outage? (HA, Corosync, and Softdog fencing)

    What’s the expected behavior here? I have a 3-node cluster with dedicated physical corosync network, and a 2nd faster network for storage and networking. The corosync network is configured to failover to the fast network if interrupted. High availability is configured on guests with shared...
  18. L

    Python watchdog and proxmox gui

    Hey all, I am trying to create a watchdog that will watch the Proxmox firewall files and alert me when a firewall rule is modified. I am using the Python watchdog package. When I modify the file (directly from the shell), my watchdog notices that and alerts me. When editing the same firewall file...
  19. A

    [SOLVED] Watchdog rebooted server at random moment - how to debug?

    I use a Supermicro server with PVE 6.2. I set up the watchdog this way: /etc/default/pve-ha-manager: WATCHDOG_MODULE=ipmi_watchdog /etc/modprobe.d/ipmi_watchdog.conf: options ipmi_watchdog action=power_cycle panic_wdt_timeout=10 /etc/default/grub GRUB_CMDLINE_LINUX_DEFAULT="quiet...
  20. J

    erroneous VM setting caused a system fail

    Describing here a problem I encountered. Noteworthy is that hugepages were previously configured while the VM was running and were subsequently removed. Proxmox did not catch this, imho. Booting a VM whose CPU configuration had the pdpe1gb flag enabled appears to have resulted in initially...