PVE 4 with HA

There is an easy way to test if the watchdog works correctly:

# echo 1 >/dev/watchdog

This should trigger a reboot within 60 seconds. Does that work?
 
I just tried your simple test:

root@pve:~# echo 1 >/dev/watchdog
-bash: /dev/watchdog: Device or resource busy

Is that the correct response?
 
Ah, yes, it seems watchdog-mux is still running. Try:

# systemctl stop watchdog-mux.service
# echo 1 >/dev/watchdog
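"Device or resource busy" is EBUSY. The kernel watchdog driver lets only one process hold /dev/watchdog open at a time, so the echo test can only succeed after watchdog-mux (or whatever else holds the device) has released it. To confirm this, you can list the processes that still have the device open. Tools such as fuser or lsof do this. The minimal POSIX sh sketch below does the same by scanning /proc/<pid>/fd. The function name watchdog_holders is hypothetical. The sketch assumes a Linux /proc filesystem and root privileges; without root, other users' processes are silently skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): print the PIDs of processes that
# have DEVICE open, by resolving every /proc/<pid>/fd/* symlink.
# Roughly what fuser does; run as root to see all processes.
watchdog_holders() {
    dev=${1:-/dev/watchdog}   # device node to look for (default /dev/watchdog)
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors we cannot resolve (no permission, or process exited)
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "$dev" ]; then
            pid=${fd#/proc/}      # "1234/fd/9" after stripping "/proc/"
            echo "${pid%%/*}"     # keep only the PID part, "1234"
        fi
    done | sort -u
}

# After "systemctl stop watchdog-mux.service" this should print nothing.
watchdog_holders /dev/watchdog
```

If it still prints a PID after stopping watchdog-mux, that process is the one keeping the device busy. You can identify it with ps -p <PID>.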

 
Hi Dietmar,

The nodes on which I am testing PVE HA do not have an IPMI port, so I am using the Linux softdog.


Before unplugging the network cable of node2, I tested echo 1 > /dev/watchdog on all three nodes (node1, node2 and node3). On all three nodes it returned: "-bash: /dev/watchdog: Device or resource busy"
Then I unplugged the network cable of node2. HA works: it migrates the VMs from node2 evenly to node1 and node3. After a few minutes, I plugged the network cable back into node2. The membership quorum is restored, but the VMs are not moved back to node2.

I ran "echo 1 > /dev/watchdog" on node2 and it executed without error (whereas on node1 and node3 it still says "Device or resource busy").

After running the echo command on node2, I checked the syslog. It shows the same error I had before:

Aug 6 14:39:53 node2 kernel: [ 1320.827845] watchdog watchdog0: watchdog did not stop!
Aug 6 14:39:57 node2 pve-ha-lrm[1171]: watchdog update failed - Broken pipe

The last line ("Broken pipe") keeps repeating on node2, and it does NOT trigger a reboot within 60 seconds.

Each time I run echo 1 > /dev/watchdog, this error is logged again in syslog.

Are there any parameters that need to be set in the BIOS?

Thanks

Shafeek
 
Hi Mir,

As Dietmar wrote above, you might need to do this first:
# systemctl stop watchdog-mux.service
and then
# echo 1 > /dev/watchdog


Thanks for this reply. Stopping watchdog-mux first does not change anything; it ends up with the same error as before.

Thanks also for the fallback; I will check it.

See you,

Shafeek
 