From my testing of Proxmox, one frustration I had was that, unlike my previous Xen environment, Proxmox does not detect when a VM has panicked, crashed or frozen, and so won't reboot it. That can mean hours of downtime until the issue is noticed and resolved.
After a bit of digging on various sites and pulling together a few sources, I wrote my own guide for doing this and thought it'd be helpful to share with the Proxmox community. It is possible to enable a watchdog service on your VMs that integrates with Proxmox, effectively mimicking a physical hardware watchdog that would reset a bare-metal machine in the event of a panic.
Of course, some care should be taken, as a misconfiguration could put your VM into a cycle of resets (but if you follow the steps carefully you should be fine). I've had this configuration running on 12 of my VMs for a couple of months now without issue.
The below uses apt on Ubuntu 20.04, but I'm sure other OSes will have a similar flow.
1. Modify your VM config file on the Proxmox node:
Code:
nano /etc/pve/qemu-server/[server_id].conf
Here we add our virtual watchdog device. We'll be using the i6300esb watchdog which, although old, is supported by KVM and provides the functionality we need. To add the watchdog device, append the below to the config file and save (if the VM has snapshots, put the line in the top section of the file, above the first [snapshot name] header, rather than at the very end):
Code:
watchdog: model=i6300esb,action=reset
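If you'd rather not edit the file by hand, the same setting can be applied with the qm tool on the PVE node, which writes the watchdog line into the VM config for you. A minimal sketch: the function names are my own, and the VM ID 100 in the usage comment is only a placeholder.

```shell
#!/bin/sh
# Sketch: add the virtual watchdog with qm instead of editing the config by hand.
WATCHDOG_OPTS="model=i6300esb,action=reset"

# Print the qm command for a given VM ID (handy as a dry run).
watchdog_cmd() {
    printf 'qm set %s --watchdog %s\n' "$1" "$WATCHDOG_OPTS"
}

# Apply the setting on a PVE node and show the resulting config line.
add_watchdog() {
    vmid="$1"
    if ! command -v qm >/dev/null 2>&1; then
        echo "qm not found; would run: $(watchdog_cmd "$vmid")" >&2
        return 1
    fi
    qm set "$vmid" --watchdog "$WATCHDOG_OPTS" &&
        qm config "$vmid" | grep '^watchdog:'
}

# Example (VM ID 100 is a placeholder, use your own):
#   add_watchdog 100
```

Either way, the device only appears in the guest after the full power off and power on described in step 6.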
**Anything below this line should be performed on the VM, NOT the PVE node**
2. Install watchdog on the VM with:
Code:
apt install watchdog
3. Configure the watchdog service by appending the below options to /etc/watchdog.conf. This tells the watchdog service which device it should be heartbeating with.
Code:
watchdog-device = /dev/watchdog
log-dir = /var/log/watchdog
realtime = yes
priority = 1
4. By default, the i6300esb kernel module is blacklisted from automatic loading (at least on Ubuntu). To work around this, modify the newly created /etc/default/watchdog file and set watchdog_module to "i6300esb".
5. Enable the watchdog service to start at next boot with:
Code:
systemctl enable watchdog
6. Fully power off the VM (not restart). This is important as it'll allow it to adopt the new hardware configuration.
7. Power the VM back on and check the watchdog module is up and working by running:
Code:
dmesg | grep i6300
You should see something like the below:
Code:
[ 7.249538] i6300ESB timer 0000:00:04.0: initialized. heartbeat=30 sec (nowayout=0)
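If you have several VMs to set up, steps 3, 4 and the check in step 7 can be scripted. The below is only a sketch under my own assumptions: the function names are made up, and the file paths are passed in as arguments so you can try it on copies of the files before touching the real ones. It skips any option that is already set (uncommented) rather than overwriting it.

```shell
#!/bin/sh
# Sketch of steps 3, 4 and 7 as re-runnable helpers (names are hypothetical).

# Append "key = value" to a watchdog.conf-style file unless the key is
# already set on an uncommented line.
set_conf_option() {
    file="$1"; key="$2"; value="$3"
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s = %s\n' "$key" "$value" >> "$file"
}

# Step 3: the four options from the guide.
configure_watchdog_conf() {
    conf="$1"
    set_conf_option "$conf" watchdog-device /dev/watchdog
    set_conf_option "$conf" log-dir /var/log/watchdog
    set_conf_option "$conf" realtime yes
    set_conf_option "$conf" priority 1
}

# Step 4: set watchdog_module="i6300esb" in a /etc/default/watchdog-style file,
# replacing an existing assignment or adding one if there is none.
set_watchdog_module() {
    defaults="$1"
    if grep -q '^watchdog_module=' "$defaults" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed 's/^watchdog_module=.*/watchdog_module="i6300esb"/' "$defaults" > "$tmp" &&
            cat "$tmp" > "$defaults"   # cat keeps the file's owner and mode
        rm -f "$tmp"
    else
        echo 'watchdog_module="i6300esb"' >> "$defaults"
    fi
}

# Step 7: succeed if the driver's init message appears in the log on stdin.
watchdog_initialized() {
    grep -q 'i6300ESB timer.*initialized'
}

# Example, as root on the VM:
#   configure_watchdog_conf /etc/watchdog.conf
#   set_watchdog_module /etc/default/watchdog
#   dmesg | watchdog_initialized && echo "watchdog driver loaded"
```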
Everything is now configured and the only thing left to do is to give it a test. To run a test, trigger a kernel panic by running the below as root:
Code:
echo c > /proc/sysrq-trigger
After a short while (60 seconds or so) you should see the VM automatically reset, and you're done! I hope you find this useful.
Sidenote: it'd be great if in the Proxmox UI under hardware you could manually add custom lines. It's a shame that once configured you can't see this on the VM hardware page.
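Coming back to the panic test: if you weren't watching the console when it happened, one way to confirm the VM really was reset is to log back in and check how recently it booted. A small sketch reading /proc/uptime; the function names are my own and the 300 second threshold in the example is arbitrary.

```shell
#!/bin/sh
# Sketch: confirm a recent reset by looking at the uptime.

# Print whole seconds since boot, read from a /proc/uptime-style file.
uptime_seconds() {
    src="${1:-/proc/uptime}"
    # The first field is the uptime in seconds with a fractional part.
    cut -d' ' -f1 "$src" | cut -d. -f1
}

# Succeed if the machine booted less than the given number of seconds ago.
booted_within() {
    limit="$1"
    [ "$(uptime_seconds "${2:-/proc/uptime}")" -lt "$limit" ]
}

# Example, run on the VM shortly after the test:
#   booted_within 300 && echo "VM was reset in the last 5 minutes"
```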