Is there a way to instruct proxmox to automatically restart a VM after a crash?

yonoss · Oct 3, 2023

Hi,
Is there a way to instruct proxmox to automatically restart a VM after a crash? Something similar to "Automatically start VM on boot", but to trigger the VM restart in case of a VM failure/crash.

By default, if a VM crashes, Proxmox does not restart it, and I have to log into the console and start it manually, which is not acceptable for a production environment.

Thanks!

bbgeek17 · Oct 3, 2023

It would be helpful to define what exactly "VM crash" means. In general, there is no mechanism in PVE that restarts a VM because the VM's OS failed. It may be possible for the HA mechanism to notice that the "kvm" process failed (i.e. was killed by the OOM killer) and restart the VM. However, that's probably not what you are looking to protect against.

To sum up: PVE is not the right tool to monitor VM OS health. You need to implement things like watchdogs, app monitors, API monitors, health checks, etc. Which ones to implement, and how, depends on the application you are trying to protect.
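As an illustration of the "health check" idea, here is a minimal sketch of an external check that runs on the Proxmox node, probes an application endpoint a few times, and hard-resets the VM only if every attempt fails. The VMID, the URL and the function names are hypothetical placeholders, and `RETRY_DELAY` is an assumed tuning knob, not something Proxmox provides.

```shell
#!/bin/bash
# Sketch of an external health check. VMID and HEALTH_URL are hypothetical placeholders.
VMID=100
HEALTH_URL="http://192.0.2.10:8080/health"

# Run the given probe command up to $1 times; succeed as soon as one attempt passes.
probe_with_retries() {
    local attempts=$1
    shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Wait before the next attempt (seconds; defaults to 5)
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Hard-reset the VM only if all three probe attempts failed.
check_and_reset() {
    if ! probe_with_retries 3 curl --fail --silent --max-time 5 --output /dev/null "$HEALTH_URL"; then
        /usr/sbin/qm reset "$VMID"
    fi
}
```

To use it, call `check_and_reset` at the end of the script and schedule the script from cron. Retrying before acting is a deliberate choice here: a single failed probe is often a transient network hiccup rather than a dead guest.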

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

yonoss · Oct 3, 2023

I'm looking for a VM restart mechanism that will work for any type of VM failure. Most of the VM crashes are indeed caused by OOM errors.

bbgeek17 · Oct 3, 2023

yonoss said:
I'm looking for a VM restart mechanism that will work for any type of VM failures

There is no single off-the-shelf mechanism to achieve that. You will need to create a custom script that monitors the health of your VM/application.

yonoss said:
Most of the VM crashes are indeed caused by OOM errors

That is an infrastructure problem that should never happen in production. It is best solved by having sufficient RAM in your hypervisor to cover the VMs' needs. A critical production environment should never be overprovisioned.
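A rough way to see whether a node is overprovisioned is to compare the memory configured for running VMs with the host's RAM. The sketch below is an assumption-laden illustration: it assumes the last four columns of `qm list` are STATUS, MEM(MB), BOOTDISK(GB) and PID, it ignores containers, ballooning and the memory the host itself needs, and the function names are made up for this example.

```shell
#!/bin/bash
# Sum the configured memory (MB) of running VMs from `qm list`-style text on stdin.
# Fields are counted from the end of the line so that names containing spaces do not shift them.
sum_running_vm_mem_mb() {
    awk 'NF >= 5 && $(NF-3) == "running" { total += $(NF-2) } END { print total + 0 }'
}

# Total host RAM in MB, read from a /proc/meminfo-style file (default: /proc/meminfo).
host_mem_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Print both numbers and warn if the running VMs are configured with more RAM than the host has.
report_overcommit() {
    local vm_mb host_mb
    vm_mb=$(/usr/sbin/qm list | sum_running_vm_mem_mb)
    host_mb=$(host_mem_mb)
    echo "running VMs: ${vm_mb} MB configured, host: ${host_mb} MB"
    if [ "$vm_mb" -gt "$host_mb" ]; then
        echo "WARNING: configured VM memory exceeds host RAM"
    fi
}
```

Calling `report_overcommit` on the node prints the two totals. Staying under the host total is not a guarantee against OOM kills, since the host and its storage cache need memory too, so treat the result as a first sanity check only.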

Good luck


LnxBil · Oct 3, 2023

yonoss said:
Most of the VM crashes are indeed caused by OOM errors.

AFAIK, those are handled by defining the VM as an HA VM, which should restart the VM in case of an external failure.
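For reference, putting a VM under HA management can be done from the CLI with `ha-manager`. The following is a sketch, not a tested recipe: the VMID 100 and the restart limits are hypothetical values, and the `run` helper and `DRY_RUN` switch are additions for this example so that the commands are only printed unless you opt in.

```shell
#!/bin/bash
# Sketch: put a VM under HA management so the HA stack tries to keep it running.
# By default this only prints the commands; set DRY_RUN=0 on a Proxmox node to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ha_enable() {
    local vmid=$1
    # Request state "started"; allow up to 3 restart attempts and no relocation to other nodes.
    run ha-manager add "vm:${vmid}" --state started --max_restart 3 --max_relocate 0
    # Show the resulting HA state
    run ha-manager status
}

ha_enable "${1:-100}"   # 100 is a hypothetical VMID
```

The same settings are available in the GUI under Datacenter, HA. Whether this catches a given failure depends on the failure: it covers the VM process disappearing, not a guest OS that hangs while the process stays alive.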

bbgeek17 said:
there is no off-the-shelf single mechanism to achieve it. You will need to create a custom script that monitors health of your VM/application.

For some guest OSes, there is a mechanism called a watchdog timer, which does what you want, yet in the end it depends on the type of error in the VM. As @bbgeek17 already mentioned, set up proper service health monitoring and act on it accordingly.
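On the Proxmox side, the watchdog is a virtual device added to the VM with the `watchdog` option of `qm set`. The guest must then run a watchdog daemon that keeps feeding the device, otherwise nothing happens (or the VM gets reset as soon as the device is opened and left unfed). The sketch below only audits whether a VM has the device configured and prints a suggested command. The function names are hypothetical, and it assumes `qm config` prints the option as a line starting with `watchdog:`.

```shell
#!/bin/bash
# Succeeds if `qm config`-style text on stdin contains a watchdog entry.
has_watchdog() {
    grep -q '^watchdog:'
}

# Report whether the given VMID has a virtual watchdog, and suggest a command if not.
audit_watchdog() {
    local vmid=$1
    if /usr/sbin/qm config "$vmid" | has_watchdog; then
        echo "VM $vmid: watchdog configured"
    else
        echo "VM $vmid: no watchdog; consider: qm set $vmid --watchdog model=i6300esb,action=reset"
    fi
}
```

Run `audit_watchdog 100` on the node (100 being a hypothetical VMID). A change to the watchdog device typically needs a full stop and start of the VM to take effect, not just a reboot from inside the guest.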

wishy · Mar 28, 2024

I had a problem today where the power wasn't running smoothly due to high winds and, despite a UPS, guests randomly terminated, one at a time, during various power issues. The timestamps fit with the UPS kicking in. Oddly, this wasn't a problem under ESXi (even if I am glad to see the back of it).

Anyway, I wrote this script and minimally tested it. I've set it to run as a cron job every 10 minutes. Each run restarts one guest that is in a stopped state.

Bash:

#!/bin/bash

# Count the lines of `qm list` that mention "stopped"
stopped_count=$(/usr/sbin/qm list | grep -c stopped)

# Check if the line count is greater than 0
if [ "$stopped_count" -gt 0 ]; then
    # Guests have stopped
    #echo "Oh Dear.."
    # Take the stopped guests from qm list, extract the VMIDs, sort randomly, and pick the first entry
    first_failed_guest=$(/usr/sbin/qm list | grep stopped | awk '{print $1}' | sort -R | head -n 1)
    # Send the qm list by email, indicating which guest will restart
    /usr/sbin/qm list | mail -s "Guest stopped, restarting $first_failed_guest" your@email.here
    # Actually start the selected guest
    /usr/sbin/qm start "$first_failed_guest"
fi
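One limitation of the script above is that it cannot tell a crashed guest from one that was stopped on purpose (and, as far as I know, templates are also listed as stopped). A possible variant, sketched here under assumptions, skips a list of VMIDs and restarts all remaining stopped guests in one run instead of one per run. `SKIP_VMIDS`, the function names and the script path in the crontab comment are hypothetical, and the parsing assumes the last four columns of `qm list` are STATUS, MEM(MB), BOOTDISK(GB) and PID.

```shell
#!/bin/bash
# Hypothetical list of VMIDs that are stopped on purpose (templates, test guests).
SKIP_VMIDS="${SKIP_VMIDS:-9000 9001}"

# Print the VMIDs of stopped guests from `qm list`-style text on stdin, minus SKIP_VMIDS.
# The status is read as the fourth field from the end so names with spaces do not matter.
stopped_vmids() {
    awk -v skip="$SKIP_VMIDS" '
        BEGIN { n = split(skip, ids, " "); for (i = 1; i <= n; i++) skipped[ids[i]] = 1 }
        NF >= 5 && $(NF-3) == "stopped" && !($1 in skipped) { print $1 }
    '
}

# Start every stopped guest that is not on the skip list.
restart_stopped() {
    local vmid
    for vmid in $(/usr/sbin/qm list | stopped_vmids); do
        /usr/sbin/qm start "$vmid"
    done
}

# Example crontab entry, every 10 minutes (the script path is hypothetical):
# */10 * * * * /usr/local/bin/restart-stopped-guests.sh
```

Call `restart_stopped` at the end of the script before scheduling it. Remember to add a VMID to the skip list before shutting a guest down for maintenance, or cron will bring it back within ten minutes.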

zombie-man · Dec 1, 2024

wishy, just out of curiosity: why is the built-in HA feature not enough?

wishy · Dec 1, 2024

zombie-man said:
wishy, just out of curiosity: why is the built-in HA feature not enough?

I guess it would be. It's a home server and I don't particularly want to pay for power for 3 nodes, so I just make the 1 node reasonably redundant, and have backups and spare hardware in case the main node goes pop.

LnxBil · Dec 3, 2024

wishy said:
I guess they would be. It's a home server, I don't particularly want to pay for power for 3 nodes, so I just make the 1 node reasonably redundant, have backups, and spare hardware if the main node goes pop

HA stands for a lot of things, yet a simple "keep the VM online" is also doable on a single node with the help of HA. If the VM gets stopped from inside the VM (e.g. a poweroff), it will get started automatically. Besides that, everything else that has been written in this thread still holds. If the VM crashes (e.g. a kernel panic), it'll be restarted from the inside. If you use a watchdog and the VM freezes, it'll be restarted. Anything else that is not covered here has to be monitored from the outside.
