[TUTORIAL] Hardware watchdog at a per-VM level

Hi, I have had some problems with the OOM killer "at home" and was searching for a solution, at least to detect and recover a killed VM.

My idea was to monitor the log file (on the host) for any OOM killer messages and, if one shows up, restart the affected VM. I could double-check with a simple ping that the VM is really down before restarting it.

Is this a stupid idea, or impossible for some reason?
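The idea above could be sketched roughly like this in Python. This is only an illustration under a few assumptions: the kernel logs OOM kills with a "Killed process <pid> (<name>)" message, VM processes on the host show up under the name `kvm`, and the VM state is checked with `qm status` instead of a ping. The function names and the list of watched VM IDs are made up for the example.

```python
import re
import subprocess

# Assumed kernel log format, e.g.
#   "Out of memory: Killed process 12345 (kvm) total-vm:..."
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) for every OOM kill found in the log lines."""
    kills = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def vm_is_stopped(vmid):
    """Check the VM state via `qm status`, which prints e.g. 'status: stopped'."""
    result = subprocess.run(
        ["qm", "status", str(vmid)], capture_output=True, text=True, check=False
    )
    return "stopped" in result.stdout


def start_vm(vmid):
    """Start the VM via `qm start`."""
    subprocess.run(["qm", "start", str(vmid)], check=False)


def recover_killed_vms(log_lines, watched_vmids,
                       is_stopped=vm_is_stopped, start=start_vm):
    """If a kvm process was OOM-killed, start every watched VM that is stopped.

    Returns the list of VM IDs that were started.
    """
    # Assumption: VM processes appear as "kvm" in the OOM message.
    kvm_killed = any(name == "kvm" for _pid, name in find_oom_kills(log_lines))
    if not kvm_killed:
        return []
    restarted = []
    for vmid in watched_vmids:
        if is_stopped(vmid):
            start(vmid)
            restarted.append(vmid)
    return restarted
```

The log lines could come from something like `journalctl -k` or `dmesg` run periodically on the host. The check and start functions are passed in as parameters so the logic can be tried out without touching real VMs.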
 
In case of an OOM, the VM is killed, so you need to start it again. If the VM was just killed, starting it will probably trigger another OOM right away. You need to fix the underlying OOM condition in order to get a stable system.
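Because an automatic restart can simply run into the next OOM, any such script should at least limit how often it restarts a VM. A minimal sketch of such a guard follows; the class name and the default limits are hypothetical, and it does not fix the memory shortage itself.

```python
import time


class RestartLimiter:
    """Allow at most `max_restarts` restarts per VM within `window_seconds`."""

    def __init__(self, max_restarts=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock
        self.history = {}  # vmid -> timestamps of recent restarts

    def allow(self, vmid):
        """Return True and record the restart if the VM is still under its limit."""
        now = self.clock()
        # Keep only restarts that are still inside the time window.
        recent = [t for t in self.history.get(vmid, [])
                  if now - t < self.window_seconds]
        if len(recent) >= self.max_restarts:
            self.history[vmid] = recent
            return False
        recent.append(now)
        self.history[vmid] = recent
        return True
```

A watcher would call `allow(vmid)` before each restart and leave the VM stopped (and ideally notify someone) once it returns False, since repeated kills point to a host that is short on memory.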
 
