FR: Notification on oom-killer

garyd9

Member
Nov 10, 2023
Not sure the correct place to make feature requests, but it'd be really nice if proxmox could send an email notification when oom-killer starts killing container (or any other) processes.

I had a process in a container with a (fairly nasty) memory leak... and had no idea it was going on until a day later when that container's primary process wasn't working (due to oom-killer killing it.)
 
Not sure the correct place to make feature requests,
Best you create a feature request in the bug tracker too: https://bugzilla.proxmox.com

it'd be really nice if proxmox could send an email notification when oom-killer starts killing container (or any other) processes.
Yes, integrated monitoring and alerting is really something that could be improved. As a workaround, you could use something external like Zabbix or Graylog to monitor your syslog and alert you when new OOM logs are found. That's what I'm doing here with Graylog so I don't miss OOMs. It monitors the logs of all the VMs and LXCs too, so I also get notified when a guest OS kills a process, which PVE could never do.
I would also like to see better notifications in the web UI, for example a big red blinking box when a degraded ZFS pool is found. Without proper external monitoring it's easy to miss critical events that could easily be monitored.
 
Done. https://bugzilla.proxmox.com/show_bug.cgi?id=5052

Thank you! I don't know anything about Graylog. I suppose I could add a cron script on the Proxmox host that greps the output of journalctl for "oom-killer" and sends an email if anything is found.
 
So, here's a very quick and dirty work-around: create a crontab entry similar to:

Code:
40 * * * * /usr/bin/journalctl -t kernel -S "1hr ago" -g oom -q --no-pager

This runs a command every hour (at 40 minutes past the hour) that searches the journal for "kernel" messages from the last hour containing the string "oom". If none are found, the command prints nothing and nothing happens. If any are found, cron emails the output to the address configured for the root user.
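If you'd rather have a short header in the email and a slightly narrower match than plain "oom", the same idea can be wrapped in a small script. This is only a rough sketch: the function names oom_report and check_oom are made up for illustration, and the pattern assumes the kernel logs lines containing "oom-killer" or "out of memory", so adjust it to whatever your journal actually shows.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints only those that
# look like OOM events. Exit status is non-zero when nothing matches.
# The pattern is an assumption about the kernel's wording; tune as needed.
oom_report() {
    grep -i -E 'oom-killer|out of memory'
}

# Hypothetical wrapper: pulls the last hour of kernel messages (same
# journalctl options as the crontab entry above, minus -g) and prints a
# header plus the matching lines. Prints nothing when there are no matches,
# so cron has nothing to mail.
check_oom() {
    matches=$(journalctl -t kernel -S "1hr ago" -q --no-pager | oom_report) || return 0
    printf 'OOM events on %s in the last hour:\n%s\n' "$(hostname)" "$matches"
}
```

To use it, add a final line that calls check_oom, save the script somewhere on the host, make it executable, and point the crontab entry at it instead of calling journalctl directly. The email still relies on cron mailing whatever the job prints.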

Here's an example email:

[Screenshot attachment: 1700275830163.png]
 
