[TUTORIAL] CheckMK local check for monitoring backup status

pbengert

Member
Apr 1, 2022
Hi! I just wanted to share my script for monitoring the backup status of the VMs in Proxmox. It is written for use with CheckMK, so you need the CheckMK Linux agent installed on the Proxmox host. It should be easily adaptable to other systems like Zabbix, Nagios, etc.

You need to put the script in:
/usr/lib/check_mk_agent/local/ and make it executable with chmod +x checkmk_proxmox_backup.py

Feel free to use it. (MIT Licence)

Python:
#!/usr/bin/python3

# Set the nodes to check (you can only set more than one node if you have a cluster)
nodes = ['pm0', 'pm1', 'pm2']
# How far back in time the script should look. Verify that the expression works on the command line with: date -d "-3 days" +%s
past_time = "-3 days"

####### Do not adjust below this line ##########

import subprocess
import json
import datetime

class Backup:
    def __init__(self, vmid):
        self.vmid = vmid
        self.newestendtime = 0
        self.node = ''
        self.status = ''

vms = {}

for node in nodes:
    command = f'pvesh get /nodes/{node}/tasks/ -typefilter vzdump --output-format json -since `date -d "{past_time}" +%s`'
    # For debugging, uncomment the next line to list only tasks that ended with errors
    #command = f'pvesh get /nodes/{node}/tasks/ -typefilter vzdump -errors --output-format json'

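    (command_status, command_output) = subprocess.getstatusoutput(command)
    if command_status != 0:
        # pvesh failed for this node; skip it rather than crash on non-JSON output
        continue
    tasks = json.loads(command_output)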
    for task in tasks:
        # Tasks that are still running have no endtime/status yet, so skip them
        if 'endtime' not in task:
            continue
        vmid = task['id']
        if vmid not in vms:
            vms[vmid] = Backup(vmid)
        if vms[vmid].newestendtime < task['endtime']:  # we have a newer task, so update
            vms[vmid].newestendtime = task['endtime']
            vms[vmid].node = task['node']
            vms[vmid].status = task['status']

# Now evaluate:
for vm in sorted(vms):
    ts = datetime.datetime.fromtimestamp(vms[vm].newestendtime).strftime('%Y-%m-%d %H:%M:%S')
    if vms[vm].status == 'OK':
        print(f'0 "VM-Backup {vm}" - Last backup of VM {vm} succeeded at {ts} on node {vms[vm].node} with status {vms[vm].status}')
    else:
        print(f'2 "VM-Backup {vm}" - Last backup of VM {vm} failed at {ts} on node {vms[vm].node} with status {vms[vm].status}')
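
If you want to try the selection and output logic without a Proxmox host, here is a minimal sketch that is not part of the script itself. It assumes task dictionaries with the same keys the script reads (id, endtime, node, status); the function names and the sample data are made up for illustration. Each printed line follows the local check layout of state, quoted service name, metrics (a dash for none), and details.

```python
def newest_task_per_vm(tasks):
    """Return {vmid: task}, keeping only the task with the latest endtime per VM."""
    newest = {}
    for task in tasks:
        endtime = task.get('endtime')
        if endtime is None:
            # Assumption: a task without an endtime is still running, so skip it
            continue
        vmid = task['id']
        if vmid not in newest or newest[vmid]['endtime'] < endtime:
            newest[vmid] = task
    return newest


def local_check_line(vmid, task):
    """Format one local check line: <state> "<service>" <metrics> <details>."""
    state = 0 if task['status'] == 'OK' else 2  # 0 = OK, 2 = CRIT
    verb = 'succeeded' if state == 0 else 'failed'
    return (f'{state} "VM-Backup {vmid}" - Last backup of VM {vmid} {verb} '
            f'on node {task["node"]} with status {task["status"]}')


# Made-up sample data for illustration only
sample_tasks = [
    {'id': '100', 'endtime': 1000, 'node': 'pm0', 'status': 'OK'},
    {'id': '100', 'endtime': 2000, 'node': 'pm1', 'status': 'job errors'},
    {'id': '101', 'endtime': 1500, 'node': 'pm0', 'status': 'OK'},
]

lines = [local_check_line(vmid, task)
         for vmid, task in sorted(newest_task_per_vm(sample_tasks).items())]
for line in lines:
    print(line)
```

VM 100 is reported as failed here because its newest task (endtime 2000) did not end with status OK, even though an older backup succeeded.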
 
There is a Proxmox special agent in recent CheckMK versions that talks to the Proxmox API and retrieves the backup status among other things.
You are referring to:

Proxmox VE VM Backup
Setup > Services > Service monitoring rules > Proxmox VE VM Backup

or

Proxmox VE VM Snapshot Age
Setup > Services > Service monitoring rules > Proxmox VE VM Snapshot Age

right?
 
@RolandK

Could you please be so kind and show me your Backup and Snapshot Age configuration with a screenshot of each? How should it look in the services? Does it appear on the Proxmox host or in the individual VMs? My rules look like this, but it just doesn't show up anywhere.

Many thanks!

1746279968717.png
1746280576901.png