monitoring plugin - check_pve

6uellerbpanda

Sep 15, 2015
Introducing a Naemon/Icinga/Nagios plugin for Proxmox Virtual Environment

Version: 0.1

The following checks are available:

  • Cluster status
  • SMART status of disks
  • Available updates
  • Dead/Stopped services
  • Storage/Datastore usage
  • CPU, Memory, IO Wait usage

For updates, the README, etc., see https://gitlab.com/6uellerBpanda/check_pve


Feel free to test, comment, and improve.
 
This plugin is great. The only thing I'm missing is a check for backups and their age (i.e. whether they were taken properly and the timestamp is OK).
I think the Proxmox API supports this, so maybe it's possible to add this feature?
 
Not sure what I've done wrong here. I'm trying to run this check on localhost before folding the command into Nagios, and I'm getting an error.

root@pve:/usr/lib/nagios/plugins# ./check_pve.rb -k 10.31.1.5 -u monitoring@pve -p [EXPUNGED] -n pve -m node-smart-status
Unknown - undefined method `include?' for nil:NilClass
 
I should add that verbose mode isn't much help:

root@pve:/usr/lib/nagios/plugins# ./check_pve.rb -k 10.31.1.5 -u monitoring@pve -p [EXPUNGED] -n pve -m node-smart-status -v
check_pve v.0.3.0
Unknown - undefined method `include?' for nil:NilClass
 
@Taleya that means one of the plugin's checks returns nil (empty), which the plugin doesn't expect. Basically, the plugin would need to be fixed to handle that. I'm not good at Ruby, so I don't feel like doing it myself.
You could try to find out which check returns nil by running the plugin under strace, or with tcpdump (if it goes over the network). Unfortunately the plugin doesn't have a debug option, so it can't tell you by itself.
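To illustrate what that error message means, here is a minimal Ruby sketch. It is not taken from check_pve; `includes_text?` and `unguarded_error_class` are hypothetical helpers, and the string 'PASSED' is just a placeholder value. It shows that calling `include?` on nil raises NoMethodError, and one possible way a check could guard against a nil value.

```ruby
# Hypothetical helper, not taken from check_pve: report whether a value
# that may be nil contains a given substring.
def includes_text?(value, needle)
  # to_s turns nil into "", so include? always has a String receiver.
  value.to_s.include?(needle)
end

# Reproduce the failure mode from the error message: include? called on nil.
def unguarded_error_class
  nil.include?('PASSED')
rescue NoMethodError => e
  e.class
end

puts unguarded_error_class                     # NoMethodError
puts includes_text?(nil, 'PASSED')             # false
puts includes_text?('SMART PASSED', 'PASSED')  # true
```

Whether a nil should count as "not found" or be reported as its own UNKNOWN state is a design choice for the plugin author; the guard above only avoids the crash.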
 
Thanks - I'm running it directly on the PVE host as root - I haven't gotten to programming it into the Nagios server yet. Grr.

strace says "Unknown - Failed to open TCP connection to trace:8006 getaddrinfo: Name or service not known"

Same issue when I replace the IP with localhost, not sure why. PVE is definitely serving the web interface on 8006, and it's accessible. The same thing happens with a few different commands, so I'm missing something fundamental.
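One possible reading of the `trace:8006` message, sketched under assumptions: if the plugin parses its arguments with Ruby's OptionParser, and `-s` is the option that takes the host address while `-k` is a plain switch, then `-k 10.31.1.5` leaves the address unset, and something like `-strace` is read as `-s trace`. The option layout below is an assumption for illustration, not copied from check_pve; `parse_args` is a hypothetical helper.

```ruby
require 'optparse'

# Assumed option layout (not copied from check_pve): -s takes the host
# address, -k is a boolean switch that takes no argument.
def parse_args(argv)
  opts = { address: nil, insecure: false }
  parser = OptionParser.new do |o|
    o.on('-s', '--address ADDRESS') { |v| opts[:address] = v }
    o.on('-k', '--insecure') { opts[:insecure] = true }
  end
  # parse returns the arguments that were not consumed as options.
  leftover = parser.parse(argv)
  [opts, leftover]
end

p parse_args(['-k', '10.31.1.5'])   # address stays nil, the IP is left over
p parse_args(['-ks', '10.31.1.5'])  # -k is set and -s receives the IP
p parse_args(['-strace'])           # -s receives the string "trace"
```

If that is what is happening, an unset address would also be a candidate for the nil behind the `include?` error, but that is a guess from the outside, not something confirmed against the plugin source.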
 
OK, so:

- Installed Ruby, defined users, installed NRPE, made check_pve executable, all that fun stuff. Tested the commands and confirmed they work.

Put the command in the NRPE config file on the PVE host:

#check RAM
command[check_pv_ram]=/usr/lib/nagios/plugins/check_pve.rb -ks 10.31.1.5 -u monitoring@pve -p [REDACTED] -n pve -m node-memory-usage -w 80 -c 90


Restarted NRPE.

Put the command in the Nagios server command config:

define command {
command_name check_pv_ram
command_line $USER1/check_pv_ram -H $HOSTADRESS$
}

Put the command in host-services-cfg (defined a section for Proxmox):


#Check RAM

define service {
use proxmox-servers-service
hostgroup_name proxmox-servers
service_description Check Ram from PVE Ruby Script
check_command check_pv_ram
}



Verified the Nagios config and restarted the daemon, aaaand:

(Return code of 127 is out of bounds. Check if plugin exists)


OK. Can anyone tell me how I stuffed up the define command?
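For what it's worth, exit status 127 is what a shell returns when the command it was asked to run cannot be found, which is why Nagios suggests checking that the plugin exists. In the define command above, `$USER1` is missing its closing `$` (the macro is `$USER1$`), `$HOSTADRESS$` is not the spelling Nagios uses (`$HOSTADDRESS$`), and `check_pv_ram` is the name of the NRPE command on the PVE host rather than a plugin file on the Nagios server. A common NRPE pattern, assuming check_nrpe is installed in the plugin directory, is a command line of the form `$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_pv_ram`. The small Ruby sketch below only demonstrates the 127 behaviour; `shell_exit_status` and the path in it are made-up placeholders.

```ruby
# Run a command line through the shell and return its exit status.
def shell_exit_status(command_line)
  system('sh', '-c', "#{command_line} 2>/dev/null")
  $?.exitstatus
end

# The path below is a made-up placeholder that should not exist anywhere.
puts shell_exit_status('/nonexistent/dir/check_pv_ram -H 127.0.0.1')  # 127
puts shell_exit_status('true')                                        # 0
```

Nagios only treats 0 to 3 as valid plugin return codes, so 127 shows up as "out of bounds".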
 