Proxmox VE monitoring

np-prxmx

Active Member
May 11, 2020
Hi all,
I'm looking for monitoring software for my Proxmox cluster that covers VMs and performance and has alerting capabilities.

I've tested Zabbix (but it doesn't autodiscover VMs) and Graphite with Grafana or with InfluxDB, but I wasn't able to create alerts.

Has anyone found the best tool to monitor the entire cluster?

Thanks.
Regards
 
I've tested Zabbix (but it doesn't autodiscover VMs) and Graphite with Grafana or with InfluxDB, but I wasn't able to create alerts.
It's best to install the Zabbix agent inside the VMs for more detailed metrics.
There is also this template that features VM autodiscovery, but you will be limited to what the host can monitor: https://git.isaev.tech/IsaevTech/zbx-tmplt-pve
 
I just came across PandoraFMS, which I haven't used before. It doesn't have anything Proxmox-specific, but there is a Debian client in Debian's repo, afaict.

* https://pandorafms.com/en/community/

* https://github.com/pandorafms/pandorafms

* https://packages.debian.org/bullseye/pandorafms-agent

Also apparently free software:

* http://www.shinken-monitoring.org/

* https://www.cacti.net/

Nagios has been around forever:

* https://www.nagios.com/solutions/debian-monitoring/

* Munin...

* Ganglia looks unmaintained.

* https://mmonit.com/monit

* https://github.com/Icinga/icinga2

Nice list at Wikipedia:

https://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
 
Hi guys,

Thanks for your suggestions. I'm looking for something integrated into Proxmox VE, because installing an agent isn't always useful, and some customers don't consider it good practice.

I'd like to find a product that calls the Proxmox API and returns all the information about the system and VMs.
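
To make the idea concrete, here is a minimal sketch of what "calling the Proxmox API" for guest data could look like. It assumes an API token already exists; the host name, token ID, and secret are placeholders, and the function names are made up for illustration. It reads the `cluster/resources` endpoint and reduces each guest to a few fields.

```python
import json
import urllib.request


def build_request(host, token_id, token_secret, path="cluster/resources?type=vm"):
    """Build an authenticated GET request for the Proxmox VE REST API.

    host, token_id and token_secret are placeholders supplied by the caller.
    """
    url = f"https://{host}:8006/api2/json/{path}"
    headers = {"Authorization": f"PVEAPIToken={token_id}={token_secret}"}
    return urllib.request.Request(url, headers=headers)


def summarize_guests(payload):
    """Reduce a cluster/resources response to a few fields per guest."""
    rows = []
    for item in payload.get("data", []):
        maxmem = item.get("maxmem") or 0
        rows.append({
            "vmid": item.get("vmid"),
            "name": item.get("name"),
            "node": item.get("node"),
            "status": item.get("status"),
            # "cpu" is reported as a fraction, so convert it to a percentage.
            "cpu_pct": round(100 * item.get("cpu", 0.0), 1),
            "mem_pct": round(100 * item.get("mem", 0) / maxmem, 1) if maxmem else None,
        })
    return rows


def fetch_guests(host, token_id, token_secret):
    """Query the cluster and return one summary row per guest.

    Note: a node with a self-signed certificate will fail TLS verification
    here unless its CA is trusted by the machine running this script.
    """
    req = build_request(host, token_id, token_secret)
    with urllib.request.urlopen(req) as resp:
        return summarize_guests(json.load(resp))
```

Something like `fetch_guests("pve.example", "monitor@pve!readonly", "<secret>")` would then return one row per VM or container, without anything installed inside the guests.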
 
Didn't you read my link above? Zabbix with the linked template does this:
It's best to install the Zabbix agent inside the VMs for more detailed metrics.
There is also this template that features VM autodiscovery, but you will be limited to what the host can monitor: https://git.isaev.tech/IsaevTech/zbx-tmplt-pve
 
I'd like to find a product that calls the Proxmox API and returns all the information about the system and VMs.

I think back in the olden days I did this with Prometheus, check_mk, elasticsearch, and Grafana.

I agree it is far better to not have to install a client inside the VM, if possible.

The latest Proxmox has more built in than the last time I was reviewing this. In the web GUI, "Datacenter" --> "Metric Server" --> "Add" gives options for "Graphite" and "InfluxDB". I haven't set those up, but I plan to see what gets dumped there. Then a variety of tools can be used to look at that data (perhaps Grafana or Kibana).

I also agree it would be great to have all of this built into Proxmox. But that would turn half the code into just a monitoring system, and it is a pretty huge feature request. There are ways to view nodes' activity such as disk and CPU, but not much granularity.

The problem with setting up monitoring that is looking at "everything" is you now have a firehose of data to go through. Setting up what to monitor and what alerts to generate now becomes a big task. I have come across projects that use AI for recognizing what is important, but they are quite alpha at this point. For example for SatNOGS:

https://polarisml.space/

Happy hacking,

-Jeff
 
A lot of the tools mentioned aren't monitoring tools in my book, so what is your goal with monitoring?

We use good old Nagios/Icinga for OS and hardware related stuff, the same stuff that has been around for decades.
The PVE metric export is good and you can generate a lot of nice graphs and alerts from it, but it is limited compared to what you get from inside the guest. For some VMs we also collect metrics from the inside, e.g. ZFS or, more generally, common software (DNS, DHCP, database, etc.) that has no direct metric integration. And of course from software that does have one, so metrics can be generated on all levels.
 
I just set up a KVM with influxdb and another KVM with Grafana. I didn't follow this, but it pretty much summarizes it:

https://melvin.ovh/proxmox-monitoring-with-influxdb-and-grafana/

I just used the web GUI on the Proxmox side to add the InfluxDB config. Proxmox dumps the data to InfluxDB, then Grafana reads that. I imported Grafana dashboard # 12910, and it looks really swell.

So built-in Proxmox + InfluxDB + Grafana looks to be a good (partial) solution at least.
 
Thanks, you are the only one who understands me.
As I said at the beginning of this thread, I'm looking for software to monitor my cluster. In more detail, I'm looking for software that monitors CPU, RAM, disk usage, and network, plus per-VM up/down state, disk usage, RAM, and CPU. Alerting could be optional, but all software today generates alerts.

Why do I need to install something inside the VMs when I already use the QEMU guest tools, which could give me all the information I need?

I'm looking for software that queries the Proxmox API and the qemu-guest-agent.


Last but not least, I'm running a 10-node cluster with Ceph, so monitoring network and usage is important. :)
Before, when I was on the dark side with VMware, I usually used PRTG :) But it doesn't integrate with Proxmox.



I hope I explained myself better than before.

Thanks again for your time, experiences and feedback!
 
Why do I need to install something inside the VMs when I already use the QEMU guest tools, which could give me all the information I need?

I'm looking for software that queries the Proxmox API and the qemu-guest-agent.
Depends on what you need. Generally, it's not that easy: PVE doesn't query anything special through the guest agent, it uses the metric export of QEMU itself.
The guest agent backdoor can run commands and return their output, but that is much more complicated to use than just installing Telegraf inside your VM and exporting whatever you like directly to InfluxDB. You can automate that easily.

If you go with the guest agent backdoor, please report back what you did and how.
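
For anyone who wants to try the guest agent route, a rough sketch of the parsing side might look like this. It assumes the guest agent is installed and enabled for the VM, and it assumes the `agent/get-fsinfo` endpoint returns the agent's filesystem list wrapped in `data.result` with `used-bytes` and `total-bytes` fields; check the response shape on your own version before relying on it. The function names are made up for illustration.

```python
def fsinfo_path(node, vmid):
    """API path (relative to /api2/json/) for a VM's guest-agent filesystem info."""
    return f"nodes/{node}/qemu/{vmid}/agent/get-fsinfo"


def filesystem_usage(payload):
    """Map each mountpoint to its usage in percent.

    Assumed response shape (not verified against every PVE version):
    {"data": {"result": [{"mountpoint": "/", "used-bytes": 1, "total-bytes": 2}]}}
    """
    usage = {}
    for fs in payload.get("data", {}).get("result", []):
        total = fs.get("total-bytes")
        used = fs.get("used-bytes")
        # Some entries carry no size information; skip them.
        if not total or used is None:
            continue
        usage[fs.get("mountpoint")] = round(100 * used / total, 1)
    return usage
```

The path from `fsinfo_path("pve1", 100)` would be requested with the same token authentication as any other API call, and the decoded JSON passed to `filesystem_usage`.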
 
I'm looking for software that queries the Proxmox API and the qemu-guest-agent.
That's exactly what Zabbix with the PVE template does, which I have already linked here twice. It uses the Proxmox API and can also autodiscover nodes of a cluster and guests without needing to install anything inside the guests. It monitors:
- host latest logs
- if updates are available for host
- host CPU
- host RAM
- host swap
- host PVE version
- host uptime
- host filesystems
- guest CPU
- guest RAM
- guest swap
- guest filesystem (if available by PVE)
- guest uptime
- guest started/stopped

And if you want more, you can install the zabbix-agent on the host and/or the guests to get even more metrics. And there are endless templates to add more features. I, for example, like the SMART, ZFS-on-Linux and mdadm templates, so I get warnings if the ARC performs badly, a pool degrades, a software RAID array fails or an SSD is wearing too fast. Then the NUT template to monitor my UPS and server power usage. A Supermicro template to monitor fans, voltages and temperatures... it can even monitor a Fritzbox if you want.
And Zabbix can monitor things like dropped packets, downed interfaces and so on out of the box.
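
To illustrate what alerting on those API values boils down to, here is a small sketch. This is not how Zabbix or the template is implemented; it is only a hand-rolled example of trigger-style checks over one entry from the Proxmox `cluster/resources` data, with made-up thresholds and a hypothetical function name.

```python
def check_resource(item, cpu_limit=0.9, mem_limit=0.9):
    """Return a list of alert messages for one cluster/resources entry.

    cpu_limit and mem_limit are illustrative fractions (0.9 means 90%).
    """
    alerts = []
    name = item.get("name") or item.get("id", "unknown")

    # A guest that is not running has no meaningful load to check.
    if item.get("type") in ("qemu", "lxc") and item.get("status") != "running":
        alerts.append(f"{name}: guest is {item.get('status')}")
        return alerts

    cpu = item.get("cpu", 0.0)
    if cpu > cpu_limit:
        alerts.append(f"{name}: CPU at {cpu:.0%}")

    maxmem = item.get("maxmem") or 0
    if maxmem and item.get("mem", 0) / maxmem > mem_limit:
        alerts.append(f"{name}: RAM at {item['mem'] / maxmem:.0%}")

    return alerts
```

A real monitoring system adds what this leaves out: history, hysteresis so alerts don't flap, and notification routing.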
 
Both Zabbix and Influx can access the items from @Dunuin's list from outside the KVM, so nothing needs to be installed inside the KVMs, if you don't want it.

To get it to work with InfluxDB, you configure it directly in Proxmox, then you have to install InfluxDB and Grafana somewhere (possibly inside or outside the cluster). InfluxDB is the time-series database and Grafana provides the visualization web pages. Debian has a package for InfluxDB. For Grafana, you'll have to use their custom repository or similar.

I *think* to access things like SMART, temperature (lm-sensors), and data like that, for an InfluxDB/Grafana setup you may also need to install Telegraf, which isn't in the Debian repositories.

For Zabbix you can install the package on your Proxmox bare-metal nodes with `apt` straight from the Debian repository.

Both approaches are similar, but look quite different in their output. I think the Grafana graphs and dashboards look way better and are easier to set up. There are a lot of decent Grafana dashboards for Proxmox that are easily installed. Zabbix is much more complete as a tracking system for anything about your system and probably has better alerts. Grafana is used for graphing pretty much "everything" (e.g. bird migrations, satellite transmissions, or whatever you want to graph), while Zabbix is focused more on just being used for servers and networking.

You can install and run both simultaneously to see which you like better.
 
I personally don't look at graphs that often. My Zabbix collects as much data as possible, which I never look at, and I have hundreds of alerts that will pop up on the dashboard if Zabbix thinks some data looks suspicious, so that I know to take a closer look at it.
 
Ya, Zabbix alerts are probably a lot better. I'm going to set it up now. I also like that it can be used without installing external software/repos. Do you run your Zabbix web server and database inside a KVM/CT or outside the cluster?
 
Until yesterday, Zabbix 5.0 LTS in a VM. Now Zabbix 6.0 LTS in an unprivileged LXC. Looks fine so far, except that some old templates are now partially broken (numbers shown in the wrong format, missing names of autodiscovered items so you can't see what the numbers beside them mean) and that zabbix-agent2 is making my PVE web UI unresponsive (but zabbix-agent is working fine). And when I installed it with MariaDB 10.6, Zabbix caused MariaDB to sit at 100% CPU utilization, so Zabbix wasn't usable. With MariaDB 10.5 it works. But according to the documentation both 10.5 and 10.6 should be supported.
 
I was able to get Zabbix installed and running in a Debian Bookworm KVM. It is mostly working, but the web login has these errors all over it, which I haven't tracked down yet.

Code:
    mysqli::real_connect(): Passing null to parameter #7 ($flags) of type int is deprecated [zabbix.php:21 → require_once() → ZBase->run() → ZBase->initDB() → DBconnect() → MysqlDbBackend->connect() → mysqli->real_connect() in include/classes/db/MysqlDbBackend.php:173]
    setcookie(): Passing null to parameter #5 ($domain) of type string is deprecated [zabbix.php:21 → require_once() → ZBase->run() → ZBase->authenticateUser() → CWebUser::checkAuthentication() → CWebUser::setSessionCookie() → zbx_setcookie() → setcookie() in include/func.inc.php:107]

Edit: Looks like this error is due to Debian Bookworm shipping PHP 8, which isn't supported by Zabbix. So stick with Debian Bullseye (stable) for Zabbix 5, apparently.
 
Super annoying. Last night they shut down the old share.zabbix.com site and replaced it with a new one where 80% of the templates are missing. Now, after like 20 hours of work, I'm 90% done but can't finish, because some of the templates I'm still missing were only hosted on the old site, so they are now lost and can't be used anymore. For some of them I have a copy of the config files and the template file, but without an installation guide it's hard to set them up again...
Super stupid decision of them to take it offline. The biggest advantage of Zabbix was the great number of available integrations created by users, which are now all lost...
 
Ya, that's why I don't like pulling from external sites in general, even plugins... Need to mirror everything you use... It looks like everything is there on the share site though (?).
 
No, the site is still online, but they replaced it with the new one where most of the templates are missing. If, for example, you had searched 2 days ago for my "Aruba 1930" managed switch, you would have found this template: https://share.zabbix.com/network_devices/aruba/aruba-instant-on-1930. All the links to the old templates are now dead and you get a "404 File not found". So I won't be able to monitor my switch anymore. Same with hundreds or thousands of other integrations.
 