I'll edit this as I get additions/corrections.
Needs
----------
* System that produces alerts when something has gone bad.
* System that warns when things are getting bad.
* System that allows visualization of metrics to aid in system optimization.
* Monitoring of hardware, such as temperature and failed fans.
* Perhaps extend to monitoring of UPS and other relevant hardware.
* Monitoring of KVM metrics available via qemu-agent.
* Monitoring of KVM metrics not available via qemu-agent (such as KVM cached memory).
* Monitoring of filesystems, such as ZFS (I don't personally use this) and Ceph.
* Monitoring of network switches.
Proxmox + InfluxDB + Grafana + Prometheus
-----------------------------------------------------------------
This is the best setup so far, imho. Proxmox Metric Server and Prometheus node exporters of various flavours export to an InfluxDB server. The data is then viewed in a web browser with Grafana.
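As far as I understand it, the Proxmox Metric Server's InfluxDB plugin pushes points (over UDP or HTTP) in InfluxDB line protocol: `measurement,tag=value field=value timestamp`. Below is a minimal Python sketch of building one such line, for example from a custom script that feeds the same InfluxDB. The measurement `hypothetical_cpu` and the tag `host=pve1` are made up. They are not what Proxmox actually sends.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape_measurement(name: str) -> str:
    # Measurement names must escape commas and spaces.
    return name.replace(",", "\\,").replace(" ", "\\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value: FieldValue) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; escape backslashes and quotes inside them.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape_measurement(measurement)
    for key in sorted(tags):  # sorting tags by key is recommended for write performance
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision is InfluxDB's default
    return line


print(to_line_protocol("hypothetical_cpu", {"host": "pve1"}, {"usage": 0.25, "cores": 8}, 1700000000000000000))
# prints: hypothetical_cpu,host=pve1 usage=0.25,cores=8i 1700000000000000000
```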
Pros:
Uses Proxmox "Metric Server"
InfluxDB is in Debian/Proxmox.
Pretty graphs.
Prometheus is in Debian.
Cons:
Grafana isn't in Debian.
General-purpose; no concept of a data center.
Proxmox + InfluxDB + Grafana + Telegraf
-----------------------------------------------------------
Pros:
Uses Proxmox "Metric Server"
InfluxDB is in Debian/Proxmox.
Pretty graphs.
Can also be used with Chronograf (alternative to Grafana) and Kapacitor (alerting and data processing).
Cons:
Grafana isn't in Debian.
Telegraf isn't in Debian.
No really good alerting dashboards (yet?).
General-purpose; no concept of a data center.
Telegraf also has to be set up for sensor data.
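As far as I know, Telegraf's `sensors` input plugin wraps the lm-sensors `sensors` command. lm-sensors in turn reads the kernel's hwmon interface under `/sys/class/hwmon`, where the `temp*_input` files hold temperatures in millidegrees Celsius. Below is a rough Python sketch that reads those files directly, to see what any of these tools would have to work with on a given host. The function name and the returned tuples are my own choices. They are not part of Telegraf.

```python
from pathlib import Path
from typing import List, Optional, Tuple

Reading = Tuple[str, str, str, float]  # (hwmon_dir, chip_name, sensor_label, celsius)


def _read_text(path: Path) -> Optional[str]:
    try:
        return path.read_text().strip()
    except OSError:
        return None  # missing file, permission problem, or sensor read error


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> List[Reading]:
    """Return one tuple per temp*_input file found under the hwmon root."""
    readings: List[Reading] = []
    base = Path(root)
    if not base.is_dir():
        return readings  # no hwmon interface here (e.g. in some containers)
    for chip_dir in sorted(base.iterdir()):
        if not chip_dir.is_dir():
            continue
        chip = _read_text(chip_dir / "name") or chip_dir.name
        for temp_input in sorted(chip_dir.glob("temp*_input")):
            prefix = temp_input.name[: -len("_input")]  # e.g. "temp1"
            label = _read_text(chip_dir / f"{prefix}_label") or prefix
            raw = _read_text(temp_input)
            if raw is None:
                continue
            try:
                millidegrees = int(raw)
            except ValueError:
                continue  # some sensors return junk when read
            readings.append((chip_dir.name, chip, label, millidegrees / 1000.0))
    return readings


if __name__ == "__main__":
    for hwmon, chip, label, celsius in read_hwmon_temps():
        print(f"{hwmon} {chip} {label}: {celsius:.1f} C")
```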
Prometheus
-----------------
This can be used with InfluxDB and Grafana, as in the first setup above.
Pros:
In Debian.
Well established.
Many plugins (35 packages in the Debian repo).
Cons:
No direct Proxmox Metric Server integration. But it can work with Influx (iirc), so: Metric Server + Influx + Grafana, then add Prometheus to pick up metrics like sensors and in-VM monitoring that the Metric Server doesn't cover.
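Prometheus collects data by periodically scraping exporters over HTTP. Each exporter serves plain text in the Prometheus exposition format. node_exporter, for instance, listens on port 9100 by default and publishes hwmon sensors as `node_hwmon_temp_celsius`. Below is a simplified Python sketch of reading that format. It is not the official client library's parser: it ignores timestamps and does not handle escaped quotes or `}` inside label values. The sample text is made up.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<timestamp>-?\d+))?\s*$"
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # outside what this simplified parser understands
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts the format's special values NaN, +Inf and -Inf.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up sample in the shape node_exporter's hwmon collector uses.
SAMPLE = """\
# TYPE node_hwmon_temp_celsius gauge
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 42
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp2"} 39.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels.get("sensor"), value)
```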
Zabbix
----------
Pros:
Long history
Designed for use in data center
Main application is in Debian
Good for hardware sensors (e.g. hard drive temperature)
Cons:
Appears to be heading in the wrong direction, but was once a viable solution.
No direct Proxmox integration. This could be a wrong assumption, but I think it scrapes from a virt library rather than consuming the Proxmox "Metric Server".
VictoriaMetrics
----------------------
Pros:
Can be used as a drop-in replacement for Influx.
Supposedly much better than other solutions in speed, storage size, and scalability.
Can be used with Prometheus.
Should be able to connect directly to Proxmox Metric Server using InfluxDB export.
In Debian.
Has its own Grafana dashboards.
Cons:
Smaller community?
Not as established?
Per Debian changes, it doesn't include the web GUI: "Exclude and disable vmui from code, based on TypeScript and node.js, with unclear source provenance."
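The "InfluxDB export" route is plausible because VictoriaMetrics accepts InfluxDB line protocol, over HTTP on its `/write` endpoint and optionally over UDP/TCP. As I read its docs, each field is stored as a separate series named `{measurement}_{field}` by default (the separator is configurable), and tags become labels. That naming matters when picking or adapting Grafana dashboards that were written for InfluxDB. Below is a rough Python sketch of that mapping. It is not VictoriaMetrics' own code. It assumes simple lines with no escaped characters and numeric fields only, and the measurement name is made up.

```python
from typing import Dict, List, Tuple

Series = Tuple[str, Dict[str, str], float]


def influx_line_to_series(line: str, separator: str = "_") -> List[Series]:
    """Map one Influx line to (metric_name, labels, value), one entry per field.

    Assumes no escaped characters and numeric (float or integer) fields only.
    """
    parts = line.strip().split(" ")
    if len(parts) not in (2, 3):  # measurement+tags, fields, optional timestamp
        raise ValueError(f"unsupported line: {line!r}")
    measurement, *tag_pairs = parts[0].split(",")
    labels = dict(pair.split("=", 1) for pair in tag_pairs)
    series: List[Series] = []
    for field in parts[1].split(","):
        key, raw = field.split("=", 1)
        if raw.endswith("i"):
            raw = raw[:-1]  # strip the integer suffix
        # Each field becomes its own series: <measurement><separator><field>.
        series.append((f"{measurement}{separator}{key}", dict(labels), float(raw)))
    return series
```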
CV4PVE
------------
Pros:
Explicitly made for Proxmox.
Proxmox Certified Partner.
Same free software / subscription model as Proxmox, afaict.
Uses Proxmox "Metric Server".
Uses InfluxDB which is in Debian.
Made a lot of Grafana dashboards usable by everyone.
Suite of many more tools for Proxmox.
Cons:
Requires Telegraf, which isn't in Debian.
In C# mostly?
I am setting this up. I used the "Toolbox" Docker container and all of that went smoothly. When you log into the web GUI, it shows a really obnoxious subscription notice. I configured authentication with the Proxmox cluster and, after some fudging, got it to connect and authenticate. But once it finally authenticated, CV4PVE seems to have locked access to other parts of the CV4PVE GUI (???), since I don't have a CV4PVE subscription key. I tried to buy one on their site, but it isn't automated like Proxmox; you have to fill out a request form. To submit that request from within CV4PVE, you have to configure an SMTP server, so I'd have to set up firewall rules etc. for that. It was looking kind of nice, but the license junk is what I want to avoid... Now I'm logged out of the CV4PVE web GUI and can't log back in. Hmm. Not sure about this one.
Netdata
------------
I installed this on Debian KVM and on an OPNSense firewall. It collects a *lot* of metrics. The cloud interface, which lets you see multiple nodes at once, is a proprietary service.
There is also a "TV" dashboard, available at
http://127.0.0.1:19999/tv.html
When I went to the "TV" URL, it pulled data from "registry.my-netdata.io", even though I didn't want to use any external services. That hostname resolves to Google IPs. Sigh.
Pros:
In Debian.
Been around a long time, lots of development.
Cons:
Phones home to proprietary services by default even when using Debian's package.
No direct Proxmox integration (has libvirt and others, but not Proxmox directly).
Check_mk
---------------
Pros:
Established.
Proxmox Integration.
Cons:
PandoraFMS
------------------
Pros:
Proxmox plugin.
Huge featureset.
Agent in Debian repo.
Cons:
"Metaconsole" and other key features are non-free "enterprise".
Doesn't connect directly with "Metric Server".
Server isn't in Debian.
Installation looks long/complex.
"Metaconsole" and other key features are non-free "enterprise".
Small community (three posts in forum in last month).
Graphite
--------------
Pros:
Integrated into Proxmox "Metric Server".
Cons:
Code stagnating? Appears to still be based on obsolete Python 2.7.
Not in Debian.
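Graphite's carbon daemon accepts a very simple plaintext protocol: one `metric.path value unix_timestamp` line per data point, conventionally on port 2003. As far as I know, that is what the Proxmox Graphite metric server plugin speaks. Below is a Python sketch of producing and sending such lines. The metric path and any host you pass in are hypothetical.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one plaintext-protocol line: '<metric.path> <value> <unix_timestamp>'."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_carbon(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send lines to a carbon plaintext listener over TCP (no reply is expected)."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # "proxmox.pve1.cpu" is a made-up path; this only prints, it does not send.
    print(graphite_line("proxmox.pve1.cpu", 0.25, 1700000000), end="")
```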
Nagios & forks
---------------------
Pros:
This has been around for years, tried and true.
Many variations/forks
In Debian
Good for hardware sensors (e.g. hard drive temperature)
Cons:
Many variations/forks
No direct Proxmox integration. This could be a wrong assumption, but I think it scrapes from a virt library rather than consuming the Proxmox "Metric Server".
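Nagios-style checks are just programs that follow the plugin convention. They are also run by Icinga and, as far as I know, by Checkmk and most of the forks. The convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, one line of status text, and optional performance data after a `|`. That two-threshold model lines up with the "warns when things are getting bad" and "alerts when something has gone bad" needs at the top. Below is a Python sketch of a drive-temperature check. The thresholds are made up, and the reading is passed in as an argument, since getting it from the drive itself (smartctl, the drivetemp hwmon driver, ...) is not shown.

```python
from typing import List, Optional, Tuple

# Exit codes defined by the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(
    celsius: Optional[float], warn: float = 45.0, crit: float = 55.0
) -> Tuple[int, str]:
    """Return (exit_code, output_line) for a drive temperature reading."""
    if celsius is None:
        return UNKNOWN, "DRIVE TEMP UNKNOWN - no reading available"
    if celsius >= crit:
        code = CRITICAL  # "something has gone bad": alert
    elif celsius >= warn:
        code = WARNING  # "things are getting bad": warn
    else:
        code = OK
    # Text before "|" is for humans; after it is performance data (value;warn;crit).
    perfdata = f"temp={celsius:g};{warn:g};{crit:g}"
    return code, f"DRIVE TEMP {STATUS_NAMES[code]} - {celsius:g} C | {perfdata}"


def main(argv: List[str]) -> int:
    """Plugin entry point: print the status line and return the exit code."""
    try:
        reading: Optional[float] = float(argv[0])
    except (IndexError, ValueError):
        reading = None  # missing or non-numeric argument -> UNKNOWN
    status, output = check_temperature(reading)
    print(output)
    return status
```

To run it as an actual plugin, the script would end with `import sys` and `sys.exit(main(sys.argv[1:]))`, so that Nagios/Icinga sees the exit code.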
Icinga
---------
Pros:
Old, established.
In Debian.
Cons:
No Proxmox plugin/Metric server.
icingaweb isn't in Debian Bookworm (testing) because it doesn't work with PHP 8.1. That could be a problem in the future.
Monit
---------
Pros:
In Debian.
Cons:
Looks like MMonit, the web GUI, is proprietary (?)
Misc
-------
Also notable:
* Graylog. Centralized logging system. Not in Debian.
* Collectd. Monitoring. Old school, works with lots of other systems. Good for hardware monitoring iirc. In Debian.
Linkies
-----------
Grafana:
https://grafana.com/
Grafana Proxmox Dashboards:
https://grafana.com/grafana/dashboa...rce=influxdb&orderBy=updatedAt&direction=desc
InfluxDB:
https://www.influxdata.com/products/influxdb-overview/
Telegraf:
https://www.influxdata.com/time-series-platform/telegraf/
https://github.com/influxdata/telegraf
Chronograf:
https://www.influxdata.com/time-series-platform/chronograf/
https://github.com/influxdata/chronograf
Kapacitor:
https://www.influxdata.com/time-series-platform/kapacitor/
https://github.com/influxdata/kapacitor
Prometheus:
https://prometheus.io/
Monit:
https://mmonit.com/monit/
Zabbix:
https://www.zabbix.com/
CV4PVE:
https://www.cv4pve-tools.com/en/
CV4PVE Proxmox:
https://www.corsinvest.it/proxmox/?lang=en
PandoraFMS:
https://pandorafms.com/en/
CollectD:
https://collectd.org/
Graphite:
https://graphiteapp.org/
Nagios:
https://www.nagios.org/
Icinga:
https://icinga.com/
Checkmk:
https://checkmk.com/
https://checkmk.com/blog/proxmox-monitoring
Graylog:
https://www.graylog.org/
Netdata:
https://www.netdata.cloud/
VictoriaMetrics:
https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html
Cacti (web frontend for RRDtool; in Debian):
https://www.cacti.net/
Also note OPNSense includes plugins for: collectd, munin, snmp, netdata, Prometheus, ntopng, and zabbix.