Hello,
I would like to monitor all my Proxmox/Ceph installations:
- check that backups start;
- check for errors after a backup run;
- check if a Ceph OSD is down;
- check if any HA VM is in an error state.
I plan to use InfluxDB with Telegraf, Open Distro for Elasticsearch, or something similar.
I started with InfluxDB and Telegraf:
- there is a Proxmox plugin for Telegraf;
- Ceph officially supports sending metrics to InfluxDB and Telegraf.
After one day:
- the Ceph support is poorly documented, buggy and error-prone;
- the Proxmox and Ceph metrics are not useful for quickly spotting the problems listed above.
I now plan to send logs to Elasticsearch and then parse them to generate alerts.
Could anyone share ideas, help or tricks for reaching this goal?
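To make the log-parsing idea concrete, here is a rough Python sketch of the kind of check I have in mind for the two backup items. It is only an illustration: the line patterns are my assumption of what vzdump task logs look like, the function name scan_backup_log and the VM IDs are made up, and the sample lines are not real output.

```python
import re

# Assumed line shapes, modelled on vzdump task logs; adjust to what the nodes really emit.
START_RE = re.compile(r"INFO: Starting Backup of VM (\d+)")
FAIL_RE = re.compile(r"ERROR: Backup of VM (\d+) failed - (.*)")


def scan_backup_log(lines, expected_vmids):
    """Return (sorted VM IDs whose backup never started, {vmid: error message})."""
    started = set()
    failed = {}
    for line in lines:
        m = START_RE.search(line)
        if m:
            started.add(int(m.group(1)))
            continue
        m = FAIL_RE.search(line)
        if m:
            failed[int(m.group(1))] = m.group(2).strip()
    missing = sorted(set(expected_vmids) - started)
    return missing, failed


# Illustrative sample lines (hypothetical, not copied from a real node).
sample = [
    "INFO: Starting Backup of VM 100 (qemu)",
    "INFO: Finished Backup of VM 100 (00:01:12)",
    "INFO: Starting Backup of VM 101 (qemu)",
    "ERROR: Backup of VM 101 failed - example error text",
]
missing, failed = scan_backup_log(sample, expected_vmids=[100, 101, 102])
print("not started:", missing)
print("failed:", failed)
```

The same matching could probably live in the log pipeline instead of a script; the point is only that "backup did not start" needs a list of expected VMs to compare against, while "backup failed" can be caught from a single log line.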
Thanks,
Mario