Open Source Network Monitoring Tool

All OSD graphs are created from smartmon values, including the temperature graph I posted earlier. I am pulling a variety of data, such as read, write, and sector errors, into graphs. Is that what you are talking about? So far it has helped us identify 6 aging OSDs that probably would have gone unnoticed without the ability to set threshold notifications.

To add to my previous post, the graphing ability also helped me experiment with temperature control, as you can see from the graph. :) We cool a big part of the datacenter using freezing outside air. The big temperature drops were the result of pushing more cool air in.

Hello Symmcom


I've got Zabbix up and running and am in the process of researching how to get smartmon set up.

So far I've found a couple of paths to set up smartmon in Zabbix.

Could you tell me if you used one of these, or something else, as the base for your smartmon setup?

https://share.zabbix.com/storage-devices/smart-monitoring-with-smartmontools-lld
https://share.zabbix.com/storage-devices/storage-device-monitoring-via-smartmontools
 
I do use smartmon to collect HDD data and pass it to Zabbix. I only collect HDD temperature and serial number for now. In zabbix_agent.conf I have added the following:
UserParameter=hdd.temp[*],smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-90
UserParameter=hdd.serial[*],smartctl -i /dev/$1 | grep 'Serial Number' | cut -c19-
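
The temperature parameter above picks the raw value by character position (cut -c88-90), which depends on the exact column layout that smartctl prints. As a rough sketch of an alternative, the same value can be picked out by splitting each attribute row on whitespace. Everything here is an assumption for illustration: the raw_value helper is hypothetical, the sample text only imitates the usual `smartctl -A` table layout with made-up values, and the attribute is matched exactly by ID or name rather than by the case-insensitive regex used in the grep above.

```python
import re

# Illustrative sample in the usual `smartctl -A` table layout.
# The values are made up for this example, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36 (Min/Max 20/48)
"""


def raw_value(smart_output, attribute):
    """Return the leading integer of RAW_VALUE for an attribute.

    `attribute` is matched exactly against the ID or the attribute name.
    Returns None if the attribute is absent or has no leading integer.
    """
    for line in smart_output.splitlines():
        # Split into at most 10 fields so RAW_VALUE keeps any trailing text.
        fields = line.split(None, 9)
        # Skip the header and any line that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if attribute in (fields[0], fields[1]):
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


if __name__ == "__main__":
    print(raw_value(SAMPLE, "194"))
    print(raw_value(SAMPLE, "Temperature_Celsius"))
```

On a real node, the text passed in would be the output of smartctl -A for the disk in question (which normally needs root), and a small wrapper script around a function like this could be called from a UserParameter in place of the grep and cut pipeline.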

I created a template for HDD data collection as follows:
zabbix-hdd1.PNG

Here is the graph of a Proxmox+Ceph node showing temperature data collected using the template:
zabbix-hdd2.PNG
 
Thank you.

The initial Zabbix setup was not too hard, and the benefits are amazing.

In the past I tried to set up other systems in a day or two but could not get them to a usable state.

Should a wiki page be started with tips for setting up different monitoring tools with PVE?
 