[SOLVED] Is Prometheus an optimal option to monitor Proxmox?

PythonTrader

New Member
Sep 25, 2023
I have a Prometheus + Grafana setup in my home lab that is working well.

I have a Proxmox cluster that consists of 3 nodes, and I'd like to monitor it.

I notice that Proxmox has no built-in support for Prometheus:

[Screenshot: Datacenter → Metric Server options in the Proxmox web UI]


Considering that I already have a working Prometheus instance, should I use it for Proxmox, or use whatever monitoring Proxmox comes with?
 
I have the same setup and am also looking for a good solution.

There is a Prometheus exporter (https://github.com/prometheus-pve/prometheus-pve-exporter), but this does not feel right.

There are already some existing dashboards for Proxmox and InfluxDB available (https://grafana.com/grafana/dashboards/?dataSource=influxdb&search=proxmox), so I am exploring the InfluxDB solution.

Initial testing looks good (low IO, some working dashboards), but now I need to "understand" the influxdb-server config and make it secure.


One warning for others: I used an existing "graphite" VM for one week for my 6 Proxmox nodes. During that week the graphite VM was suffering from high IO.
Additionally, when the graphite VM was down, the Proxmox web UI became unresponsive (showing only '?' and no VM names anymore). Starting the graphite VM or disabling the graphite metrics resolved the problem.
 


Thank you for sharing your experience.

It is possible to run an instance of InfluxDB in a Docker Compose stack, so it does not need much setup; we just need to provide volumes for storage and config.
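
Roughly something like this, as a minimal sketch (the org/bucket/token values, port mapping and volume paths are placeholders, not anything from this thread):

YAML:
services:
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    restart: unless-stopped
    ports:
      - "8086:8086"
    environment:
      # first-run bootstrap of the official InfluxDB 2.x image
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=changeme
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=proxmox
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=changeme-token
    volumes:
      # the two volumes mentioned above: storage and config
      - ./influxdb/data:/var/lib/influxdb2
      - ./influxdb/config:/etc/influxdb2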


I was hoping not to add yet another storage backend to what I already have (Loki and Prometheus).
 

BTW, agree that prometheus-pve-exporter doesn't feel right.

I learned that it can run on a separate machine, but I'm not sure about the repercussions.
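
From what I understand, running it elsewhere just means Prometheus scrapes the exporter host while the exporter talks to the PVE API. A rough sketch following the exporter's README pattern (hostnames, the API token, and the exporter address are placeholders):

YAML:
# pve.yml on the exporter host (API token auth; values are placeholders)
default:
  user: prometheus@pve
  token_name: monitoring
  token_value: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  verify_ssl: false

# prometheus.yml scrape job: the target is the PVE node to query,
# the relabeling rewrites __address__ to wherever the exporter runs
scrape_configs:
  - job_name: pve
    metrics_path: /pve
    params:
      module: [default]
    static_configs:
      - targets: ['pve-node1:8006']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: exporter-host:9221   # default pve-exporter port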
 
I was searching for the same thing and found this thread.

Went and looked at pve-exporter - it's fine, you can deploy it as a container easily, but you're making API calls you don't need since PVE already has metric export built in, and a full scrape is multiple API calls. That seems a bit unnecessary when you can push from PVE.

Then I found this: https://github.com/prometheus/graphite_exporter

Honestly, for a true-blue big-cash-money production environment you're probably best off with InfluxDB, but if you absolutely must squeeze your Proxmox stats onto the rest of your Prometheus/Mimir setup, it works fine; you just need to set up some pattern matching/regex to add labels to the data.

This is what I'm running. I have a very minimal setup: 1 node, not in a cluster, and it only runs VMs, no LXCs:

YAML:
mappings:
  # Matches Nodes.NICs
  - match: 'proxmox\.nodes\.([^\.]+)\.nics\.(.+)\.(receive|transmit)'
    match_type: regex
    name: 'proxmox_nodes_nics_${3}'
    labels:
      node: ${1}
      nic: ${2}
  # Node Uptime
  - match: proxmox.nodes.*.uptime
    name: proxmox_nodes_uptime
    labels:
      node: $1
  # All other node stats
  - match: 'proxmox.nodes.*.*.*'
    name: 'proxmox_nodes_${2}_${3}'
    labels:
      node: $1
  # Cluster storage
  - match: 'proxmox.storages.*.*.*'
    name: 'proxmox_storages_${3}'
    labels:
      node: $1
      id: $2
  # I don't need a stat called vmid with the vmid in the name
  # and the vmid in the value
  - match: proxmox.qemu.*.vmid
    action: drop
    name: "dropped"
  # VM block device
  - match: 'proxmox\.qemu\.([0-9]+)\.blockstat\.([^\.]+)\.(.*)'
    match_type: regex
    name: 'proxmox_qemu_blockstat_${3}'
    labels:
      vmid: ${1}
      device: ${2}
  # VM NICs
  - match: 'proxmox\.qemu\.([0-9]+)\.nics\.([^\.]+)\.(.*)'
    match_type: regex
    name: 'proxmox_qemu_nics_${3}'
    labels:
      vmid: ${1}
      nic: ${2}
  # VM Support
  - match: 'proxmox\.qemu\.([0-9]+)\.proxmox-support\.(.*)'
    match_type: regex
    name: 'proxmox_qemu_support_${2}'
    labels:
      vmid: ${1}
  # All other VM stats
  - match: 'proxmox\.qemu\.([0-9]+)\.(.*)'
    match_type: regex
    name: 'proxmox_qemu_${2}'
    labels:
      vmid: ${1}

My queries are very simple, so I keep labeling to a bare minimum. I could probably drop the node labels altogether, since I don't cluster.

That's my config.yaml and I deploy it with compose:

YAML:
services:

  graphite_exporter:
    image: prom/graphite-exporter
    container_name: graphite_exporter
    restart: unless-stopped
    networks:
      - external
    ports:
      # /metrics endpoint for debugging
      #- "${PUBLIC_IP}:9108:9108"
      # Graphite receiver
      - "${PUBLIC_IP}:9109:9109/udp"
    environment:
      - "TS=${TIMEZONE}"
    command:
      - --graphite.mapping-config=/etc/prometheus/mapping-config.yaml
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - "${STACK_PATH}/config.yaml:/etc/prometheus/mapping-config.yaml:ro"

networks:
  # Shared with Alloy
  external:
    external: true

I scrape it with Alloy, add a label with the deployment name to all metrics, then push it to local Mimir and Grafana Cloud as its own tenant.

The metrics look something like this, taken from Alloy's debug view:

JSON:
{__name__="proxmox_storages_total", id="iso-storage", instance="graphite_exporter:9108", job="prometheus.scrape.graphite", node="ms-01"}
{__name__="proxmox_qemu_nics_netout", instance="graphite_exporter:9108", job="prometheus.scrape.graphite", nic="tap102i0", vmid="102"}
{__name__="proxmox_qemu_maxmem", instance="graphite_exporter:9108", job="prometheus.scrape.graphite", vmid="101"}
{__name__="proxmox_qemu_blockstat_flush_operations", device="scsi0", instance="graphite_exporter:9108", job="prometheus.scrape.graphite", vmid="102"}
{__name__="proxmox_nodes_nics_transmit", instance="graphite_exporter:9108", job="prometheus.scrape.graphite", nic="vmbr1v4", node="ms-01"}

It's a little more laborious than using a prebuilt dashboard and Influx, or pve-exporter, but if you absolutely want to save on deploying Yet Another Time-Series Database, it seems like the way to go.
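
If you're not running Alloy, the same scrape + static deployment label + remote_write can be approximated with plain Prometheus. A rough sketch (the Mimir URL, tenant header, and label value are placeholders):

YAML:
scrape_configs:
  - job_name: graphite
    static_configs:
      - targets: ['graphite_exporter:9108']
        labels:
          deployment: homelab            # same idea as the label Alloy adds
remote_write:
  - url: http://mimir:9009/api/v1/push   # local Mimir push endpoint (placeholder)
    headers:
      X-Scope-OrgID: proxmox             # tenant ID, if multi-tenancy is enabled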