After configuring a metric server, VMs show status unknown.

Joao Correa

Member
Nov 20, 2017
This problem occurs only when the metrics server is unavailable.

Seleção_519.png

PVE Version: 6.3

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.10.17-2-pve: 4.10.17-20
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

File: /etc/pve/status.cfg
Code:
graphite: MetricasGraphite
    server 192.168.100.236
    port 2003
    proto tcp
 
Hi!
Even after restarting the services, the problem remains.
I can resolve it by disabling the metric server, but to disable it through the GUI, the metric server must be accessible.
Should I open a bug report for this? Is anyone else seeing the same problem?
 
is your graphite server really accessible via tcp? there should be a default timeout of 1 second, maybe you need to adjust that (set the 'timeout' parameter)
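
The TCP reachability question can be checked from the Proxmox node with a few lines of Python. This is only a minimal sketch: the helper name `tcp_reachable` is made up for illustration, and the address and port in the usage comment are the ones from the status.cfg posted above.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the server and port from the status.cfg in this thread:
#   tcp_reachable("192.168.100.236", 2003)
```

If this returns False or takes the full timeout to answer, the metric server is most likely unreachable or filtered from that node.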
 
This problem occurs only when the Graphite server is unavailable. When it is accessible, the problem does not occur.

From my point of view, if the metrics server is not accessible, that should not cause a problem with displaying the status of virtual machines in the Proxmox interface.
 
yes, sadly the current architecture of the daemon responsible for gathering status information and sending it to external metric servers (pvestatd) makes it likely that something like this happens (also e.g. when you have a hanging NFS server). as i said, please try to set the timeout to a sensible value and see if that makes any difference.
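
To illustrate why the timeout matters, here is a rough sketch of a sender that gives up quickly instead of stalling whatever loop calls it. It is not how pvestatd is implemented; it only shows the general idea with Graphite's plaintext format (`path value timestamp`). The function name `send_graphite_metric` and the metric path in the usage comment are hypothetical.

```python
import socket
import time


def send_graphite_metric(host, port, path, value, timeout=1.0, now=None):
    """Send one metric line over TCP; return False on any error or timeout."""
    timestamp = int(time.time() if now is None else now)
    # Graphite plaintext protocol: "<metric path> <value> <unix timestamp>\n"
    line = f"{path} {value} {timestamp}\n".encode("ascii")
    try:
        # The timeout applies to the connect and to the send that follows.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(line)
        return True
    except OSError:
        # Unreachable, refused, or timed out: report failure, do not block.
        return False


# Example call (metric path is made up for illustration):
#   send_graphite_metric("192.168.100.236", 2003, "pve.node1.cpu", 0.5)
```

With a short timeout the caller loses at most about that long per attempt when the server is down, whereas a long timeout keeps the caller waiting that much longer each time.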
 

With the timeout set to 60 seconds:

- The server view tree is not updated, but the "status: unknown" error does not appear
- After some time, the same problem occurs

When trying to disable the metric server, the following error is displayed:

Seleção_523.png
 
That's the case for us as well. We would like to add an external InfluxDB server, but we do not want to "crash" the cluster if the InfluxDB server is down (e.g. due to maintenance or other errors).
 
the cluster will not "crash" because of a non-working metric server, but the gui may not have all information and may show '?' icons
 
OK, thanks. So this "?" issue still exists; now we know. Thanks a lot.
 
yes, and i guess this will not change in the near future... if your metrics server is unreachable, slow, or faulty, the process that collects and redistributes the status information is disrupted, so make sure that your metric server is online and reachable
 
