Node with question mark

Just "service pvestatd start" did it for me, but it keeps happening over and over for some reason.
It means something intermittently takes too long for the pvestatd daemon to query (a mount point, or some other info shown in the GUI). Usually the cause is slow disks or an unreliable NFS mount.
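One way to look for such a slow mount from outside pvestatd is to time a simple filesystem query on every mount point. The following is a minimal sketch, not what pvestatd does internally; the function names, the 1 second threshold, and the 5 second timeout are illustrative assumptions.

```python
import os
import threading
import time


def probe_mount(path, timeout=5.0):
    """Return the seconds os.statvfs(path) took, or None on error or timeout."""
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.statvfs(path)
        except OSError:
            return  # missing path, stale handle, permission problem, ...
        result["elapsed"] = time.monotonic() - start

    # A daemon thread lets the caller give up on a hung mount (for example a
    # dead NFS server) instead of blocking forever.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("elapsed")


def list_mount_points(mounts_file="/proc/mounts"):
    """Return the mount points listed in a /proc/mounts style file."""
    points = []
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                # /proc/mounts escapes spaces in paths as \040.
                points.append(fields[1].replace("\\040", " "))
    return points


def slow_mounts(threshold=1.0, timeout=5.0):
    """Return (mount point, seconds or None) pairs that look slow, hung, or failing."""
    report = []
    for point in list_mount_points():
        elapsed = probe_mount(point, timeout)
        if elapsed is None or elapsed > threshold:
            report.append((point, elapsed))
    return report
```

Calling `slow_mounts()` on the affected node while the question mark is showing should list any mount that answers slowly or not at all. An entry with `None` means the query failed or did not return within the timeout.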
 
Things that, over time, I have come to realize are the real causes of the pvestatd service having problems and needing to be restarted.

NFS or another file system shared across the cluster that cannot be found, or that suffers from connection latency or packet loss on the link to the NFS server
Excessive RAM usage on the nodes, whether falsely reported because VMs have no QEMU-Agent installed or because the node really does have high RAM usage (Ceph server, I/O with a lot of delay, or similar)

Failure to connect to some backup system
For example PBS instances or other auxiliary backup systems such as NFS with physical connection trouble, latency, or packet loss. That said, I have not had these problems lately.
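A quick way to test the connectivity causes listed above is to time a plain TCP connection to each storage or backup server. This is a minimal sketch under assumptions: the function names are made up for illustration, and a TCP connect only shows reachability and rough latency, not whether the service itself is healthy.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=3.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None


def check_targets(targets, timeout=3.0):
    """Map each (name, host, port) target to its connect time or None."""
    return {name: tcp_connect_time(host, port, timeout)
            for name, host, port in targets}
```

For example, `check_targets([("nfs", "192.0.2.10", 2049), ("pbs", "192.0.2.20", 8007)])` would probe an NFS server on its usual port 2049 and a PBS server on its default port 8007. Both addresses are placeholders, not values from this thread. Running the check repeatedly and seeing occasional `None` results or large swings in connect time would hint at the latency or packet loss described above.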
 
I'd recommend either opening a new bug at https://bugzilla.proxmox.com or adding to https://bugzilla.proxmox.com/show_bug.cgi?id=3259.
Although the cause is different in your case, the resulting state is the same.

Ideally, an offline external log/metric collector should not cause cluster heartburn in PVE.


I intentionally added a faulty metrics server to a Proxmox node in my testing cluster.
Sure enough, the exact same issue came back. I watched the node closely through Netdata (a working metrics server), and this is the error that seems to be the root cause of the issue:

Alert: apps_group_file_descriptors_utilization
Chart: app.proxmox-ve_fds_open_limit
Context: app.fds_open_limit
Raised to warning, for 0 seconds
On Sat Apr 27 2024, 01:28:22 CDT
By: TEST01
Space: OpenServeonics TESTING
Rooms: All nodes, Rack 1
Global time: Sat Apr 27 2024, 06:28:22 UTC
Classification: Utilization
Role: sysadmin
 
It seems to me that pvestatd tried to open a file descriptor in order to connect to the nonexistent metrics server.
There is a limit on how many files pvestatd can open, probably to prevent it from using too many resources or even crashing a node. Since the target it tried to open does not exist, it retried over and over until it hit the limit.
And thus, pvestatd was killed by the system.
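The descriptor theory above is a guess, but it can be checked by comparing how many descriptors the pvestatd process holds against its soft limit. A minimal sketch for Linux, reading `/proc/<pid>/fd` and `/proc/<pid>/limits`; the function names are illustrative assumptions, and the PID has to be supplied by the caller.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (Linux only; needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", soft limit, hard limit, unit.
            soft = line.split()[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_utilization(pid):
    """Return open descriptors as a fraction of the soft limit, or None."""
    with open(f"/proc/{pid}/limits") as handle:
        limit = parse_nofile_limit(handle.read())
    if not limit:
        return None
    return count_open_fds(pid) / limit
```

The PID could be taken from something like `systemctl show -p MainPID --value pvestatd`. If `fd_utilization(pid)` climbs steadily toward 1.0 while the faulty metrics server is configured, that would support the explanation above. A flat value would point to a different cause.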
 