I intentionally added a faulty metrics server to a Proxmox node in my testing cluster.
Sure enough, the exact same issue came back. I watched the node closely through Netdata (a working metrics server), and this is the alert that seems to point to the root cause of the issue:
Alert: apps_group_file_descriptors_utilization |
Chart: app.proxmox-ve_fds_open_limit |
Context: app.fds_open_limit |
Raised to warning, for 0 seconds |
|
On Sat Apr 27 2024, 01:28:22 CDT |
By: TEST01 |
Space: OpenServeonics TESTING |
|
Global time: Sat Apr 27 2024, 06:28:22 UTC |
|
Classification: Utilization |
Role: sysadmin |
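
For anyone who wants to cross-check this alert by hand, here is a rough Python sketch that compares a process's open file descriptors against its soft limit using /proc. The function names are my own (hypothetical), and it looks at a single PID, whereas the Netdata alert above covers a whole app group, so the numbers may not match exactly. Treat it as an approximation, not as how Netdata computes the metric.

```python
import os
from typing import Optional


def parse_soft_nofile_limit(limits_text: str) -> Optional[int]:
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def utilization_percent(open_fds: int, soft_limit: int) -> float:
    """Percentage of the soft limit currently in use."""
    return 100.0 * open_fds / soft_limit


def fd_utilization(pid: int) -> Optional[float]:
    """Open-FD utilization for one process (Linux only, needs read access to /proc/<pid>)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_soft_nofile_limit(f.read())
    if not soft_limit:
        return None
    return utilization_percent(open_fds, soft_limit)
```

Calling `fd_utilization(pid)` as root with the PID of whichever daemon you suspect, a few times while the faulty metrics server is configured, should show whether the value keeps climbing toward 100.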