Getting client timeouts in PDM system log

deebsr · Jun 17, 2026

In the system log I'm seeing the following entries for all the nodes in one of my clusters (25 nodes):

marking client node01 as unreachable
client timed out on request /api2/extjs/cluster/metrics/export?history=1&local%2Donly=0&start%2Dtime=0, trying another remote

I have another cluster that does not seem to have this issue.
I have tried removing and re-adding the cluster, which does not change anything.
I notice that on the affected cluster I'm unable to see any metric data in PDM. On the cluster itself it's OK.

PDM version: 1.1.4
PVE version: 9.2.3

sterzy · Jun 17, 2026

deebsr said:
I notice that on the affected cluster I'm unable to see any metric data in PDM. On the cluster itself it's OK.

This sounds like the request takes a very long time and eventually fails. What does the following return:

Code:

time pvesh get /cluster/metrics/export --history 1 --local-only 0 --start-time 0 > /dev/null

on a node of the affected cluster?

deebsr · Jun 17, 2026

sterzy said:
This sounds like the request takes a very long time and eventually fails. What does the following return:

Code:

time pvesh get /cluster/metrics/export --history 1 --local-only 0 --start-time 0 > /dev/null

on a node of the affected cluster?

Here is the output on the affected cluster (it took a while to actually bring up these results, by the way):

real 2m43.476s
user 0m21.497s
sys 0m14.301s

On a similar cluster (10 nodes instead of 25, and only about 100 VMs vs. 2700 VMs) using the same network (same firewall access, same switches, etc.), I get the following:

real 0m6.748s
user 0m2.281s
sys 0m0.811s
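
To put the two `real` values side by side, they can be converted to seconds and divided. This is only a rough sketch; `to_seconds` is a hypothetical helper written for this comparison, and it assumes durations under one hour in the `XmY.ZZZs` form that `time` printed above.

```shell
#!/bin/sh
# Hypothetical helper: convert a `time`-style duration (e.g. 2m43.476s) to seconds.
to_seconds() {
    # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

slow=$(to_seconds 2m43.476s)   # affected 25-node cluster
fast=$(to_seconds 0m6.748s)    # 10-node cluster

# Ratio of the two wall-clock times, rounded to one decimal place.
ratio=$(awk -v a="$slow" -v b="$fast" 'BEGIN { printf "%.1f\n", a / b }')
echo "slow=${slow}s fast=${fast}s ratio=${ratio}x"
```

With the numbers above, the affected cluster needs roughly 24 times as long for the same request.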

deebsr · Jun 17, 2026

Also, just for context: the reason I started investigating this was that I realized no metrics at all were being pulled into PDM for this cluster.
This applies to both node and VM metrics.

I'm still able to get metrics via InfluxDB without any problems.

sterzy · Jun 18, 2026

Yeah, it sounds like the metrics collection task is struggling with the initial import here. There is just so much data on that remote that the request takes almost three minutes to complete. That runs into a timeout on the PDM side. So yeah, we'll need to increase the timeout here or make the mechanism more flexible for bigger data sets and slower remotes. We'll look into it.
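
To check on a node whether the export finishes within a given time limit, something like the following could be used. It is a minimal sketch: `within_limit` is a hypothetical helper, it relies on `timeout` from GNU coreutils (exit status 124 on timeout), and the 60-second limit in the usage comment is an arbitrary example value, not the timeout PDM actually uses.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit (in seconds)
# and report whether it completed, timed out, or failed for another reason.
within_limit() {
    limit=$1
    shift
    if timeout "$limit" "$@" > /dev/null 2>&1; then
        echo "completed within ${limit}s"
    else
        status=$?
        # GNU timeout exits with 124 when the time limit was reached.
        if [ "$status" -eq 124 ]; then
            echo "timed out after ${limit}s"
        else
            echo "failed with exit status $status"
        fi
    fi
}

# Usage on a PVE node (60 is an assumed example limit):
#   within_limit 60 pvesh get /cluster/metrics/export --history 1 --local-only 0 --start-time 0
```

Running it with a few different limits gives a rough idea of how much headroom a larger timeout would need for this remote.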

Thanks for the information!

deebsr · Jun 18, 2026

sterzy said:
Yeah, it sounds like the metrics collection task is struggling with the initial import here. There is just so much data on that remote that the request takes almost three minutes to complete. That runs into a timeout on the PDM side. So yeah, we'll need to increase the timeout here or make the mechanism more flexible for bigger data sets and slower remotes. We'll look into it.

Thanks for the information!

OK, thanks for looking into this. Let me know if you need any other information from this cluster.

As a note, this cluster has 25 nodes and about 2800 VMs. Sending metrics using the cluster setting (sending to an InfluxDB) seems to be working with no issues.

Also, should I file a bug report for this, or are you going to add it internally?

sterzy · Jun 19, 2026

deebsr said:
Also, should I file a bug report for this, or are you going to add it internally?

Feel free to open a bug report for this. It might be sensible and would give other users a heads-up that we are already tracking this. However, I did already add this to our internal issue tracking as a point for improvement, so hopefully it won't get lost either way.

deebsr said:
As a note, this cluster has 25 nodes and about 2800 VMs. Sending metrics using the cluster setting (sending to an InfluxDB) seems to be working with no issues.

To my knowledge InfluxDB uses an entirely different code path here. So yes, it makes sense that even if PDM is struggling InfluxDB can still work in this scenario.

deebsr · Jun 19, 2026

sterzy said:
Feel free to open a bug report for this. It might be sensible and would give other users a heads-up that we are already tracking this. However, I did already add this to our internal issue tracking as a point for improvement, so hopefully it won't get lost either way.

To my knowledge InfluxDB uses an entirely different code path here. So yes, it makes sense that even if PDM is struggling InfluxDB can still work in this scenario.

OK, bug report filed:

https://bugzilla.proxmox.com/show_bug.cgi?id=7731

Please add any other relevant information.

Hopefully this can be fixed soon. I feel PDM has great potential to be used at some point as a centralized log and metrics location similar to how vCenter is currently.
Even better if PDM could also gain functionality similar to Aria Operations/Logs.

sterzy · Jun 22, 2026

Thanks for creating that. I added the Bugzilla issue to our internal tooling as well, so updates should follow there once available!

deebsr · Monday at 19:41

Note: I have now updated to version 1.1.6, but the issue is still there.
