failed to collect metrics for xxx api error (status = 596: )

Hi,

I'm testing PDM. The cluster is connected, but I get the error failed to collect metrics for xxx: api error (status = 596: ) and the graphs are not shown.
The cluster has an external metric server configured (used with Grafana). Is there any relation to my error log?

Thanks
 
I got the same error when a cron job executes a bash script that tries to curl an API endpoint with a lot of data. I don't get any errors when running the script manually.

What's interesting is that the connection always gets terminated after 30 seconds. It didn't help that I set curl's timeout to 15 minutes, so this must be caused by the Proxmox API endpoint/webserver.

Code:
[11/02/2025:05:30:32 +0100] "GET /api2/json/nodes/redacted/storage/redacted/content HTTP/1.1" 596 -


  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
[... curl progress meter repeats once per second with zero bytes transferred ...]
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0{ [5 bytes data]

< HTTP/1.1 596 Connection timed out
< Cache-Control: max-age=0
< Connection: close
< Date: Tue, 11 Feb 2025 04:30:32 GMT
< Pragma: no-cache
< Server: pve-api-daemon/3.0
< Expires: Tue, 11 Feb 2025 04:30:32 GMT
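For what it's worth, the same request can be timed directly against pveproxy to confirm where the 30 seconds go. This is only a sketch with placeholder values (host, node, storage name and the API token are not from this thread):

Code:
# Time the storage content listing against the PVE API (port 8006) and print
# the HTTP status plus total time. Replace the placeholders with your own values.
curl -k -s -o /dev/null \
  -w 'HTTP %{http_code} after %{time_total}s\n' \
  -H 'Authorization: PVEAPIToken=root@pam!monitoring=<secret>' \
  "https://<pve-host>:8006/api2/json/nodes/<node>/storage/<storage>/content"

If that consistently returns 596 after about 30 seconds regardless of the curl timeout, the limit is on the server side rather than in curl.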
 
I'm testing PDM. The cluster is connected, but I get the error failed to collect metrics for xxx: api error (status = 596: ) and the graphs are not shown.
Could you connect to your PVE cluster via SSH and check if pvesh get /cluster/metrics/export --history shows any error?

The cluster has an external metric server configured (used with Grafana). Is there any relation to my error log?
This should not have anything to do with it.
 
There is no path /cluster/metrics/export, only /cluster/metrics/server, when I run pvesh get.

What is your pveversion --verbose? Proxmox Datacenter Manager requires an up-to-date Proxmox VE installation for the time being; the API endpoint for the metric data was only added in pve-manager 8.2.5.
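As a quick check whether the endpoint exists on a node at all, you can also list the children of the metrics path (just a sketch, run on any cluster node):

Code:
# On a node with pve-manager >= 8.2.5 this should list 'export' next to 'server':
pvesh ls /cluster/metrics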
 
root@pve02:~# pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-12
proxmox-kernel-6.8: 6.8.12-4
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
pve-kernel-5.15.149-1-pve: 5.15.149-1
pve-kernel-5.15.143-1-pve: 5.15.143-1
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.126-1-pve: 5.15.126-1
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.7-pve3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.2
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-1
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
root@pve02:~#
 
Are all cluster nodes at the same version of pve-manager?
 
Feb 19 13:02:22 pve03 pvedaemon[384539]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:02:32 pve03 pvestatd[1720]: status update time (16.311 seconds)
Feb 19 13:02:35 pve03 pvedaemon[384539]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:02:47 pve03 pvestatd[1720]: status update time (15.396 seconds)
Feb 19 13:03:03 pve03 pvestatd[1720]: status update time (15.334 seconds)
Feb 19 13:03:11 pve03 pvedaemon[621732]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:03:16 pve03 pvedaemon[621732]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:03:19 pve03 pvestatd[1720]: status update time (16.116 seconds)
Feb 19 13:03:21 pve03 pvedaemon[621732]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:03:27 pve03 kernel: libceph: osd2 (1)10.121.21.13:6803 socket closed (con state OPEN)
Feb 19 13:03:35 pve03 pvestatd[1720]: status update time (16.128 seconds)
Feb 19 13:03:35 pve03 pvedaemon[621732]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:03:51 pve03 pvestatd[1720]: status update time (15.652 seconds)
Feb 19 13:04:06 pve03 pvestatd[1720]: status update time (15.255 seconds)
Feb 19 13:04:11 pve03 pvedaemon[529154]: could not fetch metrics from pve08: 500 read timeout
Feb 19 13:04:21 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:04:22 pve03 pvestatd[1720]: status update time (16.296 seconds)
Feb 19 13:04:31 pve03 pvedaemon[529154]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:04:36 pve03 pvedaemon[529154]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:04:38 pve03 pvestatd[1720]: status update time (16.210 seconds)
Feb 19 13:04:41 pve03 pvedaemon[529154]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:04:52 pve03 kernel: libceph: osd2 (1)10.121.21.13:6803 socket closed (con state OPEN)
Feb 19 13:04:52 pve03 ceph-osd[137722]: 2025-02-19T13:04:52.191+0700 78d9b06006c0 -1 reset not still connected to 0x5cc614c60680
Feb 19 13:04:54 pve03 pvestatd[1720]: status update time (15.414 seconds)
Feb 19 13:05:09 pve03 pvestatd[1720]: status update time (15.034 seconds)
Feb 19 13:05:20 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:05:24 pve03 pvestatd[1720]: status update time (15.263 seconds)
Feb 19 13:05:29 pve03 pvedaemon[529154]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:05:35 pve03 pvedaemon[529154]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:05:39 pve03 pvestatd[1720]: status update time (15.391 seconds)
Feb 19 13:05:40 pve03 pvedaemon[529154]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:05:55 pve03 pvestatd[1720]: status update time (15.823 seconds)
Feb 19 13:06:11 pve03 pvestatd[1720]: status update time (15.371 seconds)
Feb 19 13:06:20 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:06:26 pve03 pvestatd[1720]: status update time (15.819 seconds)
Feb 19 13:06:29 pve03 pvedaemon[529154]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:06:34 pve03 pvedaemon[529154]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:06:39 pve03 pvedaemon[529154]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:06:42 pve03 pvestatd[1720]: status update time (15.642 seconds)
Feb 19 13:06:59 pve03 pvestatd[1720]: status update time (16.586 seconds)
Feb 19 13:07:11 pve03 pvedaemon[384539]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:07:14 pve03 pvestatd[1720]: status update time (15.689 seconds)
Feb 19 13:07:16 pve03 pvedaemon[384539]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:07:22 pve03 pvedaemon[384539]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:07:27 pve03 pvedaemon[384539]: could not fetch metrics from pve06: 500 read timeout
Feb 19 13:07:31 pve03 pvestatd[1720]: status update time (16.340 seconds)
Feb 19 13:07:37 pve03 pvedaemon[384539]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:07:46 pve03 pvestatd[1720]: status update time (15.378 seconds)
Feb 19 13:08:02 pve03 pvestatd[1720]: status update time (15.989 seconds)
Feb 19 13:08:11 pve03 pvedaemon[621732]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:08:16 pve03 pvedaemon[621732]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:08:18 pve03 pvestatd[1720]: status update time (15.739 seconds)
Feb 19 13:08:22 pve03 pvedaemon[621732]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:08:34 pve03 pvestatd[1720]: status update time (16.092 seconds)
Feb 19 13:08:35 pve03 pvedaemon[621732]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:08:49 pve03 pvestatd[1720]: status update time (15.442 seconds)
Feb 19 13:09:04 pve03 pvestatd[1720]: status update time (14.978 seconds)
Feb 19 13:09:11 pve03 pvedaemon[621732]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:09:20 pve03 pvestatd[1720]: status update time (15.743 seconds)
Feb 19 13:09:21 pve03 pvedaemon[621732]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:09:35 pve03 pvedaemon[621732]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:09:36 pve03 pvestatd[1720]: status update time (15.896 seconds)
Feb 19 13:09:52 pve03 pvestatd[1720]: status update time (15.741 seconds)
Feb 19 13:10:07 pve03 pvestatd[1720]: status update time (15.126 seconds)
Feb 19 13:10:20 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:10:23 pve03 pvestatd[1720]: status update time (15.749 seconds)
Feb 19 13:10:31 pve03 pvedaemon[529154]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:10:36 pve03 pvedaemon[529154]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:10:38 pve03 pvestatd[1720]: status update time (15.699 seconds)
Feb 19 13:10:41 pve03 pvedaemon[529154]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:10:54 pve03 pvestatd[1720]: status update time (15.842 seconds)
Feb 19 13:11:11 pve03 pvestatd[1720]: status update time (16.571 seconds)
Feb 19 13:11:11 pve03 pvedaemon[621732]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:11:16 pve03 pvedaemon[621732]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:11:21 pve03 pvedaemon[621732]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:11:26 pve03 pvestatd[1720]: status update time (15.527 seconds)
Feb 19 13:11:36 pve03 pvedaemon[621732]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:11:41 pve03 pvestatd[1720]: status update time (15.034 seconds)
Feb 19 13:11:57 pve03 pvestatd[1720]: status update time (16.181 seconds)
Feb 19 13:12:11 pve03 pvedaemon[384539]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:12:14 pve03 pvestatd[1720]: status update time (16.165 seconds)
Feb 19 13:12:17 pve03 pvedaemon[384539]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:12:22 pve03 kernel: libceph: osd2 (1)10.121.21.13:6803 socket closed (con state OPEN)
Feb 19 13:12:22 pve03 pvedaemon[384539]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:12:27 pve03 pvedaemon[384539]: could not fetch metrics from pve06: 500 read timeout
Feb 19 13:12:30 pve03 pvestatd[1720]: status update time (15.958 seconds)
Feb 19 13:12:32 pve03 pvedaemon[384539]: could not fetch metrics from pve07: 500 read timeout
Feb 19 13:12:37 pve03 pvedaemon[384539]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:12:46 pve03 pvestatd[1720]: status update time (16.240 seconds)
Feb 19 13:13:02 pve03 pvestatd[1720]: status update time (16.116 seconds)
Feb 19 13:13:11 pve03 pvedaemon[621732]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:13:16 pve03 pvedaemon[621732]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:13:18 pve03 pvestatd[1720]: status update time (16.113 seconds)
Feb 19 13:13:21 pve03 pvedaemon[621732]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:13:32 pve03 pvedaemon[621732]: could not fetch metrics from pve07: 500 read timeout
Feb 19 13:13:34 pve03 pvestatd[1720]: status update time (16.303 seconds)
Feb 19 13:13:37 pve03 pvedaemon[621732]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:13:51 pve03 pvestatd[1720]: status update time (16.161 seconds)
Feb 19 13:14:06 pve03 pvestatd[1720]: status update time (15.018 seconds)
Feb 19 13:14:12 pve03 pvedaemon[384539]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:14:17 pve03 pvedaemon[384539]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:14:21 pve03 pvestatd[1720]: status update time (15.579 seconds)
Feb 19 13:14:22 pve03 pvedaemon[384539]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:14:27 pve03 kernel: libceph: osd2 (1)10.121.21.13:6803 socket closed (con state OPEN)
Feb 19 13:14:27 pve03 pvedaemon[384539]: could not fetch metrics from pve06: 500 read timeout
Feb 19 13:14:32 pve03 pvedaemon[384539]: could not fetch metrics from pve07: 500 read timeout
Feb 19 13:14:37 pve03 pvestatd[1720]: status update time (16.101 seconds)
Feb 19 13:14:38 pve03 pvedaemon[384539]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:14:53 pve03 pvestatd[1720]: status update time (16.177 seconds)
Feb 19 13:15:09 pve03 pvestatd[1720]: status update time (15.082 seconds)
Feb 19 13:15:20 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout
Feb 19 13:15:24 pve03 pvestatd[1720]: status update time (15.906 seconds)
Feb 19 13:15:31 pve03 pvedaemon[529154]: could not fetch metrics from pve02: 500 read timeout
Feb 19 13:15:36 pve03 pvedaemon[529154]: could not fetch metrics from pve05: 500 read timeout
Feb 19 13:15:41 pve03 pvestatd[1720]: status update time (16.186 seconds)
Feb 19 13:15:41 pve03 pvedaemon[529154]: could not fetch metrics from pve01: 500 read timeout
Feb 19 13:15:49 pve03 ceph-osd[137722]: 2025-02-19T13:15:49.238+0700 78d9b06006c0 -1 reset not still connected to 0x5cc5b8ac0d00
Feb 19 13:15:49 pve03 ceph-osd[137722]: 2025-02-19T13:15:49.238+0700 78d9b06006c0 -1 reset not still connected to 0x5cc60ca2b930
Feb 19 13:15:57 pve03 pvestatd[1720]: status update time (15.937 seconds)
Feb 19 13:16:12 pve03 pvestatd[1720]: status update time (15.235 seconds)
Feb 19 13:16:21 pve03 pvedaemon[529154]: could not fetch metrics from pve04: 500 read timeout

After checking on the PVE cluster, there are error logs like the above. I already tried logging in to PDM via a different node, but the errors appear on whichever node I register as the remote in PDM. The cluster itself is running fine.
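To narrow this down, the relevant messages can be pulled out of the journal on the node that is registered in PDM. A small sketch (the time range is arbitrary):

Code:
# Show only the metric-fetch timeouts and the slow pvestatd status updates from the last hour.
journalctl -u pvedaemon -u pvestatd --since '1 hour ago' \
  | grep -E 'could not fetch metrics|status update time'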

root@pve03:~# pvecm status
Cluster information
-------------------
Name: cbncloudstack
Config Version: 9
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 19 14:03:48 2025
Quorum provider: corosync_votequorum
Nodes: 8
Node ID: 0x00000003
Ring ID: 1.189
Quorate: Yes

Votequorum information
----------------------
Expected votes: 8
Highest expected: 8
Total votes: 8
Quorum: 5
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.121.0.11
0x00000002 1 10.121.0.12
0x00000003 1 10.121.0.13 (local)
0x00000004 1 10.121.0.10
0x00000005 1 10.121.0.9
0x00000006 1 10.121.0.16
0x00000007 1 10.121.0.17
0x00000008 1 10.121.0.18
root@pve03:~#
 
Are all cluster nodes at the same version of pve-manager?

root@pve01:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-4-pve)

root@pve02:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.4-3-pve)

root@pve03:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.4-3-pve)

root@pve04:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.4-3-pve)

root@pve05:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-2-pve)

root@pve06:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-2-pve)

root@pve07:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-4-pve)

root@pve08:~# pveversion
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.4-2-pve)


The pve-manager version is the same on all nodes, but the running kernels differ.
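Since the read timeouts name specific nodes (pve01, pve02, pve04, pve05, ...), it may also help to time the export call on each node and see which ones are slow to answer. A rough sketch, assuming root SSH access between the nodes and using the endpoint mentioned earlier in this thread:

Code:
# Time the metrics export on every node; nodes that take unusually long are the
# likely cause of the '500 read timeout' / 596 errors.
for n in pve01 pve02 pve03 pve04 pve05 pve06 pve07 pve08; do
  echo -n "$n: "
  ssh root@"$n" 'time pvesh get /cluster/metrics/export --history > /dev/null' 2>&1 | grep real
done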