proxmox metricserver uptime as float rather than int

SimonMcNair · Nov 3, 2022

Hi,
I happened to look at my influx logs and noticed the following error occurring:

ts=2022-11-03T15:52:15.241668Z lvl=info msg="Failed to write point batch to database" log_id=0dwTt70G000 service=udp db_instance=udp error="partial write: field type conflict: input field \"uptime\" on measurement \"system\" is type float, already exists as type integer dropped=1"

Could you please check whether the metricserver is sending uptime as an integer, as I believe it should?

Background: I saw the error and tried dropping the measurement in Influx, but the error still occurred. When I stopped the metric server on Proxmox, the error stopped occurring.

I also capture uptime from a number of other places via Telegraf (https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/README.md), so that probably created the metric as an integer in the first place?

I'd appreciate it if you could check. Changing this would be a breaking change for anyone who doesn't use Telegraf, as the metricserver would have created the field as a float for them (if that is what the code does?).

Cheers
Simon

dcsapak · Nov 7, 2022

i checked here and my uptime is an integer, not float. also checking the code where we get the uptime: https://git.proxmox.com/?p=pve-comm...5012e71e7b8160b204aa;hb=refs/heads/master#l75
i don't see how we could get a float there (we explicitly convert to an integer with 'int()')

what's the output of 'pveversion -v' ?
also are you sure that the stray float value comes from pve ?

SimonMcNair · Nov 7, 2022

Thank you for taking the time to check the code and confirm it isn't Proxmox. Considering that the only two sources I have writing to the InfluxDB are Proxmox and Telegraf, I'm confused as to what could have created the metric as a float.

I will investigate further. Thanks again.

SimonMcNair · Nov 7, 2022

When I run telegraf --test /etc/telegraf the output for system is:
> system,host=134e7e15f4ba load1=2.21,load15=1.92,load5=2.31,n_cpus=1i,n_users=0i 1667813275000000000
> system,host=134e7e15f4ba uptime=245239i 1667813275000000000
> system,host=134e7e15f4ba uptime_format="2 days, 20:07" 1667813275000000000
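
For reference, the type of each field in output like the above can be read off the line-protocol suffixes: a trailing `i` marks an integer, a quoted value is a string, and a bare number is a float. Below is a minimal Python sketch of that check. The helper names `split_outside_quotes` and `field_types` are made up for illustration, and the parsing is deliberately simplified (it does not handle escaped `=` in field keys), so treat it as a rough diagnostic aid rather than a full line-protocol parser.

```python
def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double quotes or after a backslash."""
    parts, current = [], []
    in_quotes = False
    escaped = False
    for ch in text:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


BOOLEAN_LITERALS = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}


def field_types(line):
    """Return {field_key: type_name} for one line-protocol line (simplified)."""
    # Drop the leading "> " that `telegraf --test` prints in front of each line.
    line = line.lstrip("> ").strip()
    # Sections are: measurement+tags, fields, optional timestamp.
    sections = split_outside_quotes(line, " ")
    result = {}
    for pair in split_outside_quotes(sections[1], ","):
        key, _, raw = pair.partition("=")
        if raw.startswith('"'):
            kind = "string"
        elif raw in BOOLEAN_LITERALS:
            kind = "boolean"
        elif raw.endswith("i"):
            kind = "integer"
        elif raw.endswith("u"):
            kind = "unsigned"
        else:
            kind = "float"
        result[key] = kind
    return result


if __name__ == "__main__":
    print(field_types("> system,host=134e7e15f4ba uptime=245239i 1667813275000000000"))
```

Running the same function over lines captured from another sender would show whether `uptime` arrives with or without the `i` suffix.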

I don't suppose you can tell me whether there is any debug output I can turn on for the metricserver, and/or where in the code the uptime is actually sent to the upstream system?

cheers
Simon

dcsapak · Nov 7, 2022

there is no debug mode sadly, but the code where we gather + send the stats is here:

https://git.proxmox.com/?p=pve-mana...e4980e897dd19dac8d6ed9e0;hb=refs/heads/master
here https://git.proxmox.com/?p=pve-mana...a777623600de3011be6f3509;hb=refs/heads/master
and here https://git.proxmox.com/?p=pve-mana...b645d06607a7c6d33957256b;hb=refs/heads/master

SimonMcNair · Nov 7, 2022

Seems like I'm not the only one. Kinda confused as to why it is happening, maybe the error is a misnomer

https://community.influxdata.com/t/...uts-socket-listener-for-proxmox-metrics/23585

SimonMcNair · Nov 7, 2022

I don't think this is Proxmox or Influx. I think it's telegraf, but I don't know why. I will gather evidence. Thanks for your help.

doymer · Dec 16, 2022

dcsapak said:
there is no debug mode sadly, but the code where we gather + send the stats is here:

https://git.proxmox.com/?p=pve-mana...e4980e897dd19dac8d6ed9e0;hb=refs/heads/master
here https://git.proxmox.com/?p=pve-mana...a777623600de3011be6f3509;hb=refs/heads/master
and here https://git.proxmox.com/?p=pve-mana...b645d06607a7c6d33957256b;hb=refs/heads/master

Sorry to step in uninvited, but I have the same issue.

There may be a cast with int() in the code, but the same thing is happening here, so it may not be working as expected.

Everything shows up fine and then, suddenly, all data disappears. After a long time (I am not able to determine how long), everything starts working again. When the problem occurs, I get the same error the OP reported: the field is being sent as a float while it already exists as an integer.

To fix that, I deleted the measurement and let it start again from zero. It started OK and kept working until, at some point, it failed again. The bizarre thing is that it comes and goes with no human intervention that might explain it: it works for some time, then fails for another period (weeks, not days), then starts working again. This is driving me mad...

EDIT: The interval between working and not working may be related to the shard duration. A field can change its type between shards but not within the same shard, so this may be the time frame over which the work/failure cycles occur.
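
To make the shard idea concrete, here is a toy Python model, not InfluxDB's actual implementation. It assumes a 7-day shard duration and that the first write of a field within a shard fixes that field's type for the rest of the shard; the class name `ToyDatabase` and the constant `SHARD_DURATION` are invented for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: shards cover fixed 7-day windows counted from the Unix epoch.
SHARD_DURATION = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ToyDatabase:
    """Toy model: the first type written for a field in a shard wins for that shard."""

    def __init__(self):
        # Maps (shard_index, field_name) to "integer" or "float".
        self.shard_field_types = {}

    def write(self, when, field, value):
        """Return True if the write is accepted, False on a field type conflict."""
        shard = (when - EPOCH) // SHARD_DURATION
        kind = "integer" if isinstance(value, int) else "float"
        existing = self.shard_field_types.setdefault((shard, field), kind)
        return existing == kind


if __name__ == "__main__":
    db = ToyDatabase()
    t0 = datetime(2022, 11, 3, 12, 0, tzinfo=timezone.utc)
    # An integer writer creates the field first; a float writer is then rejected.
    print(db.write(t0, "uptime", 245239))
    print(db.write(t0 + timedelta(hours=1), "uptime", 245239.0))
    # In a later shard the float writer happens to be first, so the roles flip.
    print(db.write(t0 + timedelta(days=14), "uptime", 245239.0))
    print(db.write(t0 + timedelta(days=14, hours=1), "uptime", 245239))
```

Under these assumptions, whichever sender writes first after a shard rollover decides which sender gets dropped for the following period, which would match the weeks-long on/off pattern.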

EDIT2: I have been looking at the code where the uptime is returned from and, yes, the returned value is cast with 'int'. But what really matters for InfluxDB is that an integer value must have an 'i' appended, as in 0i or 23i; otherwise it can be interpreted as a float. As far as I can see, the uptime value returned by that call is not used anywhere in 'build_influxdb_payload' or in 'prepare_value', which is where the 'i' would have to be appended. But I am no expert in the code, so I may be missing something.
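
To illustrate the 'i' suffix point, here is a small Python sketch of how a sender could render field values for InfluxDB line protocol. It is not the Proxmox code (which is Perl), and the function names `format_field_value` and `build_line` are hypothetical. Escaping of measurement names, tag keys, and tag values is left out to keep it short.

```python
def format_field_value(value):
    """Render a Python value as a line-protocol field value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        # Without the trailing "i", InfluxDB parses the number as a float.
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    # Everything else is sent as a quoted string with backslashes and quotes escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_line(measurement, tags, fields, timestamp_ns):
    """Assemble one line: measurement,tags fields timestamp (no tag escaping)."""
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    field_part = ",".join(
        f"{key}={format_field_value(val)}" for key, val in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # "pve1" is a placeholder host name.
    print(build_line("system", {"host": "pve1"}, {"uptime": 245239}, 1667813275000000000))
```

A value that has been truncated to a whole number but is written without the suffix, for example `uptime=245239`, would still be stored as a float, which is the kind of mismatch the error message describes.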

SpinningRust · Aug 21, 2023

I'm having the same problem and I can't seem to find a solution for this. Any ideas?

pveversion -v

proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
proxmox-kernel-6.2: 6.2.16-8
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-6.2.16-5-pve: 6.2.16-6
pve-kernel-6.2.16-4-pve: 6.2.16-5
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
