Update to Proxmox 6.1 broke metrics

ChristianW

Feb 13, 2020
Hi,
we just upgraded Proxmox to 6.1. Everything seems fine so far, but most of our monitoring data is now being dropped, possibly because of an MTU issue:

Code:
08:55:30.176550 IP 172.16.200.250.47503 > 172.16.200.6.2003: UDP, bad length 10348 > 1472
08:55:30.242207 IP 172.16.200.250.47871 > 172.16.200.6.2003: UDP, bad length 50789 > 1472
08:55:30.245716 IP 172.16.200.250.47871 > 172.16.200.6.2003: UDP, bad length 50654 > 1472
08:55:30.246359 IP 172.16.200.250.47871 > 172.16.200.6.2003: UDP, bad length 8260 > 1472
08:55:30.580892 IP 172.16.200.250.60298 > 172.16.200.6.2003: UDP, bad length 2793 > 1472
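For context, the 1472 in these captures matches the largest UDP payload that fits into a single IPv4 packet at an MTU of 1500, so anything larger has to be fragmented. The sketch below only works through that arithmetic; it assumes IPv4 without options and a 1500-byte MTU, and the function names are made up for illustration.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fragments_needed(payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram.

    Every fragment carries its own IP header, the UDP header travels in
    the first fragment only, and fragment data is a multiple of 8 bytes
    (except for the last fragment).
    """
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((payload + UDP_HEADER) / per_fragment)


if __name__ == "__main__":
    # Payload sizes taken from the capture above.
    for size in (10348, 50789, 50654, 8260, 2793):
        print(size, "bytes ->", fragments_needed(size), "fragments")
```

With the default MTU, max_udp_payload() returns 1472, the same limit tcpdump reports.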

We configured metrics in /etc/pve/status.cfg:
Code:
graphite:
    server eusib-deb-mon
    port 2003
    path proxmox-wir

Before the update:
[Screenshot: Bildschirmfoto 2020-02-13 um 09.39.03.png]

After update:
[Screenshot: Bildschirmfoto 2020-02-13 um 09.39.28.png]

On the other hand, the host data itself is still complete (e.g. proxmox-wir->nodes->host->cpustat->cpu).
Only the VM data is affected (e.g. proxmox-wir->qemu->108->cpu).

Any ideas?
 
Idea: have you checked the MTU size on the Proxmox hosts' network interfaces? What do you see when you run:
ip l l
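If you would rather collect the MTU values in a script than read the ip output by eye, here is a minimal sketch. It assumes a Linux host that exposes interface settings under /sys/class/net; read_mtus is a hypothetical helper name, not part of any Proxmox tooling.

```python
from pathlib import Path
from typing import Dict


def read_mtus(sysfs_root: str = "/sys/class/net") -> Dict[str, int]:
    """Return {interface name: MTU} for every interface under sysfs_root."""
    mtus = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (no "mtu" attribute file).
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus
```

Calling read_mtus() on a Proxmox node should list the bridge (e.g. vmbr0) next to the physical NICs, which makes a mismatch between them easy to spot.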
 
By the way:
On VMs we use collectd to send metrics and they are all fine. So we have:
- Host metrics by Proxmox = ok
- Metrics out of VMs = ok
- VM metrics by Proxmox = faulty

Besides, how are metrics collected in Proxmox?
 
Did you manage to get to the bottom of this? I'm seeing the exact same thing in my fresh Graphite setup.
 
Could you include your 'pveversion -v' output and status.cfg?
 
Code:
root@NUC10i3FNH-1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-13
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-21
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
root@NUC10i3FNH-1:~#

I reduced status.cfg as much as possible to see if that helped; it did not.

Code:
root@NUC10i3FNH-1:~# cat /etc/pve/status.cfg
graphite:
   server 192.168.0.51
root@NUC10i3FNH-1:~#


Code:
root@NUC10i3FNH-1:~# tcpdump -i vmbr0 udp port 2003
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:43:48.445923 IP NUC10i3FNH-1.mydomain.tld.49300 > 192.168.0.51.2003: UDP, bad length 4308 > 1472
19:43:48.452819 IP NUC10i3FNH-1.mydomain.tld.49691 > 192.168.0.51.2003: UDP, bad length 7860 > 1472
19:43:48.631046 IP NUC10i3FNH-1.mydomain.tld.46142 > 192.168.0.51.2003: UDP, bad length 2352 > 1472
19:43:58.690828 IP NUC10i3FNH-1.mydomain.tld.34262 > 192.168.0.51.2003: UDP, bad length 4308 > 1472
19:43:58.697596 IP NUC10i3FNH-1.mydomain.tld.36868 > 192.168.0.51.2003: UDP, bad length 7859 > 1472
19:43:58.878835 IP NUC10i3FNH-1.mydomain.tld.57816 > 192.168.0.51.2003: UDP, bad length 2352 > 1472
^C
6 packets captured
7 packets received by filter
0 packets dropped by kernel
root@NUC10i3FNH-1:~#


Same thing on the receiving end:

Code:
me@dockernode-3:~$ sudo tcpdump -i ens18 udp port 2003
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens18, link-type EN10MB (Ethernet), capture size 262144 bytes
19:46:28.524576 IP 192.168.0.12.35243 > dockernode-3.2003: UDP, bad length 4308 > 1472
19:46:28.531371 IP 192.168.0.12.57075 > dockernode-3.2003: UDP, bad length 7860 > 1472
19:46:28.544332 IP 192.168.0.10.50810 > dockernode-3.2003: UDP, bad length 3730 > 1472
19:46:28.549290 IP 192.168.0.10.58556 > dockernode-3.2003: UDP, bad length 3970 > 1472
19:46:28.717289 IP 192.168.0.10.44497 > dockernode-3.2003: UDP, bad length 2352 > 1472
19:46:28.722387 IP 192.168.0.12.35739 > dockernode-3.2003: UDP, bad length 2352 > 1472
19:46:29.103880 IP 192.168.0.11.43742 > dockernode-3.2003: UDP, bad length 3692 > 1472
19:46:29.110846 IP 192.168.0.11.49067 > dockernode-3.2003: UDP, bad length 3996 > 1472
19:46:29.337527 IP 192.168.0.11.32994 > dockernode-3.2003: UDP, bad length 2316 > 1472
^C
9 packets captured
12 packets received by filter
0 packets dropped by kernel
me@dockernode-3:~$
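To compare captures like the ones above across several nodes, the oversized datagrams can be grouped per sender. This is only a sketch: the regular expression is written against the tcpdump lines shown in this thread, and summarize_bad_lengths is a made-up helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches e.g. "IP 192.168.0.12.35243 > dockernode-3.2003: UDP, bad length 4308 > 1472"
BAD_LENGTH = re.compile(
    r"IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP, bad length (\d+) > (\d+)"
)


def summarize_bad_lengths(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group the reported UDP lengths by source host."""
    per_source = defaultdict(list)
    for line in lines:
        match = BAD_LENGTH.search(line)
        if match:
            per_source[match.group(1)].append(int(match.group(5)))
    return dict(per_source)
```

Feeding it the saved tcpdump output line by line gives one list of datagram sizes per Proxmox node.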
 
Hi,
Switching to TCP is working for us.
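To check that the Graphite server accepts data over TCP independently of Proxmox, a few test metrics can be pushed by hand. The sketch below uses Graphite's plaintext format ("path value timestamp" per line); the function names and the metric path are made up for the test, and it assumes the server from the first post listens for plaintext TCP on port 2003.

```python
import socket
import time
from typing import Iterable, Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one line in Graphite's plaintext format."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_metrics_tcp(host: str, port: int, lines: Iterable[str],
                     timeout: float = 3.0) -> int:
    """Send all lines over a single TCP connection; return bytes sent."""
    payload = "".join(lines).encode("ascii")
    # TCP is a byte stream, so the kernel splits the payload into
    # segments that fit the path; no datagram size limit applies.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

For example, send_metrics_tcp("eusib-deb-mon", 2003, [format_metric("proxmox-wir.test.value", 1)]) should make a test series show up in Graphite if the TCP listener is reachable.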

It was quite a while ago, but our last guess was that corosync's packet size checks might be responsible for the issue.
The packet sizes for our metrics repeatedly started at a high value and then shrank until one finally got through to our metrics server. Then the cycle started over. But that is no more than a guess.
 
pve-manager >= 6.2-3 should show some improvements in this area - it would be great to get feedback once it hits the repositories.
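For anyone who has to stay on UDP, one general way to keep metric traffic under the 1472-byte limit seen in the captures is to pack whole lines into datagrams that never exceed it. The sketch below illustrates that idea only; it is not necessarily what pve-manager does, and the function names and default limit are assumptions.

```python
import socket
from typing import Iterable, List

MAX_UDP_PAYLOAD = 1472  # assumes IPv4 without options and an MTU of 1500


def batch_lines(lines: Iterable[str], limit: int = MAX_UDP_PAYLOAD) -> List[bytes]:
    """Pack whole metric lines into payloads of at most `limit` bytes."""
    batches = []
    current = b""
    for line in lines:
        data = line.encode("ascii")
        if len(data) > limit:
            raise ValueError("a single metric line exceeds the payload limit")
        # Start a new datagram rather than splitting a line in two.
        if len(current) + len(data) > limit:
            batches.append(current)
            current = b""
        current += data
    if current:
        batches.append(current)
    return batches


def send_batches_udp(host: str, port: int, batches: Iterable[bytes]) -> None:
    """Send each batch as its own UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for batch in batches:
            sock.sendto(batch, (host, port))
```

Because every datagram ends on a line boundary, the receiver can parse each one on its own even if others are lost.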
 