Adding influxdb to status.cfg gives bad udp cksum

bobnick

New Member
Sep 7, 2018
Hi,

I have tried to add my InfluxDB server to /etc/pve/status.cfg (following this guide: https://pve.proxmox.com/wiki/External_Metric_Server). When I run the command
tcpdump udp -i vmbr0 -vv port 8089
it reports "bad udp cksum" (see below).

My InfluxDB server is a virtual machine on my Proxmox server 1 (see package versions below). I have tried changing the virtual network card of the InfluxDB server from "virtio" to "e1000" and the bridge from vmbr0 (VLAN-tagged network) to vmbr2 (not VLAN-tagged), without luck.

I have also tried adding the InfluxDB server to another Proxmox instance (server 2, version 3.5-8, later updated to the same version as Proxmox server 1).

/etc/pve/status.cfg file:
Code:
influxdb:
    server 192.168.2.170
    port 8089

/etc/influxdb/influxdb.conf file on my influxdb server:
Code:
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  series-id-set-cache-size = 100

[[udp]]
  enabled = true
  bind-address = "0.0.0.0:8089"
  database = "proxmox"
  batch-size = 1000
  batch-timeout = "1s"

bad udp cksum error:
Code:
20:05:05.767979 IP (tos 0x0, ttl 64, id 45493, offset 0, flags [+], proto UDP (17), length 1500)
    virt.local.51910 > 192.168.2.170.8089: UDP, bad length 1797 > 1472
20:05:05.768346 IP (tos 0x0, ttl 64, id 45494, offset 0, flags [+], proto UDP (17), length 1500)
    virt.local.47499 > 192.168.2.170.8089: UDP, bad length 1787 > 1472
20:05:05.768783 IP (tos 0x0, ttl 64, id 45495, offset 0, flags [+], proto UDP (17), length 1500)
    virt.local.60508 > 192.168.2.170.8089: UDP, bad length 1801 > 1472
20:05:05.769318 IP (tos 0x0, ttl 64, id 45496, offset 0, flags [+], proto UDP (17), length 1500)
    virt.local.55032 > 192.168.2.170.8089: UDP, bad length 1792 > 1472
20:05:05.769630 IP (tos 0x0, ttl 64, id 45497, offset 0, flags [DF], proto UDP (17), length 346)
    virt.local.36829 > 192.168.2.170.8089: [bad udp cksum 0x8758 -> 0x9397!] UDP, length 318
20:05:05.770104 IP (tos 0x0, ttl 64, id 45498, offset 0, flags [+], proto UDP (17), length 1500)
    virt.local.44395 > 192.168.2.170.8089: UDP, bad length 1763 > 1472
20:05:05.770350 IP (tos 0x0, ttl 64, id 45499, offset 0, flags [DF], proto UDP (17), length 315)
    virt.local.48145 > 192.168.2.170.8089: [bad udp cksum 0x8739 -> 0xe750!] UDP, length 287
20:05:05.815486 IP (tos 0x0, ttl 64, id 45510, offset 0, flags [DF], proto UDP (17), length 225)
    virt.local.40281 > 192.168.2.170.8089: [bad udp cksum 0x86df -> 0x7a0b!] UDP, length 197
20:05:05.815821 IP (tos 0x0, ttl 64, id 45511, offset 0, flags [DF], proto UDP (17), length 221)
    virt.local.56102 > 192.168.2.170.8089: [bad udp cksum 0x86db -> 0xf692!] UDP, length 193
20:05:05.816081 IP (tos 0x0, ttl 64, id 45512, offset 0, flags [DF], proto UDP (17), length 235)
    virt.local.49429 > 192.168.2.170.8089: [bad udp cksum 0x86e9 -> 0x9292!] UDP, length 207

Package versions (Proxmox server 1):
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-4-pve: 4.15.18-23
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-46
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

/etc/network/interfaces file from my Proxmox server 1:
Code:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug enp2s0
iface enp2s0 inet manual

#allow-hotplug enp1s0
iface enp3s0 inet manual

#allow-hotplug eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
   address 192.168.2.6
   netmask 255.255.255.0
   gateway 192.168.2.1
   bridge_ports enp3s0
   bridge_stp off
   bridge_fd 0
   bridge_maxwait 10

auto vmbr1
iface vmbr1 inet static
   address 192.168.4.6
   netmask 255.255.255.0
   bridge_ports enp2s0
   bridge_stp off
   bridge_fd 0
   bridge_maxwait 10

auto vmbr2
iface vmbr2 inet static
   address 192.168.6.6
   netmask 255.255.255.0
   bridge_ports eno1
   bridge_stp off
   bridge_fd 0
   bridge_maxwait 10

auto vmbr10
iface vmbr10 inet static
        address 192.168.10.6
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        
auto vmbr20
iface vmbr20 inet static
        address 192.168.20.6
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        
auto vmbr30
iface vmbr30 inet static
        address 192.168.30.6
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        
auto vmbr40
iface vmbr40 inet static
        address 192.168.40.6
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0

Thank you in advance.
 
Hm, does the InfluxDB collection work?

AFAIR those bad cksum errors (for packets sent from the local interface) are quite normal: with checksum offloading enabled, the kernel has not yet filled in the final checksum at the time tcpdump 'sees' the packet.
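
One way to check the offloading explanation against the capture above: with transmit checksum offload, Linux typically leaves only the folded sum of the UDP pseudo-header in the checksum field and expects the NIC to finish the calculation. The Python sketch below recomputes that partial sum. It assumes (this is not stated in the thread) that virt.local is the vmbr0 address 192.168.2.6 and that the packets are plain IPv4; the function names are illustrative.

```python
import ipaddress

UDP_PROTO = 17        # IP protocol number for UDP
UDP_HEADER_LEN = 8    # bytes


def fold16(total: int) -> int:
    """Fold carries back in (16-bit one's-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src: str, dst: str, payload_len: int) -> int:
    """Folded, uncomplemented sum over the IPv4 UDP pseudo-header."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    udp_len = payload_len + UDP_HEADER_LEN
    words = [s >> 16, s & 0xFFFF, d >> 16, d & 0xFFFF, UDP_PROTO, udp_len]
    return fold16(sum(words))


if __name__ == "__main__":
    # (UDP payload length, checksum tcpdump found in the packet), from the capture
    captured = [(318, 0x8758), (287, 0x8739), (197, 0x86DF), (193, 0x86DB), (207, 0x86E9)]
    for length, seen in captured:
        # Source address is an assumption: the vmbr0 address of the host
        partial = pseudo_header_sum("192.168.2.6", "192.168.2.170", length)
        print(f"len={length} seen={seen:#06x} partial={partial:#06x} match={partial == seen}")
```

If the values match, the "bad" checksum is just the unfinished partial sum, which is what one would expect with offloading enabled. That alone would not explain why no data reaches InfluxDB.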

Do you get the checksums also when sniffing on the other side?
 
Hm, does the InfluxDB collection work?
Yes, I have other InfluxDB databases (non-UDP collection) where data is collected correctly and used in Grafana (same server).

Do you get the checksums also when sniffing on the other side?
Yes, I get the same "bad udp cksum" on the other side.

Code:
06:23:18.272800 IP (tos 0x0, ttl 64, id 57639, offset 0, flags [DF], proto UDP (17), length 325)
    virt.39775 > 192.168.2.170.8089: [bad udp cksum 0x8743 -> 0xea2d!] UDP, length 297
06:23:18.273135 IP (tos 0x0, ttl 64, id 57640, offset 0, flags [+], proto UDP (17), length 1500)
    virt.54539 > 192.168.2.170.8089: UDP, bad length 1801 > 1472
06:23:18.273681 IP (tos 0x0, ttl 64, id 57641, offset 0, flags [DF], proto UDP (17), length 323)
    virt.39533 > 192.168.2.170.8089: [bad udp cksum 0x8741 -> 0x23da!] UDP, length 295
06:23:18.273692 IP (tos 0x0, ttl 64, id 57642, offset 0, flags [DF], proto UDP (17), length 346)
    virt.53643 > 192.168.2.170.8089: [bad udp cksum 0x8758 -> 0x44e0!] UDP, length 318
06:23:18.276657 IP (tos 0x0, ttl 64, id 57643, offset 0, flags [DF], proto UDP (17), length 296)
    virt.56386 > 192.168.2.170.8089: [bad udp cksum 0x8726 -> 0xf120!] UDP, length 268
06:23:18.306230 IP (tos 0x0, ttl 64, id 57648, offset 0, flags [DF], proto UDP (17), length 235)
    virt.48624 > 192.168.2.170.8089: [bad udp cksum 0x86e9 -> 0x8e96!] UDP, length 207
06:23:18.306653 IP (tos 0x0, ttl 64, id 57649, offset 0, flags [DF], proto UDP (17), length 221)
    virt.35823 > 192.168.2.170.8089: [bad udp cksum 0x86db -> 0x03f6!] UDP, length 193
06:23:18.307023 IP (tos 0x0, ttl 64, id 57650, offset 0, flags [DF], proto UDP (17), length 225)
    virt.49658 > 192.168.2.170.8089: [bad udp cksum 0x86df -> 0x364a!] UDP, length 197
 
Yes, I have other InfluxDB databases (non-UDP collection) where data is collected correctly and used in Grafana (same server).
Sorry - didn't phrase that well - my question was whether the data arrives in this influxdb - not whether influxdb works in general - I wanted to rule out that everything works apart from the bad cksum messages in tcpdump.

bad length 1797 > 1472
This could also be a hint - what's the MTU of the interfaces (bridge, physical interface where the packets go out, interface in the guest)?
(ip link should provide the information)
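
For reference, the 1472 in that message is what an MTU of 1500 would give: 1500 bytes minus a 20-byte IPv4 header and an 8-byte UDP header. A datagram with a 1797-byte payload would then be split into two IP fragments, and the first one would show up with IP length 1500 and the "more fragments" flag [+], as in the capture. A small Python sketch of that arithmetic, assuming IPv4 without options (the helper names are made up for illustration):

```python
from typing import List

IPV4_HEADER_LEN = 20  # bytes, assuming no IP options
UDP_HEADER_LEN = 8    # bytes


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits into a single unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_LEN - UDP_HEADER_LEN


def fragment_ip_lengths(payload_len: int, mtu: int = 1500) -> List[int]:
    """IP total length of each fragment needed for one UDP datagram."""
    remaining = payload_len + UDP_HEADER_LEN          # UDP header travels in the first fragment
    per_fragment = (mtu - IPV4_HEADER_LEN) // 8 * 8   # fragment data must be a multiple of 8 bytes
    lengths = []
    while remaining > 0:
        chunk = min(remaining, per_fragment)
        lengths.append(chunk + IPV4_HEADER_LEN)
        remaining -= chunk
    return lengths


if __name__ == "__main__":
    print(max_udp_payload(1500))
    print(fragment_ip_lengths(1797))
    print(fragment_ip_lengths(318))
```

Under these assumptions the "bad length" lines would simply be the first fragments of datagrams larger than one packet. Whether the second fragments arrive and get reassembled in the guest is a separate question that the capture filter (port 8089) does not answer, since later fragments carry no UDP header.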
 
Sorry - didn't phrase that well - my question was whether the data arrives in this influxdb - not whether influxdb works in general - I wanted to rule out that everything works apart from the bad cksum messages in tcpdump.

In the InfluxDB storage directory (/var/lib/influxdb/data) there is no directory for the Proxmox database. The other InfluxDB databases are present in /var/lib/influxdb/data.
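
To separate "Proxmox sends nothing usable" from "InfluxDB does not accept UDP writes", one option is to send a single hand-made point to the UDP listener and check whether the database directory appears. The Python sketch below does that; the measurement name udp_test and the tag source=manual are made up for the test, and it assumes the [[udp]] listener accepts plain line protocol on port 8089.

```python
import socket


def build_test_line(measurement: str = "udp_test") -> bytes:
    """One point in line protocol: measurement,tag=value field=value (no timestamp)."""
    return f"{measurement},source=manual value=1i\n".encode("ascii")


def send_test_point(host: str, port: int = 8089, measurement: str = "udp_test") -> int:
    """Send one datagram; returns the number of bytes handed to the kernel."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(build_test_line(measurement), (host, port))
```

Calling send_test_point("192.168.2.170") from the Proxmox host, or send_test_point("127.0.0.1") inside the InfluxDB VM, should be enough for a first check. UDP gives no acknowledgement, so the return value only says the datagram was sent, not that InfluxDB stored it.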

This could also be a hint - what's the MTU of the interfaces (bridge, physical interface where the packets go out, interface in the guest)?
(ip link should provide the information)

The MTU of the network interfaces on both the Proxmox server and the InfluxDB server is 1500.
 
Did you ever get to the bottom of this? I'm having the exact same issue with my new Graphite setup.
 
I'm having the same issue as well.

10:20:25.382568 IP (tos 0x0, ttl 64, id 467, offset 0, flags [+], proto UDP (17), length 1500)
homer.mynet.59390 > 192.168.1.10.8089: UDP, bad length 1649 > 1472
 
Having the exact same problem. Proxmox logs the error message below in syslog.

Jan 24 20:17:10 proxmox pvestatd[1105]: qemu status update error: metrics send error 'proxmox': failed to send metrics: Connection refused
Jan 24 20:31:20 proxmox pvestatd[1105]: qemu status update error: metrics send error 'proxmox': failed to send metrics: Connection refused

Meanwhile, tcpdump on the InfluxDB host shows a bad UDP checksum error.

20:30:20.405569 IP (tos 0x0, ttl 64, id 33606, offset 0, flags [DF], proto UDP (17), length 1443)
    proxmox.54076 > influxdb.8089: [bad udp cksum 0x89bd -> 0xa7fe!] UDP, length 1415
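
A note on the "Connection refused" part: on Linux, a connected UDP socket reports that error when an ICMP port-unreachable message comes back, which typically happens when nothing is listening on the target UDP port. Whether pvestatd uses a connected socket is an assumption here, not something confirmed in this thread. A minimal Python probe along those lines (probe_udp is a hypothetical helper):

```python
import socket


def probe_udp(host: str, port: int, wait: float = 0.5) -> str:
    """Return "refused" if an ICMP port-unreachable came back, else "no-error"."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((host, port))   # connected socket, so ICMP errors are reported to us
        sock.settimeout(wait)
        try:
            sock.send(b"probe\n")
            sock.recv(1)             # blocks until data, a queued error, or the timeout
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "no-error"
    return "no-error"
```

If probe_udp("influxdb", 8089) returns "refused" while tcpdump on the InfluxDB host shows the packets arriving, it may be worth checking that the UDP listener is actually enabled and bound there. The probe payload is not valid line protocol, so InfluxDB may log a parse error for it.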
 
@Stoiko Ivanov thanks for your reply, here's what I did on the node:

[centos@worker ~]$ sudo firewall-cmd --add-port=8089/udp --permanent
success
[centos@worker ~]$ sudo firewall-cmd --reload
success
[centos@worker ~]$ sudo tcpdump udp -i ens18 -vv port 8089
dropped privs to tcpdump
tcpdump: listening on ens18, link-type EN10MB (Ethernet), capture size 262144 bytes
17:40:02.940832 IP (tos 0x0, ttl 64, id 54298, offset 0, flags [DF], proto UDP (17), length 1456)
    proxmox.mydomain.54286 > worker.mydomain.8089: [bad udp cksum 0x89ca -> 0xa24b!] UDP, length 1428
17:40:02.964802 IP (tos 0x0, ttl 64, id 54302, offset 0, flags [DF], proto UDP (17), length 1217)
    proxmox.mydomain.60796 > worker.mydomain.8089: [bad udp cksum 0x88db -> 0x2872!] UDP, length 1189
17:40:03.347585 IP (tos 0x0, ttl 64, id 54333, offset 0, flags [DF], proto UDP (17), length 663)
    proxmox.mydomain.34410 > worker.mydomain.8089: [bad udp cksum 0x86b1 -> 0x85a0!] UDP, length 635
17:40:13.363269 IP (tos 0x0, ttl 64, id 54597, offset 0, flags [DF], proto UDP (17), length 1456)
    proxmox.mydomain.51819 > worker.mydomain.8089: [bad udp cksum 0x89ca -> 0x0ad1!] UDP, length 1428
17:40:13.363377 IP (tos 0x0, ttl 64, id 54598, offset 0, flags [DF], proto UDP (17), length 1438)
    proxmox.mydomain.51819 > worker.mydomain.8089: [bad udp cksum 0x89b8 -> 0x3b5f!] UDP, length 1410
17:40:13.382388 IP (tos 0x0, ttl 64, id 54599, offset 0, flags [DF], proto UDP (17), length 1217)
    proxmox.mydomain.56497 > worker.mydomain.8089: [bad udp cksum 0x88db -> 0x3f2c!] UDP, length 1189
17:40:13.780559 IP (tos 0x0, ttl 64, id 54658, offset 0, flags [DF], proto UDP (17), length 663)
    proxmox.mydomain.38863 > worker.mydomain.8089: [bad udp cksum 0x86b1 -> 0x3196!] UDP, length 635
17:40:22.796543 IP (tos 0x0, ttl 64, id 55386, offset 0, flags [DF], proto UDP (17), length 1456)
    proxmox.mydomain.55769 > worker.mydomain.8089: [bad udp cksum 0x89ca -> 0xb835!] UDP, length 1428
17:40:22.796671 IP (tos 0x0, ttl 64, id 55387, offset 0, flags [DF], proto UDP (17), length 1438)
    proxmox.mydomain.55769 > worker.mydomain.8089: [bad udp cksum 0x89b8 -> 0x00cc!] UDP, length 1410
17:40:22.816672 IP (tos 0x0, ttl 64, id 55388, offset 0, flags [DF], proto UDP (17), length 1222)
    proxmox.mydomain.47042 > worker.mydomain.8089: [bad udp cksum 0x88e0 -> 0xa9b1!] UDP, length 1194
17:40:23.259351 IP (tos 0x0, ttl 64, id 55408, offset 0, flags [DF], proto UDP (17), length 663)
    proxmox.mydomain.40299 > worker.mydomain.8089: [bad udp cksum 0x86b1 -> 0xf917!] UDP, length 635
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel

After opening the port in the node's firewall, I still see the same messages: `Jan 25 17:46:03 proxmox pvestatd[1108]: qemu status update error: metrics send error 'proxmox': failed to send metrics: Connection refused`.
Finally, I disabled the firewall on the node. The same messages keep coming through.
I checked the MTU: it's 1500 on both Proxmox and the node...

I'm not sure what I'm missing here...
 