Help please: Debugging ping responses from wrong addresses

I am having problems with pings from my PVE host to disconnected machines: sometimes, instead of "Destination Host Unreachable", ping reports a reply from a wrong IP:

Code:
PING octopus.example.com (192.168.1.38) 56(84) bytes of data.
64 bytes from shark.example.com (192.168.5.14): icmp_seq=1 ttl=254 time=16.0 ms (DIFFERENT ADDRESS!)

--- octopus.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.014/16.014/16.014/0.000 ms

If I rerun the ping immediately afterwards, everything is fine again:

Code:
PING octopus.example.com (192.168.1.38) 56(84) bytes of data.
From zeus.example.com (192.168.1.6) icmp_seq=1 Destination Host Unreachable

--- octopus.example.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

(192.168.1.6 is the PVE host.)

To debug this, I run a script every minute that makes two separate ping attempts to each of several machines here.

Observations so far:
  • Problems only occur with offline/disconnected devices, across various machines, with no recognizable pattern by time or by machine.
  • I cannot reproduce the issue if I just ping a single machine; the problem only occurs when I ping many machines.
  • These problems do not occur with pings issued from VMs running on the PVE host. Only pings issued by the host itself show this behavior.

I've expanded my test script to capture traffic during the ping attempts via tcpdump with the following filter:

Code:
tcpdump -v -n -i vmbr0.10  -s 65535 -w "${DUMPNAME}" "icmp or port 53 or arp"

When I look at the resulting dumps with Wireshark, I see the following for the example above:

Code:
No.     Time           Source                Destination           Protocol Length Info
    503 0.898495       192.168.1.6           192.168.1.100         DNS      79     Standard query 0x35cf A octopus.example.com
    504 0.898533       192.168.1.6           192.168.1.100         DNS      79     Standard query 0xb5d1 AAAA octopus.example.com
    505 0.898602       192.168.1.100         192.168.1.6           DNS      95     Standard query response 0x35cf A octopus.example.com A 192.168.1.38
    506 0.898639       192.168.1.100         192.168.1.6           DNS      135    Standard query response 0xb5d1 AAAA octopus.example.com SOA server64.example.com
    876 1.014446       192.168.1.6           192.168.1.100         DNS      79     Standard query 0xd72b A octopus.example.com
    877 1.014486       192.168.1.6           192.168.1.100         DNS      79     Standard query 0x7129 AAAA octopus.example.com
    878 1.014540       192.168.1.100         192.168.1.6           DNS      95     Standard query response 0xd72b A octopus.example.com A 192.168.1.38
    879 1.014582       192.168.1.100         192.168.1.6           DNS      135    Standard query response 0x7129 AAAA octopus.example.com SOA server64.example.com
   1079 1.913178       MellanoxTech_7f:b1:52 Broadcast             ARP      42     Who has 192.168.1.38? Tell 192.168.1.6
   1133 2.937181       MellanoxTech_7f:b1:52 Broadcast             ARP      42     Who has 192.168.1.38? Tell 192.168.1.6

DNS traffic looks fine. I cannot find any ARP replies (wrong tcpdump filter?), nor can I see any ping request actually going out to 192.168.1.38.
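To check the ARP side of a dump independently of Wireshark's display, a minimal sketch can count ARP requests and replies per IP. It assumes a classic pcap file (what `tcpdump -w` writes by default, not pcapng) with Ethernet framing, which a capture on vmbr0.10 should give; the function name is hypothetical. For an offline host such as octopus one would expect requests and no replies, and without an ARP reply the kernel has no MAC address to send the echo request to, which would be consistent with no ICMP packet to 192.168.1.38 showing up.

```python
import socket
import struct
from collections import Counter


def arp_summary(data: bytes) -> tuple[Counter, Counter]:
    """Count ARP requests (by target IP) and replies (by sender IP)
    in a classic pcap capture with Ethernet framing."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (usec or nsec timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    requests, replies = Counter(), Counter()
    pos = 24  # skip the 24-byte pcap global header
    while pos + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        incl_len = struct.unpack(endian + "I", data[pos + 8:pos + 12])[0]
        frame = data[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < 14:
            continue
        off, ethertype = 14, struct.unpack("!H", frame[12:14])[0]
        if ethertype == 0x8100 and len(frame) >= 18:  # skip an 802.1Q VLAN tag
            off, ethertype = 18, struct.unpack("!H", frame[16:18])[0]
        if ethertype != 0x0806 or len(frame) < off + 28:
            continue  # not an Ethernet/IPv4 ARP packet
        arp = frame[off:off + 28]
        op = struct.unpack("!H", arp[6:8])[0]
        sender_ip = socket.inet_ntoa(arp[14:18])
        target_ip = socket.inet_ntoa(arp[24:28])
        if op == 1:  # "Who has <target_ip>? Tell <sender_ip>"
            requests[target_ip] += 1
        elif op == 2:  # "<sender_ip> is at <sender MAC>"
            replies[sender_ip] += 1
    return requests, replies
```

Usage would be along the lines of `requests, replies = arp_summary(open(dump_path, "rb").read())`, then comparing `requests["192.168.1.38"]` with `replies["192.168.1.38"]`.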

There are two pings going out to shark (192.168.5.14), which is expected, because shark was pinged separately as well (in addition to octopus). I did not see anything beyond these two normal pings to shark:

Code:
No.     Time           Source                Destination           Protocol Length Info
    714 0.915575       192.168.1.6           192.168.1.100         DNS      79     Standard query 0xe7bd A shark.example.com
    716 0.915611       192.168.1.6           192.168.1.100         DNS      79     Standard query 0xb7b3 AAAA shark.example.com
    718 0.915663       192.168.1.100         192.168.1.6           DNS      95     Standard query response 0xe7bd A shark.example.com A 192.168.5.14
    721 0.915733       192.168.1.100         192.168.1.6           DNS      135    Standard query response 0xb7b3 AAAA shark.example.com SOA server64.example.com
    724 0.915832       192.168.1.6           192.168.5.14          ICMP     98     Echo (ping) request  id=0x507b, seq=1/256, ttl=64 (reply in 856)
    856 0.986876       192.168.5.14          192.168.1.6           ICMP     98     Echo (ping) reply    id=0x507b, seq=1/256, ttl=254 (request in 724)
    859 0.992419       192.168.1.6           192.168.1.100         DNS      79     Standard query 0xe447 A shark.example.com
    860 0.992473       192.168.1.6           192.168.1.100         DNS      79     Standard query 0x6845 AAAA shark.example.com
    861 0.992539       192.168.1.100         192.168.1.6           DNS      95     Standard query response 0xe447 A shark.example.com A 192.168.5.14
    862 0.992582       192.168.1.100         192.168.1.6           DNS      135    Standard query response 0x6845 AAAA shark.example.com SOA server64.example.com
    863 0.992680       192.168.1.6           192.168.5.14          ICMP     98     Echo (ping) request  id=0xd73b, seq=1/256, ttl=64 (reply in 869)
    869 1.008689       192.168.5.14          192.168.1.6           ICMP     98     Echo (ping) reply    id=0xd73b, seq=1/256, ttl=254 (request in 863)

Why am I not seeing any ping request to 192.168.1.38 in the tcpdump capture at all; did I fail to capture something? And how can I find out why ping attributes a reply from 192.168.5.14 to the octopus ping, when the only replies coming in belong to the pings actually sent to 192.168.5.14?
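On the second question: ping pairs an echo reply with its request essentially by the ICMP identifier and sequence number (the `id=` and `seq=` fields Wireshark shows), not by source address, which is why iputils can accept such a reply and merely flag it with "(DIFFERENT ADDRESS!)". A minimal sketch for checking those fields in a dump, assuming a classic pcap file with Ethernet framing and IPv4 (function names are hypothetical), lists every echo request and reply and flags replies whose (id, seq) pair matches no request in the capture. Since no echo request to 192.168.1.38 left the host, the octopus ping's own identifier is not in this capture; the listing can only show which identifiers the incoming replies carried.

```python
import socket
import struct


def icmp_echoes(data: bytes) -> list[tuple[str, str, str, int, int]]:
    """List ICMP echo packets in a classic pcap capture as
    (kind, src, dst, identifier, sequence)."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    echoes = []
    pos = 24  # skip the pcap global header
    while pos + 16 <= len(data):
        incl_len = struct.unpack(endian + "I", data[pos + 8:pos + 12])[0]
        frame = data[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < 14:
            continue
        off, ethertype = 14, struct.unpack("!H", frame[12:14])[0]
        if ethertype == 0x8100 and len(frame) >= 18:  # skip an 802.1Q VLAN tag
            off, ethertype = 18, struct.unpack("!H", frame[16:18])[0]
        if ethertype != 0x0800 or len(frame) < off + 20:
            continue  # not IPv4
        if frame[off + 9] != 1:  # IP protocol 1 = ICMP
            continue
        ihl = (frame[off] & 0x0F) * 4  # IPv4 header length in bytes
        src = socket.inet_ntoa(frame[off + 12:off + 16])
        dst = socket.inet_ntoa(frame[off + 16:off + 20])
        icmp = frame[off + ihl:off + ihl + 8]
        if len(icmp) < 8 or icmp[0] not in (0, 8):  # 8 = echo request, 0 = echo reply
            continue
        ident, seq = struct.unpack("!HH", icmp[4:8])  # big-endian, as Wireshark shows
        echoes.append(("request" if icmp[0] == 8 else "reply", src, dst, ident, seq))
    return echoes


def orphan_replies(echoes):
    """Replies whose (identifier, sequence) matches no request in the capture."""
    sent = {(e[3], e[4]) for e in echoes if e[0] == "request"}
    return [e for e in echoes if e[0] == "reply" and (e[3], e[4]) not in sent]
```

Printing `hex(ident)` for each entry gives values in the same form as Wireshark's `id=0x507b`, which makes it easy to line the output up with the packet list.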

My network config:

/etc/network/interfaces
Code:
# loopback interface
auto lo
iface lo inet loopback

# physical interfaces

iface enp193s0f0np0 inet manual

iface enp193s0f1np1 inet manual

iface enp10s0f0 inet manual

iface enp10s0f1 inet manual

iface enp12s0f3u2u2c2 inet manual

# bond
auto bond0
iface bond0 inet manual
        bond-slaves enp193s0f0np0 enp193s0f1np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

# bridge for bond, local interface, VMs
auto vmbr0
iface vmbr0 inet static
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 10,30,50,60,70,1000,1001

auto vmbr0.10
iface vmbr0.10 inet static
        address 192.168.1.6/24
        address 192.168.1.101/24
        gateway 192.168.1.254
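
When reading the dumps against this config, it may help to spell out which neighbour the host has to resolve via ARP for each target. The sketch below hard-codes the addresses from the vmbr0.10 stanza and assumes the routing table holds nothing beyond the connected 192.168.1.0/24 network and the default route via 192.168.1.254 (`ip route get <address>` on the host shows the real decision); the function name is hypothetical. Under that assumption octopus is on-link, so the host ARPs for 192.168.1.38 itself, matching the unanswered "Who has 192.168.1.38?" requests, while shark is reached through the gateway, so its echo requests do not depend on shark answering ARP.

```python
import ipaddress

# Both addresses configured on vmbr0.10 lie in 192.168.1.0/24.
LOCAL_NETS = [
    ipaddress.IPv4Interface("192.168.1.6/24").network,
    ipaddress.IPv4Interface("192.168.1.101/24").network,
]
GATEWAY = ipaddress.IPv4Address("192.168.1.254")


def next_hop(target: str) -> ipaddress.IPv4Address:
    """Return the address the host must ARP for to reach target, assuming
    only the connected /24 and the default route exist (no other routes)."""
    addr = ipaddress.IPv4Address(target)
    if any(addr in net for net in LOCAL_NETS):
        return addr  # on-link: ARP for the target itself
    return GATEWAY  # off-link: ARP for the default gateway
```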
 
