I am having problems with pings to disconnected machines on my PVE host: sometimes, instead of failing with "Destination Host Unreachable", ping reports a reply from the wrong IP address:
Code:
PING octopus.example.com (192.168.1.38) 56(84) bytes of data.
64 bytes from shark.example.com (192.168.5.14): icmp_seq=1 ttl=254 time=16.0 ms (DIFFERENT ADDRESS!)
--- octopus.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.014/16.014/16.014/0.000 ms
If I rerun the ping immediately afterwards, everything is fine again:
Code:
PING octopus.example.com (192.168.1.38) 56(84) bytes of data.
From zeus.example.com (192.168.1.6) icmp_seq=1 Destination Host Unreachable
--- octopus.example.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
(192.168.1.6 is the PVE host.)
To debug, I'm running a script every minute that runs two separate ping attempts to various machines here.
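A minimal sketch of how such a script could flag the anomaly automatically, assuming iputils-style ping output like the examples above. `check_ping_reply` is a hypothetical helper written for illustration, not the script actually in use; it reads ping output on stdin and compares the address on each reply line with the expected one.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): reads ping output on stdin and
# checks that every echo reply line ("... bytes from ...") carries the
# expected IPv4 address. Prints a MISMATCH line and returns 1 otherwise.
check_ping_reply() {
    expected="$1"
    mismatch=0
    while IFS= read -r line; do
        case "$line" in
            *"bytes from"*)
                # Take the first dotted-quad address found on the reply line.
                addr=$(printf '%s\n' "$line" \
                    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
                    | head -n 1)
                if [ "$addr" != "$expected" ]; then
                    echo "MISMATCH: expected $expected, reply from $addr"
                    mismatch=1
                fi
                ;;
        esac
    done
    return "$mismatch"
}
```

It could be used as `ping -c 1 octopus.example.com | check_ping_reply 192.168.1.38`; a non-zero exit status then marks the runs worth looking up in the capture.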
Observations so far:
- Problems only occur with offline/disconnected devices, across various machines; I cannot recognize any pattern in time or in which machine is affected.
- I cannot reproduce the issue if I just issue regular pings to a single machine. The problem only occurs when I ping a multitude of machines.
- These problems do not occur with pings issued from VMs running on the PVE host. Only pings issued by the host itself show this behavior.
Code:
tcpdump -v -n -i vmbr0.10 -s 65535 -w "${DUMPNAME}" "icmp or port 53 or arp"
When I look at the resulting dumps with Wireshark, I can see (for the above example)
Code:
No. Time Source Destination Protocol Length Info
503 0.898495 192.168.1.6 192.168.1.100 DNS 79 Standard query 0x35cf A octopus.example.com
504 0.898533 192.168.1.6 192.168.1.100 DNS 79 Standard query 0xb5d1 AAAA octopus.example.com
505 0.898602 192.168.1.100 192.168.1.6 DNS 95 Standard query response 0x35cf A octopus.example.com A 192.168.1.38
506 0.898639 192.168.1.100 192.168.1.6 DNS 135 Standard query response 0xb5d1 AAAA octopus.example.com SOA server64.example.com
876 1.014446 192.168.1.6 192.168.1.100 DNS 79 Standard query 0xd72b A octopus.example.com
877 1.014486 192.168.1.6 192.168.1.100 DNS 79 Standard query 0x7129 AAAA octopus.example.com
878 1.014540 192.168.1.100 192.168.1.6 DNS 95 Standard query response 0xd72b A octopus.example.com A 192.168.1.38
879 1.014582 192.168.1.100 192.168.1.6 DNS 135 Standard query response 0x7129 AAAA octopus.example.com SOA server64.example.com
1079 1.913178 MellanoxTech_7f:b1:52 Broadcast ARP 42 Who has 192.168.1.38? Tell 192.168.1.6
1133 2.937181 MellanoxTech_7f:b1:52 Broadcast ARP 42 Who has 192.168.1.38? Tell 192.168.1.6
DNS traffic is ok. I cannot find any ARP responses (wrong tcpdump filter?), nor can I see any ping actually going out for 192.168.1.38.
There are 2 pings going out to shark (192.168.5.14), which is ok, because shark was separately pinged as well (in addition to octopus). I did not see more than the two normal pings to shark:
Code:
No. Time Source Destination Protocol Length Info
714 0.915575 192.168.1.6 192.168.1.100 DNS 79 Standard query 0xe7bd A shark.example.com
716 0.915611 192.168.1.6 192.168.1.100 DNS 79 Standard query 0xb7b3 AAAA shark.example.com
718 0.915663 192.168.1.100 192.168.1.6 DNS 95 Standard query response 0xe7bd A shark.example.com A 192.168.5.14
721 0.915733 192.168.1.100 192.168.1.6 DNS 135 Standard query response 0xb7b3 AAAA shark.example.com SOA server64.example.com
724 0.915832 192.168.1.6 192.168.5.14 ICMP 98 Echo (ping) request id=0x507b, seq=1/256, ttl=64 (reply in 856)
856 0.986876 192.168.5.14 192.168.1.6 ICMP 98 Echo (ping) reply id=0x507b, seq=1/256, ttl=254 (request in 724)
859 0.992419 192.168.1.6 192.168.1.100 DNS 79 Standard query 0xe447 A shark.example.com
860 0.992473 192.168.1.6 192.168.1.100 DNS 79 Standard query 0x6845 AAAA shark.example.com
861 0.992539 192.168.1.100 192.168.1.6 DNS 95 Standard query response 0xe447 A shark.example.com A 192.168.5.14
862 0.992582 192.168.1.100 192.168.1.6 DNS 135 Standard query response 0x6845 AAAA shark.example.com SOA server64.example.com
863 0.992680 192.168.1.6 192.168.5.14 ICMP 98 Echo (ping) request id=0xd73b, seq=1/256, ttl=64 (reply in 869)
869 1.008689 192.168.5.14 192.168.1.6 ICMP 98 Echo (ping) reply id=0xd73b, seq=1/256, ttl=254 (request in 863)
Why am I not seeing any ping attempt to 192.168.1.38 in the tcpdump capture at all; did I fail to capture something? And how can I find out where ping thinks the response from 192.168.5.14 comes from, when the only response coming in belongs to a different ping, namely the one that actually went out to 192.168.5.14?
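One way to double-check that no echo request for 192.168.1.38 was captured is to count echo requests per destination in the exported packet list. The following is only a sketch: it assumes a Wireshark plain-text export with the column order shown in the listings above (No., Time, Source, Destination, Protocol, Length, Info), and `count_echo_requests` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): reads a Wireshark plain-text
# packet list on stdin and prints "<destination> <count>" for every
# destination that received at least one ICMP echo request.
count_echo_requests() {
    # Field 4 is the destination column, field 5 the protocol column.
    awk '$5 == "ICMP" && $0 ~ /Echo \(ping\) request/ { n[$4]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

Fed with the packets listed above, this would report only 192.168.5.14; a destination that never appears in the output had no echo request in the capture.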
My network config:
/etc/network/interfaces
Code:
# loopback interface
auto lo
iface lo inet loopback

# physical interfaces
iface enp193s0f0np0 inet manual
iface enp193s0f1np1 inet manual
iface enp10s0f0 inet manual
iface enp10s0f1 inet manual
iface enp12s0f3u2u2c2 inet manual

# bond
auto bond0
iface bond0 inet manual
    bond-slaves enp193s0f0np0 enp193s0f1np1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

# bridge for bond, local interface, VMs
auto vmbr0
iface vmbr0 inet static
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10,30,50,60,70,1000,1001

auto vmbr0.10
iface vmbr0.10 inet static
    address 192.168.1.6/24
    address 192.168.1.101/24
    gateway 192.168.1.254