lol, I had help and direction. I've only seen this twice in my career. The first time was in 2000, when an Adaptec network driver was corrupting a packet, but only for one machine, and only when the machine name (which was carried in a backup packet) was set to a very specific value. Literally, if the machine was called something like win01 it worked, but if it was called win02 that one packet was corrupted and thrown away, which stopped just the backup app from working. The issue was a bug in their TCP offload engine. They never fixed it (shows why they went out of business). Not sure my networking skills are good enough.
The second was in the last few weeks, when it turned out TCP over IPv6 was totally broken in thunderbolt-net.
On both the sending machine and the receiving machine you run:
tcpdump -i <interfacename> ip6
(Note: without a filter you will collect a lot of traffic, so only run these for the short time it takes to do a test. Pick a filter that fits your problem rather than copying the ip6 I used, which captures only IPv6 traffic.)
Then you get something like this:
Code:
root@pve1:~# cat pve1.tcpdump
<noise removed>
11:56:04.963688 IP6 fe80::8e:95ff:fef1:621a > fe80::1a:44ff:fe65:dbe0: ICMP6, neighbor solicitation, who has fe80::1a:44ff:fe65:dbe0, length 32
11:56:04.963726 IP6 fe80::1a:44ff:fe65:dbe0 > fe80::8e:95ff:fef1:621a: ICMP6, neighbor advertisement, tgt is fe80::1a:44ff:fe65:dbe0, length 24
11:56:05.259237 IP6 xxxx:xxxx:830:81::81.60670 > xxxx:xxxx:830:81::82.ssh: Flags [S], seq 2237794262, win 65460, options [mss 65460,sackOK,TS val 2783495228 ecr 0,nop,wscale 7], length 0
and
Code:
root@pve2:~# cat pve2.tcpdump
<noise removed>
11:56:04.962860 IP6 fe80::8e:95ff:fef1:621a > fe80::1a:44ff:fe65:dbe0: ICMP6, neighbor solicitation, who has fe80::1a:44ff:fe65:dbe0, length 32
11:56:04.963212 IP6 fe80::1a:44ff:fe65:dbe0 > fe80::8e:95ff:fef1:621a: ICMP6, neighbor advertisement, tgt is fe80::1a:44ff:fe65:dbe0, length 24
The SSH packet (the last line of the pve1 capture) has no matching line in the pve2 capture. This tells you that the SSH packet was handed to the driver on the sender, but never received by the destination.
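With longer captures, spotting the missing line by eye gets tedious. Here is a minimal sketch of doing the comparison with awk, assuming both captures are tcpdump text output like the two above. The function name missing_on_receiver is made up for illustration. It ignores the timestamp (the first field), since the clocks on the two machines won't agree, and prints every sender line that has no counterpart on the receiver.

```shell
# Print lines from the sender capture that never show up in the receiver capture.
# Usage: missing_on_receiver <sender-capture> <receiver-capture>
missing_on_receiver() {
    sender="$1"
    receiver="$2"
    awk '
        # Split off the leading timestamp; the rest of the line is the packet summary.
        { ts = $1; sub(/^[^ ]+ /, "") }
        # First file (receiver): remember every packet summary seen there.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second file (sender): print packets the receiver never logged.
        !($0 in seen) { print ts, $0 }
    ' "$receiver" "$sender"
}
```

For the captures above you would run something like missing_on_receiver pve1.tcpdump pve2.tcpdump and expect the SSH SYN line to come back. It only does a plain text match, so treat the result as a hint about where to look, not as proof.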
Then I did this:
Code:
PVE1 (sender)
root@pve1:~# ip -s -s link show en06
7: en06: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:1a:44:65:db:e0 brd ff:ff:ff:ff:ff:ff
    RX:  bytes  packets errors dropped  missed   mcast
    19047333241 20301503      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes  packets errors dropped carrier collsns
    15558035141 18392655      7       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0       2
PVE2 (destination)
root@pve2:~# ip -s -s link show en05
74: en05: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:8e:95:f1:62:1a brd ff:ff:ff:ff:ff:ff
    RX:  bytes  packets errors dropped  missed   mcast
    15561305384 19172991      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes  packets errors dropped carrier collsns
    19050263184 18347026      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0       2
Do you see the 7 under TX errors on the sender? Every time I tried to SSH, those errors ticked up on the sender (not the receiver). This indicated that the driver or hardware was dropping the packets. I then contacted the owner of the code, showed them all this, and they fixed it!
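If you'd rather not eyeball the counters before and after each attempt, here is a rough sketch that does the subtraction for you. It assumes a Linux box where the kernel exposes the same per-interface counters as plain files under /sys/class/net/<interface>/statistics/. The function name counter_delta is made up for illustration.

```shell
# Print how much a counter file grew while a command ran.
# Usage: counter_delta <counter-file> <command> [args...]
counter_delta() {
    counter="$1"
    shift
    before=$(cat "$counter") || return 1
    # Run the test command; its output and exit status are deliberately ignored,
    # because a failing connection is exactly what is being tested.
    "$@" >/dev/null 2>&1 || true
    after=$(cat "$counter") || return 1
    echo "$((after - before))"
}
```

On the sender above that would look something like counter_delta /sys/class/net/en06/statistics/tx_errors ssh -o ConnectTimeout=5 <address-of-pve2> true. A non-zero result on every attempt points at the sending side, which is the pattern I saw.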
Ideally you have 3 captures going: one on the sender, one on the receiver, and one on the switch in the middle (or on a 3rd sniffer machine you mirror the ports to, so it sees all traffic). Then you can compare the traces to see whether the packet hit the wire or not. Note I didn't do it that way because this was thunderbolt-net and there is no way to put a switch / sniffer in the middle.
Anyhoo, a little off topic, but hopefully this gives you more tests you can do if moving to a clean vanilla no-VLAN switch doesn't work.
--edit--
Oh, and capture files can be loaded into Wireshark to make them easier to analyze if you have lots of entries (add -w <filename> to the tcpdump command to write one).