[SOLVED] how to troubleshoot dropped packets

godsavethequ33n · Jun 8, 2022

I recently reinstalled netdata and have been receiving notifications like this:

Output of ip -s link show vmbr0

Code:

20: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fc:34:97:a1:31:7d brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    3881705530 12322876 0       491986  0       101866
    TX: bytes  packets  errors  dropped carrier collsns
    171668466551 13828385 0       0       0       0

Additional info, if it helps at all:

/etc/network/interfaces

Code:

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.125/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

load average: 1.21, 1.29, 1.36

1 NIC. No HP switches. I have a TP-Link TL-SG108PE V3 and a NETGEAR GS308 on my network.

I have a constant ping to the Proxmox host IP, a VM IP on PM and out to google all from a pc on the network. I notice the Proxmox host IP and VM IP time out for 3 pings at the same exact time... ping to google is fine. At this same time my Plex direct streams on one of the VMs will stop to buffer as well.

What steps do I take to begin troubleshooting this?

shrdlicka · Jun 10, 2022

Hi,

Looks like a hard problem to me. It could be that your Proxmox server is so busy that it is dropping packets, which would also stop the VMs on it from getting them. It could also be that there is some defective hardware.

I would first try to figure out if there is maybe some issue with high system load at the times when the packets get dropped.
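One way to check this is to log the drop counter next to the load average and see whether the jumps line up. The following is only a minimal sketch: it assumes the usual Linux counter file /sys/class/net/<iface>/statistics/rx_dropped and /proc/loadavg, and the function name `sample_drops` and its optional file arguments are made up for illustration.

```shell
#!/bin/sh
# sample_drops IFACE [COUNTER_FILE] [LOADAVG_FILE]
# Prints one line: "<epoch seconds> rx_dropped=<n> load1=<1-minute load>".
# The two optional file arguments are hypothetical helpers so the function
# can be tried against sample files; by default it reads the live counters.
sample_drops() {
    iface=$1
    counter_file=${2:-/sys/class/net/$iface/statistics/rx_dropped}
    loadavg_file=${3:-/proc/loadavg}
    dropped=$(cat "$counter_file")
    # The first field of /proc/loadavg is the 1-minute load average.
    load1=$(cut -d' ' -f1 "$loadavg_file")
    echo "$(date +%s) rx_dropped=$dropped load1=$load1"
}

# Example: one sample every 10 seconds (stop with Ctrl-C)
# while sleep 10; do sample_drops vmbr0; done | tee -a /tmp/drops.log
```

If rx_dropped jumps while load1 stays flat, high system load is probably not the cause.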

godsavethequ33n · Jun 16, 2022

Code:

[ 1979.225665] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <1c>
                 TDT                  <be>
                 next_to_use          <be>
                 next_to_clean        <1b>
               buffer_info[next_to_clean]:
                 time_stamp           <100065d31>
                 next_to_watch        <1c>
                 jiffies              <100066760>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[ 1979.353382] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[ 1979.443144] vmbr0: port 1(eno1) entered disabled state
[ 1983.132165] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 1983.132225] vmbr0: port 1(eno1) entered blocking state
[ 1983.132228] vmbr0: port 1(eno1) entered forwarding state

Just noticed this in the logs. Likely related. Coincides with the outages. Not sure what would cause this.

Using onboard NIC.

Code:

Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: PRIME Z590-V
        Version: Rev 1.xx

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (14) I219-V (rev 11)

Code:

Linux 5.15.35-1-pve #1 SMP PVE 5.15.35-3
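To see how often the adapter hangs, the kernel log can be filtered for the message shown above. A small sketch, assuming the message text is exactly as in this log; the function name `count_unit_hangs` is made up for illustration.

```shell
#!/bin/sh
# count_unit_hangs: reads kernel log text on stdin and prints the number of
# "Detected Hardware Unit Hang" lines (one per hang event in the log above).
count_unit_hangs() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that from aborting scripts that run with "set -e".
    grep -c 'Detected Hardware Unit Hang' || true
}

# Example:
# dmesg | count_unit_hangs
# journalctl -k --since "1 hour ago" | count_unit_hangs
```

Comparing the count before and after a ping timeout shows whether each outage corresponds to one adapter reset.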

shrdlicka · Jun 17, 2022

Yes, that is a bug in the driver. This could fix it [1]:

Code:

ethtool -K eth0 gso off gro off tso off

[1] https://serverfault.com/questions/6...expectedly-detected-hardware-unit-hang#616623

godsavethequ33n · Jun 17, 2022

Testing this now.

godsavethequ33n · Jun 24, 2022

Disabling only tso, as several others have recommended, seemed to do the trick! Thank you!

Code:

ethtool -K eno1 tso off

Added to /etc/network/interfaces as well. Have not tested to see if it survives a reboot yet.
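The post does not show the line that was added to /etc/network/interfaces. A common approach (an assumption here, not confirmed in the thread) is a `post-up` hook on the physical port. Whether the setting is active, for example after a reboot, can be checked from `ethtool -k` output. A minimal sketch; the function name `tso_state` is made up for illustration.

```shell
#!/bin/sh
# Assumed persistence line (not shown in the thread), placed under
# "iface eno1 inet manual" in /etc/network/interfaces:
#     post-up /sbin/ethtool -K eno1 tso off

# tso_state: reads `ethtool -k IFACE` output on stdin and prints the state
# ("on" or "off") of the tcp-segmentation-offload feature.
tso_state() {
    awk '$1 == "tcp-segmentation-offload:" { print $2; exit }'
}

# Example:
# ethtool -k eno1 | tso_state
```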

dlasher · Jun 25, 2023

Having same problem - testing fix - will report back in 48h.

dlasher · Jun 26, 2023

dlasher said:
Having same problem - testing fix - will report back in 48h.

Did not fix the problem.

Netdata alarm:

Chart: net_packets.vmbr0
Family: vmbr0
Alarm: inbound packets dropped ratio = 0.17% (the ratio of inbound dropped packets vs the total number of received packets of the network interface, during the last 10 minutes)

masgo · Jul 10, 2023

I am having the same or a related problem, but my dmesg is empty, no errors about hangs or anything else. I narrowed it down to the following.

Each PVE Node has a dedicated network card which is used only for Link 0 in the cluster. They are connected to a simple gigabit switch which only connects to other PVE nodes in the same cluster. So only PVE is "talking" on this network.
There is one packet dropped every 30 seconds. This happens on multiple machines, but not on all of them.

Here are two machines
View attachment 52652
and the other for the same timeframe
View attachment 52653

Since they both have the drop at the same time, I am guessing that some kind of broadcast is happening in the network. But what/which node causes it? Any ideas how to diagnose this?

I don't really care about single packet drops, but on the other interfaces I have a lot of drops (and also a lot of traffic, which makes the diagnosis more difficult).

GastonJ · Sep 17, 2023

Sorry, it's an old thread, but I came across the same issue. vmbr0 dropping packets like there's no tomorrow

Reviewed this https://blog.hambier.lu/post/tracking-dropped-packets
Turns out I had 2 devices sending out unknown packets -- identified using tcpdump

7679
7374
7a7a
7380

Blocked them using ebtables
Problem now gone.
I'll look back in tomorrow and see if it's fixed for good.

Was my SkyQ boxes.
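For anyone hunting for such frames, a per-EtherType count of captured traffic makes the unknown ones stand out. A rough sketch, assuming the usual `tcpdump -e` text format in which each line contains "ethertype NAME (0xNNNN),"; the function name `ethertype_summary` is made up for illustration.

```shell
#!/bin/sh
# ethertype_summary: reads `tcpdump -e -n` text output on stdin and prints
# a count per EtherType, most frequent first.
ethertype_summary() {
    grep -o 'ethertype [^,]*' | sort | uniq -c | sort -rn
}

# Example (needs root; capture 1000 frames on the bridge, then summarise):
# tcpdump -e -n -c 1000 -i vmbr0 2>/dev/null | ethertype_summary
```

Entries reported as "Unknown" are the candidates to inspect more closely with an `ether proto` filter.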

ben29 · Dec 13, 2023

Hi, did you manage to fix it?

I'm stuck with a lot of dropped packets.

GastonJ · Dec 18, 2023

ben29 said:
Hi, did you manage to fix it?

I'm stuck with a lot of dropped packets.

Yes it did, dropped packets gone, never to return.

ben29 · Dec 18, 2023

GastonJ said:
Yes it did, dropped packets gone, never to return.

Can you tell me how?

GastonJ · Dec 18, 2023

That was on my old server, and I've upgraded since.

https://blog.hambier.lu/post/tracking-dropped-packets - I used Wireshark to identify the EtherTypes, then tcpdump to confirm them. Then issued the following:

sudo tcpdump -v -i vmbr0 ether proto 0x7579
sudo tcpdump -v -i vmbr0 ether proto 0x7374
sudo tcpdump -v -i vmbr0 ether proto 0x7a7a
sudo tcpdump -v -i vmbr0 ether proto 0x7380

sudo ebtables -A INPUT -p 7579 -j DROP
sudo ebtables -A INPUT -p 7374 -j DROP
sudo ebtables -A INPUT -p 7a7a -j DROP
sudo ebtables -A INPUT -p 7380 -j DROP

to drop them. To be honest, I haven't checked on my new server yet whether I need to do this there; I've been busy with other things. Yes, I need to complete those steps again. I'll fire up the Sky boxes later and update tomorrow. I need to check that they are the same EtherTypes I need to block this time, though they probably are.

Then, from memory

sudo apt install netfilter-persistent
EBT=/usr/share/netfilter-persistent/plugins.d/35-ebtables
sudo wget -O $EBT https://git.zeyel.net/snippets/30/raw?inline=false
sudo chmod +x $EBT
sudo $EBT save

HTH

ben29 · Dec 18, 2023

Are you sure you don’t have any drops?

GastonJ · Dec 19, 2023

ben29 said:
Are you sure you don’t have any drops?

I know I have drops. I just need to start my Sky boxes and other IoT devices to identify them. It's a busy time of year, so I haven't had time to revisit and reapply. You'll know when it has stopped: the number of drops shown by

ifconfig vmbr0

will stop increasing.
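The same check can be done without reading the ifconfig output by eye, by sampling the counter twice. A minimal sketch, assuming the standard sysfs file /sys/class/net/<iface>/statistics/rx_dropped; the function name `rx_drops_delta` and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# rx_drops_delta IFACE SECONDS [COUNTER_FILE]
# Prints how many inbound packets were dropped on IFACE during SECONDS.
# COUNTER_FILE is a hypothetical override so the function can be tried
# against a sample file instead of the live counter.
rx_drops_delta() {
    iface=$1
    wait_s=$2
    counter_file=${3:-/sys/class/net/$iface/statistics/rx_dropped}
    before=$(cat "$counter_file")
    sleep "$wait_s"
    after=$(cat "$counter_file")
    echo $((after - before))
}

# Example: drops on vmbr0 over one minute
# rx_drops_delta vmbr0 60
```

A result of 0 over a period in which the suspect devices were active suggests the filter rules are catching the offending frames.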
