VM network interruptions (TX dropped on tap interface)

Sascha72036 · 2025-09-28T13:51:31+0200

Hello,

I am observing recurring network interruptions on one of my VMs. Monitoring shows packet loss spikes up to 70–80% as well as high round-trip time outliers. An interesting observation is that the problem only appears after 3–4 days of VM uptime. Other VMs on the same host are not affected. The issue disappears for a few days after a full power cycle (shutdown + start) of the VM, but not after a simple reboot inside the guest.

[Attached screenshot: Bildschirmfoto 2025-09-28 um 12.30.39.png]

On the host, the following is noticeable:

root@ryzen05:~# ip -s link show tap786i0
25: tap786i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq master fwbr786i0 state UNKNOWN mode DEFAULT group default qlen 10000
    link/ether 4a:26:e2:70:80:4f brd ff:ff:ff:ff:ff:ff
    RX:  bytes         packets    errors dropped  missed   mcast
         1882201209168 2021057229 0      0        0        0
    TX:  bytes         packets    errors dropped  carrier  collsns
         257321724233  2195148008 0      133164   0        0

The TX dropped counter increases on the tap interface. On vmbr0 there are no drops:

root@ryzen05:~# ip -s link show vmbr0
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 9c:6b:00:5a:9e:d9 brd ff:ff:ff:ff:ff:ff
    RX:  bytes         packets    errors dropped  missed   mcast
         174694370682  3782608825 0      0        0        10135359
    TX:  bytes         packets    errors dropped  carrier  collsns
         93766         1337       0      0        0        0
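
To check whether the TX dropped counter actually moves during a loss spike, it can be sampled at a fixed interval and logged with timestamps. The following is only a minimal sketch, assuming the standard Linux sysfs path /sys/class/net/<iface>/statistics/tx_dropped; the function names are hypothetical helpers and not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: log how much the TX dropped counter of an interface grows per interval.
# Assumption: the counter is exposed at
# /sys/class/net/<iface>/statistics/tx_dropped (standard Linux sysfs layout).

# Print the difference between two counter readings (old, new).
counter_delta() {
    echo $(( $2 - $1 ))
}

# Read the current TX dropped value for the given interface.
read_tx_dropped() {
    cat "/sys/class/net/$1/statistics/tx_dropped"
}

# Sample the counter every <interval> seconds and print a timestamped delta.
watch_tx_dropped() {
    iface=$1
    interval=${2:-10}
    prev=$(read_tx_dropped "$iface") || return 1
    while sleep "$interval"; do
        cur=$(read_tx_dropped "$iface") || return 1
        echo "$(date '+%F %T') $iface tx_dropped +$(counter_delta "$prev" "$cur")"
        prev=$cur
    done
}

# Usage (runs until interrupted):
#   watch_tx_dropped tap786i0 10
```

Lines showing +0 during a packet loss spike would suggest that the drops counted on the tap interface are not the cause of that spike.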

Config of the VM:

agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 32
cpu: host
hotplug: disk,network
memory: 245760
name: neu.mc07.mc-host24.de
net0: virtio=BC:24:11:63:CD:C7,bridge=vmbr0,firewall=1,queues=32,tag=203
numa: 0
ostype: l26
protection: 1
scsi0: replica2:vm-786-disk-0,cache=unsafe,discard=on,iops_rd=4000,iops_wr=4000,mbps_rd=800,mbps_wr=800,size=500G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=8778c5de-6b7f-4a1f-995d-5a54bae5af0a
sockets: 1
vmgenid: f9ab0955-7478-48d7-afb9-531310cee873

On closer inspection, the interruptions do not seem to be related to the TX dropped counter after all: its value stayed constant before and during the timeouts. Does anyone have an idea what could cause these TX drops on the tap interface, and what else could lead to such packet loss?

Onslow · 2025-09-28T17:12:33+0200

Is there anything unusual in journalctl or in dmesg -T in the VM or in the host?

Sascha72036 · 2025-09-28T18:09:37+0200

Both on the host and inside the VM, the logs (journalctl, dmesg -T) look normal. What stands out are very high context switch rates (around 300k+ per second). The VM is running about 100 Docker containers, which could be relevant. Maybe this is part of the problem, but I don’t see an obvious way to address it other than reducing the VM size.
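
For reference, a context switch rate like the one above can be measured from the ctxt line in /proc/stat, which holds the total number of context switches since boot. This is a rough sketch with hypothetical helper names; it reports the rate of whichever system it runs on (host or guest).

```shell
#!/bin/sh
# Sketch: estimate context switches per second from /proc/stat.
# /proc/stat contains a line "ctxt <total>" counting switches since boot.

# Print the ctxt total from the given file (default: /proc/stat).
read_ctxt() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Integer rate from two readings (first, second) taken <seconds> apart.
ctxt_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Take two readings <interval> seconds apart and print the rate.
measure_ctxt_rate() {
    interval=${1:-5}
    a=$(read_ctxt)
    sleep "$interval"
    b=$(read_ctxt)
    echo "$(ctxt_rate "$a" "$b" "$interval") context switches/s"
}

# Usage:
#   measure_ctxt_rate 5
```

vmstat 1 reports the same figure in its cs column, if it is installed.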

Onslow · 2025-09-28T19:35:04+0200

Disclaimer: I'm not an expert and I have only done some googling.

Maybe the queue length is involved. But as far as I can see in your initial post, it has already been increased (qlen 10000) from the usual value, which AFAICS is 1000.

Anyway, maybe this link is of some help:

https://docs.redhat.com/en/document..._the_tx_queue_of_the_instance_s_tap_interface
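
The queue length mentioned above can be read back from the ip link output and changed with ip link set ... txqueuelen. The sketch below uses hypothetical helper names. It is an assumption that a change made this way is lost when the tap device is recreated at the next VM start, and whether a different value helps in this case is not established.

```shell
#!/bin/sh
# Sketch: inspect and change the transmit queue length of an interface.

# Extract the qlen value from `ip link show` output read on stdin.
parse_qlen() {
    sed -n 's/.* qlen \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the current queue length of the given interface.
show_qlen() {
    ip link show dev "$1" | parse_qlen
}

# Set a new queue length (needs root).
# Assumption: not persistent across recreation of the tap device.
set_qlen() {
    ip link set dev "$1" txqueuelen "$2"
}

# Usage:
#   show_qlen tap786i0
#   set_qlen tap786i0 10000
```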
