VM network interruptions (TX dropped on tap interface)

Sascha72036

Hello,

I am observing recurring network interruptions on one of my VMs. Monitoring shows packet loss spikes up to 70–80% as well as high round-trip time outliers. An interesting observation is that the problem only appears after 3–4 days of VM uptime. Other VMs on the same host are not affected. The issue disappears for a few days after a full power cycle (shutdown + start) of the VM, but not after a simple reboot inside the guest.

Bildschirmfoto 2025-09-28 um 12.30.39.png

On the host, the following is noticeable:

root@ryzen05:~# ip -s link show tap786i0
25: tap786i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq master fwbr786i0 state UNKNOWN mode DEFAULT group default qlen 10000
    link/ether 4a:26:e2:70:80:4f brd ff:ff:ff:ff:ff:ff
    RX:  bytes          packets     errors  dropped  missed   mcast
         1882201209168  2021057229  0       0        0        0
    TX:  bytes          packets     errors  dropped  carrier  collsns
         257321724233   2195148008  0       133164   0        0

The TX dropped counter increases on the tap interface. On vmbr0 there are no drops:

root@ryzen05:~# ip -s link show vmbr0
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 9c:6b:00:5a:9e:d9 brd ff:ff:ff:ff:ff:ff
    RX:  bytes         packets     errors  dropped  missed   mcast
         174694370682  3782608825  0       0        0        10135359
    TX:  bytes         packets     errors  dropped  carrier  collsns
         93766         1337        0       0        0        0
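
For anyone who wants to check whether the drops line up with the loss spikes, here is a minimal sketch that samples the tap interface's TX dropped counter and prints the increase per interval. It assumes the standard sysfs statistics path; the function names and the SYSFS_NET override variable are hypothetical and only exist to make the snippet easy to test.

```shell
#!/bin/sh
# Sketch: log how much tx_dropped grows per interval for one interface.
# Assumption: counters are exposed under /sys/class/net/<if>/statistics/.
# SYSFS_NET is a hypothetical override so the path can be pointed elsewhere.

read_tx_dropped() {
    # Print the current tx_dropped value of interface $1.
    cat "${SYSFS_NET:-/sys/class/net}/$1/statistics/tx_dropped"
}

watch_tx_dropped() {
    # $1 = interface name, $2 = sampling interval in seconds (default 1).
    iface=$1
    interval=${2:-1}
    prev=$(read_tx_dropped "$iface") || return 1
    while sleep "$interval"; do
        cur=$(read_tx_dropped "$iface") || return 1
        echo "$(date '+%F %T') $iface tx_dropped=$cur delta=$((cur - prev))"
        prev=$cur
    done
}

# Usage (runs until interrupted):
#   watch_tx_dropped tap786i0 1
```

Redirecting the output to a file makes it possible to compare the timestamps with the loss spikes in the monitoring afterwards.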

Config of the VM:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 32
cpu: host
hotplug: disk,network
memory: 245760
name: neu.mc07.mc-host24.de
net0: virtio=BC:24:11:63:CD:C7,bridge=vmbr0,firewall=1,queues=32,tag=203
numa: 0
ostype: l26
protection: 1
scsi0: replica2:vm-786-disk-0,cache=unsafe,discard=on,iops_rd=4000,iops_wr=4000,mbps_rd=800,mbps_wr=800,size=500G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=8778c5de-6b7f-4a1f-995d-5a54bae5af0a
sockets: 1
vmgenid: f9ab0955-7478-48d7-afb9-531310cee873

Update: the packet loss does not seem to be related to the TX dropped counter after all, but to something else. The counter stayed constant before and during the timeouts. Does anyone have an idea what could cause these TX drops on the tap interface, and what else could lead to such packet loss?
 
Both on the host and inside the VM, the logs (journalctl, dmesg -T) look normal. What stands out are very high context switch rates (around 300k+ per second). The VM is running about 100 Docker containers, which could be relevant. Maybe this is part of the problem, but I don’t see an obvious way to address it other than reducing the VM size.

photo_2025-09-28 18.04.00.jpeg
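
The context switch rate can be cross-checked without a monitoring stack. This is a minimal sketch that reads the cumulative "ctxt" counter from /proc/stat twice and prints the average number of switches per second. The function names and the PROC_STAT override variable are hypothetical, and the interval has to be a whole number of seconds because plain shell arithmetic is used.

```shell
#!/bin/sh
# Sketch: average system-wide context switches per second.
# Assumption: /proc/stat contains a line "ctxt <total since boot>".
# PROC_STAT is a hypothetical override for pointing at another file.

read_ctxt() {
    # Print the cumulative context switch count.
    awk '$1 == "ctxt" { print $2 }' "${PROC_STAT:-/proc/stat}"
}

ctxt_rate() {
    # $1 = measurement interval in whole seconds (default 1).
    interval=${1:-1}
    a=$(read_ctxt)
    sleep "$interval"
    b=$(read_ctxt)
    echo $(( (b - a) / interval ))
}

# Usage:
#   ctxt_rate 5    # average over 5 seconds
```

Running it both on the host and inside the guest shows where the context switches are counted.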