RouterOS VM drops packets when throughput reaches ~400–500 Mbps (only between VMs on same bridge)

Malakas

New Member
Jul 12, 2025
Hi,

I have a packet drop issue when running RouterOS (CHR) as a VM on Proxmox VE.

Problem:
- When any CHR VM interface reaches around 400–500 Mbps, it starts dropping packets.
- The higher the bandwidth, the greater the packet loss. A bandwidth test can still reach over 2 Gbps (I have not tested beyond that).
- Ping to Proxmox host VLAN IP or physical switch IP is fine (no loss).
- Only pings between RouterOS VMs on the same Linux bridge (xvmbr2) drop packets.
- Other VMs on the same bridge (e.g. CentOS, iKuai) do NOT drop packets — only RouterOS.

Environment:
- Server: Dell PowerEdge R640
- Proxmox VE: 8.3 (kernel 6.8.12-4-pve)
- VM: RouterOS CHR 6.49.18 (also tested 6.45.9, same result)
- License: P10
- VM config:
  - 16 vCPU, 8 GB RAM (balloon=0)
  - Machine type: q35 (also tried i440fx, no change)
  - NICs: virtio (also tried e1000e/vmxnet3, both worse)
- Host NICs: Intel 82599ES 10G + Broadcom BCM57800

What I see:
- On host: `rx_drops` increase when traffic rises
- On RouterOS VM: `tx_drops` on virtio interface
- Only affects RouterOS VMs on the same bridge
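To put numbers on the drop counters during a test run, something like the following could be run on the Proxmox host. It is only a sketch: it assumes the standard Linux `/proc/net/dev` column layout, and the helper names `parse_net_dev` and `drop_delta` are made up for illustration. It also assumes the VM's host-side interface is a tap device (Proxmox usually names these like `tap100i0`, where `100` is a placeholder VM ID).

```python
def parse_net_dev(text):
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev-style text."""
    drops = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue
        # Layout: 8 receive columns followed by 8 transmit columns.
        # "drop" is the 4th column of each group (indexes 3 and 11).
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


def drop_delta(before, after):
    """Per-interface increase in (rx_drop, tx_drop) between two snapshots."""
    return {
        iface: (after[iface][0] - rx, after[iface][1] - tx)
        for iface, (rx, tx) in before.items()
        if iface in after
    }
```

Usage idea: read `/proc/net/dev` once before the bandwidth test and once after, pass both texts through `parse_net_dev`, and compare with `drop_delta`. If the increase shows up only on the tap interfaces belonging to the CHR VMs, that would at least narrow down where the packets are being lost.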

Question:
- Is this likely a Proxmox/virtio/bridge issue, or something specific to RouterOS CHR?
- Any tuning recommendations (vhost, offloading, IRQ affinity, bridge settings, etc.)?

Thanks!
 
You could enable multiqueue on the virtual NIC(s), but don't set it higher than the number of vCPUs the VM has.
 
Thanks for the suggestion!
I already tried enabling multiqueue on the virtio NICs (tested with values up to the vCPU count), but unfortunately the packet loss behavior is the same.

It seems that once the CHR VM’s interface hits ~400–500 Mbps, drops still occur even with multiqueue enabled.
Do you think this could still be a virtio limitation, or does RouterOS not support this?
 
You could install VyOS and route with that to find out if it’s a RouterOS issue or not.

Are you doing a lot of small packets?
Are you maxing out some resource, CPU, interrupts, etc.?
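The reason packet size matters: the per-packet work is what usually hurts a software router, and the same bit rate means very different packet rates depending on frame size. A rough back-of-the-envelope calculation, ignoring Ethernet preamble and inter-frame gap (a simplifying assumption), with a made-up helper name:

```python
def packets_per_second(bits_per_second, frame_bytes):
    """Approximate packet rate for a given throughput and frame size.

    Ignores preamble and inter-frame gap, so real rates are slightly lower.
    """
    return bits_per_second / (frame_bytes * 8)


# 500 Mbps is the upper end of the range where the drops were reported.
for size in (64, 512, 1500):
    pps = packets_per_second(500e6, size)
    print(f"{size:>5} B frames at 500 Mbps: {pps:,.0f} pps")
```

At 500 Mbps, 64-byte frames work out to roughly twenty times the packet rate of 1500-byte frames, so knowing the traffic mix helps interpret the drop counters.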
 
Thanks for the suggestion!
I know VyOS and it’s definitely a capable system, but at the moment I’m sticking with RouterOS since it’s more convenient for my workflow — VyOS commands tend to be more complex.

In my tests, CPU usage inside the CHR VM stays low (<30%) and host interrupts don’t seem to be maxed out.
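One caveat about average CPU figures: with 16 vCPUs, a single core running at 100% only contributes about 6% to the average, so a low overall number does not rule out one saturated core. Whether that is happening here is unknown. On the Proxmox host (not inside CHR), the distribution of network receive work across CPUs can be checked from `/proc/softirqs`. A minimal sketch, assuming the usual layout of that file and using a hypothetical helper name:

```python
def net_rx_share(text):
    """Return each CPU's fraction of NET_RX softirqs from /proc/softirqs text."""
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() == "NET_RX":
            counts = [int(value) for value in rest.split()]
            total = sum(counts)
            # Guard against an all-zero row to avoid dividing by zero.
            return [count / total if total else 0.0 for count in counts]
    raise ValueError("no NET_RX row found")
```

The counters are cumulative since boot, so for a view of the current test it is better to take two snapshots and compare the differences. If one CPU carries most of the NET_RX work, that would point toward queue or IRQ distribution rather than total CPU capacity.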

I’m really hoping to find a solution for this issue, so any suggestions or ideas would be greatly appreciated!
 
Try RouterOS 7?

Seems like a RouterOS issue, if other hosts on the same bridge do not have this issue.
 
I had the same thought, but I’m a bit concerned that upgrading our current v6 environment to RouterOS 7 might cause some issues.
 
You should spin up new CHR instances anyway to test from a clean slate.

Worry about production environment once you have found the root cause.
 
You might want to start migrating off CHR in any case, especially if you are looking forward to any kind of traffic growth.

CHR isn’t very performant at the best of times: https://blog.kroy.io/2019/08/23/battle-of-the-virtual-routers/#The_Hardware

VyOS looks like a better long-term software router, especially now that it has VPP support. Might as well bite the bullet now.
Thanks for the advice. I agree that testing with a fresh CHR instance makes sense, and I’ll give that a try to see if the issue can be reproduced from a clean slate.

I also understand the point about CHR performance limits and alternatives like VyOS with VPP support. For now though, since our existing setup and workflows are heavily based on RouterOS, I’d really like to figure out why CHR is dropping packets in this scenario.

If I can reproduce the issue on a new instance, that might help confirm whether it’s a configuration/environment issue or an inherent RouterOS limitation.