Sporadic network losses for Debian 11/12 VMs

linushstge

Setup:
- PVE Cluster 8.1.0 / Linux 6.5.13-3-pve
- Typical Proxmox installation with bridged WAN (Linux Bridge)

Test VMs:
#1: Debian 12
- Kernel: 6.1.0-18-amd64
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO

#2: Debian 11
- Kernel: 5.10.0-21-amd64
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO

#3: Ubuntu 23.10
- Kernel: 6.5.0-26-generic
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO

Code:
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto ens18
iface ens18 inet static
    address <wan-ip>
    netmask 255.255.255.0
    gateway <wan-gateway>

Code:
# ip link

ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether <vm-mac> brd ff:ff:ff:ff:ff:ff
    altname enp0s18


Error description:
Test VM #1 and Test VM #2 sporadically lose their WAN uplink for up to 5 minutes.
While the error occurs, the IP link is still up, the bridge stays in forwarding state, and nothing at all is logged (syslog, dmesg, ip monitor).
It looks like an ARP problem, but tcpdump shows correct who-has requests and is-at replies from the VMs.
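
For anyone who wants to reproduce the ARP check, here is a minimal sketch of the kind of capture meant above. The function name capture_arp_icmp is made up for illustration, and ens18 as the default interface is taken from the interfaces file; the exact invocation used in the original test is not shown in this post.

```shell
# Hypothetical helper: watch ARP and ICMP on the guest NIC.
# Needs root. The interface defaults to ens18 (an assumption based on the config above).
capture_arp_icmp() {
    iface="${1:-ens18}"
    # -e prints the link-level header (source/destination MAC) on each line,
    # -n disables name resolution, -i selects the interface.
    tcpdump -e -n -i "$iface" arp or icmp
}
```

Run capture_arp_icmp inside the VM (or pass the tap interface name on the PVE host) while pinging from outside, and compare the MAC addresses in the who-has / is-at lines with <vm-mac>.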

Debugging is very difficult because the error only occurs once the Debian instances become idle. If a VM is pinged continuously, the error does not occur at all.

After anywhere from 30 seconds to 5 minutes, the ping works again.
If the VM sends an outbound ping or traceroute to any destination (via the VNC console) while the error is occurring, the connection also recovers instantly.

ping <wan-ip>
PING <wan-ip> (<wan-ip>): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
64 bytes from <wan-ip>: icmp_seq=10 ttl=63 time=17.107 ms
64 bytes from <wan-ip>: icmp_seq=11 ttl=63 time=17.752 ms

Debugging Setup:
Cron job every 30 minutes: ping all 3 instances

pve-ping-result.jpg
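
The 30-minute probe can be sketched roughly like this. It is a minimal sketch, not the exact script behind the screenshot: probe_hosts and PING_CMD are names chosen for illustration, and the ping options assume Linux iputils (-c 1 sends one echo request, -W 2 waits up to 2 seconds for the reply).

```shell
#!/bin/sh
# Hypothetical probe: one ICMP echo per host, one timestamped line per result.
# PING_CMD can be overridden, e.g. to adjust the timeout or to test the loop offline.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

probe_hosts() {
    for host in "$@"; do
        # $PING_CMD is left unquoted on purpose so its options split into words.
        if $PING_CMD "$host" >/dev/null 2>&1; then
            status=up
        else
            status=down
        fi
        printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$host" "$status"
    done
}
```

To use it, add a final line calling probe_hosts with the three VM addresses, save the script, and schedule it with a crontab entry such as `*/30 * * * * /path/to/script >> /path/to/logfile` (both paths are placeholders). The long interval matters here because the VMs have to go idle between probes for the error to show up.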


The error also occurs with the Intel E1000 and Realtek network drivers.
The MAC and IP addresses are unique, and the router's ARP cache confirms that neither the MAC nor the IP is claimed by any other VM.

I also tried upgrading to Debian Sid with the current kernel 6.7.9-amd64, but the error still occurs.

Has anyone else encountered this problem with Debian?
 