Setup:
- PVE Cluster 8.1.0 / Linux 6.5.13-3-pve
- Typical Proxmox installation with bridged WAN (Linux Bridge)
Test VMs:
#1: Debian 12
- Kernel: 6.1.0-18-amd64
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO
#2: Debian 11
- Kernel: 5.10.0-21-amd64
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO
#3: Ubuntu 23.10
- Kernel: 6.5.0-26-generic
- IP & Mac Filter enabled (NDP enabled)
- Firewall: Block all Incoming (except ICMP Test server ip) / Allow all outgoing
- Network driver: VirtIO
Code:
# /etc/network/interfaces
auto lo
iface lo inet loopback

auto ens18
iface ens18 inet static
    address <wan-ip>
    netmask 255.255.255.0
    gateway <wan-gateway>
Code:
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether <vm-mac> brd ff:ff:ff:ff:ff:ff
    altname enp0s18
Error description:
Test VM #1 and Test VM #2 sporadically lose their WAN uplink for up to 5 minutes.
While the error occurs, the IP link stays up, the bridge remains in forwarding state, and there is no log entry at all (syslog, dmesg, ip monitor).
It looks like an ARP problem, but tcpdump shows correct who-has requests and is-at replies from the VMs.
Debugging is very difficult because the error only occurs once the Debian instances become "idle". If the VM is pinged continuously, the error does not occur at all.
After anywhere from 30 seconds to 5 minutes, the VM answers pings again.
If the VM sends an outbound ping or traceroute to any destination (via the VNC console) while the error is occurring, the connection recovers instantly.
ping <wan-ip>
PING <wan-ip> (<wan-ip>): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
64 bytes from <wan-ip>: icmp_seq=10 ttl=63 time=17.107 ms
64 bytes from <wan-ip>: icmp_seq=11 ttl=63 time=17.752 ms
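Because outbound traffic from the VM restores connectivity instantly, one thing that may be worth checking during an outage is whether the VM's MAC address is still learned in the forwarding database of the host bridge. The following is only a sketch of where to look, not a diagnosis. The helper name fdb_ports_for_mac is hypothetical, and the bridge name vmbr0 in the usage comment is an assumption (it is the usual Proxmox default, but it is not stated in this post).

```shell
#!/bin/sh
# Hypothetical helper: read `bridge fdb show` output on stdin and print
# every device on which the given MAC address is currently listed.
# Empty output means the bridge has no entry for that MAC.
fdb_ports_for_mac() {
    # Normalize the MAC to lower case so the comparison is case-insensitive.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '
        tolower($1) == mac {
            # The device name follows the "dev" keyword on each entry.
            for (i = 2; i < NF; i++)
                if ($i == "dev") print $(i + 1)
        }'
}

# Assumed usage on the PVE host while the VM is unreachable:
#   bridge fdb show br vmbr0 | fdb_ports_for_mac <vm-mac>
```

Running the same command before and during an outage would show whether the entry for the VM disappears while it is idle.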
Debugging Setup:
Cronjob @ every 30 minutes: ping all 3 instances
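A probe like this can be sketched as a small script that writes one timestamped up/down line per instance, so outages can be lined up against tcpdump captures afterwards. This is a minimal sketch, not the exact cron job used here: the function names probe and probe_all and the PING_CMD variable are hypothetical, and the default ping flags assume Linux iputils (on other systems -W may use different units).

```shell
#!/bin/sh
# PING_CMD can be overridden, e.g. to try the loop without network access.
# Assumption: Linux iputils ping, where -c 1 sends one probe and -W 2
# waits up to 2 seconds for the reply.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Probe a single host once and print "<UTC timestamp> <host> <up|down>".
probe() {
    target="$1"
    if $PING_CMD "$target" >/dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$target" "$state"
}

# Probe every host given on the command line, one line per host.
probe_all() {
    for h in "$@"; do
        probe "$h"
    done
}

probe_all "$@"
```

Called from cron with the three instance addresses as arguments and its output appended to a log file, this records when each instance stopped answering. The interval is a trade-off: since continuous pinging prevents the error, probing too often might hide it.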
The error also occurs with the Intel E1000 and Realtek network drivers.
The MAC and IP addresses are unique, and the router's ARP cache confirms that the MAC/IP pair is not claimed by any other VM.
I also tried upgrading to Debian Sid with the current kernel 6.7.9-amd64, but the error still occurs.
Has anyone else encountered this problem with Debian?