Network randomly broke with no changes

0xhresult

Sep 20, 2022

Hello everyone,

Our VMs randomly lost network connectivity a few days ago, and I have not been able to find a solution. The server had not been touched for several months.


Code:
$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         142.xxx.240.1   0.0.0.0         UG    0      0        0 vmbr0
localnet        0.0.0.0         255.255.255.0   U     0      0        0 vmbr0
142.xxx.240.3   0.0.0.0         255.255.255.255 UH    0      0        0 vmbr0
142.xxx.240.4   0.0.0.0         255.255.255.255 UH    0      0        0 vmbr0
142.xxx.240.5   0.0.0.0         255.255.255.255 UH    0      0        0 vmbr0
142.xxx.240.6   0.0.0.0         255.255.255.255 UH    0      0        0 vmbr0
142.xxx.240.7   0.0.0.0         255.255.255.255 UH    0      0        0 vmbr0
... more

Packets reach the host node but never arrive at the VM:

Code:
➜  ~ ping 142.xxx.240.20
PING 142.xxx.240.20 (142.xxx.240.20): 56 data bytes
Request timeout for icmp_seq 0

Code:
$ tcpdump -envi vmbr0 | grep '142.xxx.240.20'
tcpdump: listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    91.xxx.222.104 > 142.xxx.240.20: ICMP echo request, id 22040, seq 148, length 64
    91.xxx.222.104 > 142.xxx.240.20: ICMP echo request, id 22040, seq 149, length 64
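One way to narrow down where the echo requests are lost is to capture one hop closer to the guest. Below is a minimal, hypothetical set of shell helpers, assuming the affected guest is a VM with ID 100 whose first NIC (net0) is attached to vmbr0; the VM ID, NIC index and function names are placeholders, not Proxmox tools. It relies on Proxmox VE naming the host side of a VM NIC `tap<VMID>i<index>`, and it filters with tcpdump's own `host` expression instead of grep: with `-v`, tcpdump prints the Ethernet header (including the destination MAC) on a separate line, which the grep above drops.

```shell
#!/bin/sh
# Hypothetical debugging helpers (names are illustrative, not Proxmox tools).
# VM ID, NIC index and target IP are placeholders and must be adapted.

# Proxmox VE names the host-side tap device of a VM NIC "tap<VMID>i<index>",
# e.g. net0 of VM 100 -> tap100i0.
tap_name() {
    printf 'tap%si%s\n' "$1" "$2"
}

# Capture all traffic to or from one address on the VM's tap device (needs root).
# Using tcpdump's "host" filter instead of grep keeps the link-level header
# printed by -e (source and destination MAC) in the output.
capture_on_tap() {
    tcpdump -eni "$(tap_name "$1" "$2")" host "$3"
}

# List the ports attached to bridges and the MAC addresses vmbr0 has learned.
show_bridge_state() {
    bridge link show
    bridge fdb show br vmbr0
}
```

For example, run `capture_on_tap 100 0 142.xxx.240.20` as root while pinging from outside. If the requests appear on vmbr0 but not on the tap device, the bridge is not forwarding them, and comparing the destination MAC in the vmbr0 capture with the VM's MAC and with the output of `show_bridge_state` is a reasonable next step. If the Proxmox firewall is enabled on that NIC, the tap device is attached to an intermediate `fwbr<VMID>i<index>` bridge rather than directly to vmbr0.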

I have already tried upgrading to the latest Proxmox VE version and rebooting the node, without success:

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.6-1
proxmox-backup-file-restore: 2.2.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1

The VMs are configured correctly.


Code:
$ cat /etc/network/interfaces
source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eno2
iface eno2 inet static
    dns-nameservers 8.8.8.8 8.8.4.4 1.1.1.1
    dns-search tld

auto vmbr0
iface vmbr0 inet static
    address 142.xxx.240.2
    gateway 142.xxx.240.1
    netmask 255.255.255.0
    bridge_ports eno2
    bridge_stp off
    bridge_fd 0
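For comparison, here is a minimal sketch of the same host in the layout the Proxmox VE admin guide uses for its default bridge configuration, assuming eno2 is the only uplink. It keeps the addresses from the post, declares the bridge port `inet manual` (above, eno2 is declared `inet static` without an address), and uses CIDR notation instead of a separate `netmask` line. It is meant for comparing against the working nodes, not as a confirmed fix for the lost connectivity.

```
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

# Physical uplink: enslaved to the bridge, so it carries no IP address itself.
iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
    address 142.xxx.240.2/24
    gateway 142.xxx.240.1
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
    # dns-* options are only applied if the resolvconf package is installed
    dns-nameservers 8.8.8.8 8.8.4.4 1.1.1.1
    dns-search tld
```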

Does anyone have an idea how to debug this further?
 
What is also weird:

Code:
$ ethtool vmbr0

Settings for vmbr0:
   Supported ports: [  ]
   Supported link modes:   Not reported
   Supported pause frame use: No
   Supports auto-negotiation: No
   Supported FEC modes: Not reported
   Advertised link modes:  Not reported
   Advertised pause frame use: No
   Advertised auto-negotiation: No
   Advertised FEC modes: Not reported
   Speed: 10000Mb/s
   Duplex: Unknown! (255) 
   Auto-negotiation: off
   Port: Other
   PHYAD: 0
   Transceiver: internal
   Link detected: yes

On another node (one that works), only this is displayed:

Code:
Settings for vmbr0:
   Link detected: yes

Is this related?
 
Hello,

Unfortunately, I still have not found a solution for this.

Does anyone have an idea how to debug or fix it?

I have tried everything and am out of ideas.


Thanks!
 
Hello everyone,

We are still seeing the issue at random on our nodes.

Does anyone have an idea how to fix it, or any commands to debug it?

Thanks in advance!
 
Can you tell us more about the network infrastructure, please?
I'm no expert, but I think I see public IP addresses in the screenshots.
 
This is the /etc/network/interfaces file of the server:

Code:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eth0
iface eth0 inet manual
 # dns-* options are implemented by the resolvconf package, if installed
 dns-nameservers 8.8.8.8 8.8.4.4 1.1.1.1
 dns-search tld
 hwaddress ether xx:xx:xx:xx:xx:xx

auto vmbr0
iface vmbr0 inet static
    address 45.80.xxx.x
    gateway 45.80.xxx.1
    netmask 255.255.255.0
    bridge_ports eth0
    bridge_stp off
    bridge_fd 5
    hwaddress xx:xx:xx:xx:xx:xx


So it's a normal bridged network setup.
 