[SOLVED] System loses networking

paulmorabi

Member
Mar 30, 2019
Hi,

I'm using the latest Proxmox on a Ryzen 2700. I've been running two Windows VMs very stably for a long while now. I suddenly noticed I couldn't connect remotely and thought the system had crashed, so I rebooted. After checking the disk via smartctl, fsck, etc., I discovered that it's the networking that is dropping out.

If I start one Windows VM, it boots and I can log in, but there is no network access in the VM even though Windows reports it is fine. Likewise, the Proxmox host itself seems to disconnect from the network and stops responding to pings. I've tried tailing the syslog, but nothing useful seems to be appearing.

Even without starting a VM, just doing an "apt update" will cause the network to drop out. I lose external access (ping, SSH, and the web UI).

Any ideas what could be causing this?

Code:
root@pve:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.4: 6.4-4
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.11.22-1-pve: 5.11.22-2
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.8-1
proxmox-backup-file-restore: 2.0.8-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
After spending hours googling and looking at logs, I found a similar issue reported in the Ubuntu forums. After rebooting my router, everything magically started working again. It must have been a weird network/routing issue.
 
I'm having the same issue. After 8 minutes the network goes offline; the only solution is a reboot.
No idea what is happening. I disabled the firewall, but the issue remains.
 
It's 2023 now and I'm having the same problem. I have two VMs and one LXC container running on Proxmox, and every 4 to 5 hours my whole Proxmox instance loses network, which means all my VMs and the LXC lose network too. The only way to fix it is to reboot the whole system, and then after 4 to 5 hours it happens again. For the first four months of using Proxmox it was all fine, but now this problem keeps happening and I have no idea how to fix it. Anyone with the same problem who could help?
 
I also now lose the connection over Ethernet NICs after some hours of operation, which is a problem for iSCSI storage. I can see the storage going offline in the syslog, but I cannot find the root cause. As a workaround, I created a second LUN on a separate VLAN, which seems to be more stable than the first LUN on my normal network. But when I try to move hard disks over it, the transfer fails after 60% and vmbr0 no longer reaches my Synology storage. A local ping from the PVE host can reach the IP; pings from other VMs fail. This worked reliably for several months. Any ideas?

Proxmox VE 7.3-6 x86_64, ProLiant DL360 Gen9, Kernel 5.15.102-1-pve
 
For me it was another server on the same switch. After a while I found out that my old Mac mini server had very long ping times to the switch. A reboot fixed it, and since then the iSCSI connection has been stable again. So it was not related to the Proxmox NICs or the Linux bridge.
 
Hi all,

I also encountered this issue and stumbled upon this forum post. I ended up fixing the issue - for me I had not added a Linux VLAN to the network configuration for the node, despite the node actually being on that VLAN (tagged, configured by port on my router). My fix specifically was:

  1. Go to the PVE node's web page
  2. Select the node -> 'Network'
  3. "Create" -> Linux VLAN
  4. Change the name to vmbr0.xx (where xx is your VLAN tag)
  5. Configure the rest of the fields according to your network
If you royally screwed up your network somehow and can't even access your node's web page, you can try manually editing the interfaces file, although it is not recommended.

nano /etc/network/interfaces
It should look something like this:


Code:
auto lo
iface lo inet loopback

iface <controller id> inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports <controller id>
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr0.xx
iface vmbr0.xx inet static
    address <ip address>
    gateway <your vlan gateway>
#Name of this VLAN

source /etc/network/interfaces.d/*
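If you go the manual route, the change can usually be applied live, without a reboot, assuming ifupdown2 (the default on recent Proxmox VE releases); a minimal sketch:

Code:
# validate and reload all interfaces defined in /etc/network/interfaces
ifreload -a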
 
Hello, I am having the same problem.
Here's the error in the log:
Apr 14 18:41:15 vrProxmox kernel: e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
  TDH                  <91>
  TDT                  <1e>
  next_to_use          <1e>
  next_to_clean        <90>
buffer_info[next_to_clean]:
  time_stamp           <1027432a8>
  next_to_watch        <91>
  jiffies              <102743701>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>

My modem/router connects on VLAN 40.
I did as you said and added vmbr0.40 in the node's network configuration.
The interfaces file now looks like this:

auto lo
iface lo inet loopback

iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.4/24
    gateway 192.168.1.1
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0

iface wlo1 inet manual

auto vmbr0.40
iface vmbr0.40 inet static
    address 192.168.1.4/24
    mtu 1492
#VLAN 40

source /etc/network/interfaces.d/*



I'm probably doing something wrong, because I still have the error.
Is there something else to change? I attached a picture of the node's network configuration.
Do I have to do something to link the VM to that? It currently uses the "vmbr0" network.
 

Attachment: Screenshot 2025-04-14 071706.png
OK, I have it fixed. It was this solution: https://first2host.co.uk/blog/how-to-fix-proxmox-detected-hardware-unit-hang/
and my interfaces file looks like this in the end:

auto lo
iface lo inet loopback

iface eno2 inet manual
    post-up /sbin/ethtool -K eno2 gso off tso off rxvlan off txvlan off

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.4/24
    gateway 192.168.1.1
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0

iface wlo1 inet manual

source /etc/network/interfaces.d/*
 
I observed this problem when I upgraded from Linux 6.8.12-7-pve on x86_64; every later kernel showed it, so my solution, after testing each new kernel, was to go back to Linux 6.8.12-7-pve on x86_64.
I will try the solution with the added line

post-up /sbin/ethtool -K eno2 gso off tso off rxvlan off txvlan off

on my interface; I have a Lenovo Tiny 13900 with an Intel I219-LM NIC.

My other option would be to change the vmbr to use a USB 2.5 GbE adapter once I have the switch for this higher-speed network backbone. Maybe that is a more viable solution?
 
Hi,

I'm currently having the same issue. It started recently. I believed it was a network infrastructure failure (the switch), but after swapping the switch out, the issue came back.

I could be moving big files (3-5 GB) between VMs or connecting via VPN to a WireGuard LXC, and the network would randomly drop out.

I have to force-reset the server to get it to come back online. Is there any resolution to this? I'm now looking at moving back to Windows Server with Hyper-V or some alternative to Proxmox.

I would be grateful for any help at all.

Thank you.
 
Same issue on a Lenovo Tiny P3. Networking drops out at least every 48 hours. Unplugging the network cable from the NIC and plugging it back in effectively resets networking and brings it back. If the post-up fix doesn't work, I will investigate a cron job to test and restart networking as a workaround (a sketch below), but I'm following this thread for a real fix.
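For anyone wanting to try that route, here is a minimal sketch of such a watchdog. The gateway address is a placeholder for your own, and it assumes ifupdown2's ifreload is available (the default on recent Proxmox VE):

Code:
#!/bin/sh
# /usr/local/bin/net-watchdog.sh - reload networking when the gateway stops answering
GATEWAY=192.168.1.1   # placeholder: set this to your gateway

if ! ping -c 3 -W 2 "$GATEWAY" > /dev/null 2>&1; then
    logger -t net-watchdog "gateway unreachable, reloading network configuration"
    ifreload -a || systemctl restart networking
fi

It could then run every few minutes from cron, e.g. via a line in /etc/crontab:

Code:
*/5 * * * * root /usr/local/bin/net-watchdog.sh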
 
Same issue on an Intel NUC. It started about 1-2 months ago; sometimes it works well, sometimes the network is down 5 minutes after a restart, and sometimes it works well all day.
 
Same issue observed. Pulling the network cable resets the adapter and networking returns.

Proxmox running on the following:

root@host:/boot# ls
config-6.8.12-9-pve memtest86+ia32.bin pve
efi memtest86+ia32.efi System.map-6.8.12-9-pve
grub memtest86+x64.bin vmlinuz-6.8.12-9-pve
initrd.img-6.8.12-9-pve memtest86+x64.efi

root@host:~# lspci -v | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 05)
Subsystem: Dell Ethernet Connection I217-LM

root@host:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp0s25 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.169/24
    gateway 192.168.1.1
    bridge-ports enp0s25
    bridge-stp off
    bridge-fd 0
 
Same issue here:
------------------------
root@pve:~# lspci -v | grep Ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
Subsystem: Intel Corporation Ethernet Connection (6) I219-V

root@pve:~# ls /boot
config-6.8.12-10-pve config-6.8.12-9-pve initrd.img-6.8.12-11-pve memtest86+ia32.bin pve System.map-6.8.12-8-pve vmlinuz-6.8.12-4-pve
config-6.8.12-11-pve efi initrd.img-6.8.12-4-pve memtest86+ia32.efi System.map-6.8.12-10-pve System.map-6.8.12-9-pve vmlinuz-6.8.12-8-pve
config-6.8.12-4-pve grub initrd.img-6.8.12-8-pve memtest86+x64.bin System.map-6.8.12-11-pve vmlinuz-6.8.12-10-pve vmlinuz-6.8.12-9-pve
config-6.8.12-8-pve initrd.img-6.8.12-10-pve initrd.img-6.8.12-9-pve memtest86+x64.efi System.map-6.8.12-4-pve vmlinuz-6.8.12-11-pve

root@pve:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.178.91/24
    gateway 192.168.178.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

iface wlp0s20f3 inet manual
 
Same as well, although this isn't the first system we've had this issue on. On the other two Dell systems that had this issue, we ended up just using a USB-to-Ethernet adapter, which resolved it. We believe this is a driver issue, particularly since it occurs on multiple systems, and so far the only long-term fix for us has been a USB dongle.

This is a new system that I installed just 3 hours ago on a brand-new Dell Pro Max Tower T2, and it just happened to me when I set up a new Debian VM with mostly default settings.

I also tried using GPT, which helped me make adjustments on a previous system with this exact issue. We disabled hardware offloading and enabled additional logging, but no tweaks that I'm smart enough to make have resolved this yet, which again leads me to the above: I think this is a driver issue.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
root@pve-clientname-onsite:~# lspci -v | grep Ethernet
80:1f.6 Ethernet controller: Intel Corporation Device 550c (rev 10)
root@pve-clientname-onsite:~# ls /boot
config-6.8.12-13-pve efi initrd.img-6.8.12-13-pve memtest86+ia32.bin memtest86+x64.bin pve System.map-6.8.12-9-pve vmlinuz-6.8.12-9-pve
config-6.8.12-9-pve grub initrd.img-6.8.12-9-pve memtest86+ia32.efi memtest86+x64.efi System.map-6.8.12-13-pve vmlinuz-6.8.12-13-pve
root@pve-clientname-onsite:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp128s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.0.0.167/24
    gateway 10.0.0.254
    bridge-ports enp128s31f6
    bridge-stp off
    bridge-fd 0

source /etc/network/interfaces.d/*
root@pve-clientname-onsite:~#
 
Since this thread comes up in some common Proxmox networking searches, and most of us reporting problems are running Intel NICs, I'll cross-link to this thread:

https://forum.proxmox.com/threads/e1000e-eno1-detected-hardware-unit-hang.59928/page-3

Check the System Log for your node. You may find the following message coinciding with the time your node goes offline:

Jul 29 20:54:39 hostname kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
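A quick way to check from a shell (a sketch; journalctl -k restricts output to kernel messages):

Code:
journalctl -k | grep -i "Detected Hardware Unit Hang"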

If that's the case, you're probably running into the issue discussed in the above thread. The short answer seems to be that there's an issue with the offloading features of Intel NICs, or potentially a driver issue; either way, someone has written a Proxmox helper script to disable the offloading features:

https://community-scripts.github.io/ProxmoxVE/scripts?id=nic-offloading-fix

I'd suggest reviewing the script thoroughly before running it. Or not; YOLO'ing random scripts from the internet is a great way to learn exciting new things in the realm of system administration :D
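For reference, judging by the earlier posts in this thread, the script's core fix amounts to disabling NIC offloads with ethtool; a minimal sketch (eno1 is an example interface name, substitute your own):

Code:
# turn the offloading features off immediately (lost again on reboot)
ethtool -K eno1 tso off gso off gro off rxvlan off txvlan off

To make it persistent, add a matching post-up line to the NIC stanza in /etc/network/interfaces, as shown in earlier posts.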

(I am not a Lawyer. For topical use only. Ask your doctor if Proxmox is right for you. Discontinue use if virtualization lasts longer than 4 hours.)
 