VM network freeze

After running kernel 4.15.18-7-pve for a couple of weeks we haven't encountered any of the problems mentioned before by us or any of the other users. The node is running fine and is housing a mix of LXC containers and KVM guests.
 
We plan to upgrade our test cluster to -7 on Monday. We'll report our findings regarding the mentioned problems afterwards.
 
We've just tested the latest kernel; same problems:

SCP transfers (and other kinds of communication) from one LXC container to another show stalling transfer speeds after X amount of transferred bytes. We'd strongly suggest looking into this, as it affects all of our current cluster setups and forces us to stick with an older kernel for now.

Appreciated.
 
Hello again,

We really need some help now. We're paying Proxmox a huge amount of money per year and would like to have this issue resolved. We cannot upgrade _any_ machine and are stuck on an old version. Please help us.

Appreciated.
 

Do you run the latest kernel? If not, please upgrade and test again.

If you have a valid subscription with ticket support, please get in touch with our enterprise support team via https://my.proxmox.com
 
The problem is that it exists only in newer kernel versions. We're running 4.15.18-4-pve #1 SMP PVE 4.15.18-23 (Thu, 30 Aug 2018 13:04:08 +0200) because all kernels released afterwards are affected, although we could not find any changes in your git repository that would explain the bug we observe.

We'll try to use the ticket system now. Appreciated.
 
Quick update and reality check!

I don't know if this is still the same issue, but it definitely looks like it, so I'm going to post it here. After a very smooth sail for a while, I got one of our servers updated to 5.4, running 4.15.18-12-pve #1 SMP PVE 4.15.18-35, and since then I'm getting very weird behavior from some, if not most, of the VMs running on that node. The VMs lose network connectivity for no apparent reason: you can't ping anything on the network for short periods of time, then it works again until it doesn't. It drops connections totally at random; for example, I would use one of the VMs to connect to a customer's VPN and it would start dropping packets like crazy, or once it connects I'll get a reconnect every few seconds.

I have not made any changes to the setup other than updating the installed packages. I saw a few networking-related topics in the forum that look awfully familiar to what I'm experiencing. I was wondering: is it just me, or can any of the participants in this topic relate?
Thanks,
Mladen
 
I had the same symptoms. It only applies to Windows VMs with virtio. I fixed it by editing the VM NIC (in Windows) and disabling all offloading features, or by changing to e1000.
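For reference, switching the NIC model can also be done from the Proxmox host with `qm set`. A minimal sketch (the VM ID and bridge name are assumptions for illustration; note that redefining net0 replaces the whole entry, so pass the existing MAC explicitly if you need to keep it):

```shell
#!/bin/sh
# Hypothetical helper: switch a VM's first NIC (net0) to the e1000 model.
# VMID and bridge are assumptions -- adjust for your setup.
switch_net0_to_e1000() {
    vmid=$1
    bridge=${2:-vmbr0}
    # "qm set ... -net0" replaces the whole net0 definition; the MAC
    # address will change unless you specify it in the value.
    qm set "$vmid" -net0 "e1000,bridge=$bridge"
}

# Example call (hypothetical VM ID):
# switch_net0_to_e1000 100
```

The guest needs a power cycle (not just a reboot from inside the VM) for the new NIC model to take effect.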
 
I wish your solution had worked for me. I have problems with both Windows and Linux VMs, and the interruptions are totally weird. From the looks of it, those VMs just lose their knowledge of the network and need some time to rediscover everything, but they also drop connectivity. The old trick of running a ping from within the VM seems to work, but it's not ideal. My Windows VMs are running e1000, but I'll try switching to virtio to test it for a little while before I revert to the one kernel I know works just fine. I'll keep you all posted.

UPDATE: After switching to virtio things got even worse. The VM would lose connectivity every 30 or so seconds, but if I start a ping to anything it runs uninterrupted and the VM works just fine. I'm reverting to e1000, and maybe to an older kernel version later today.

UPDATE 2: Reverted to kernel 4.15.18-9-pve and it's all back to normal.

Thanks,
Mladen
 
I've been experiencing a similar issue, but *always* with pfSense and OPNsense VMs. The only reason I could see might have been "high" bandwidth, but then I have Debian-based VMs doing much, much more network I/O (though those aren't internet-facing like the OPNsense/pfSense ones)... and another internet-facing one (okay, not "google-able") that never gave the same problems.

I initially thought it was the cluster setup between the two OVH DCs over the vRack interfaces, but the OPNsense VM isn't clustered, and it is now the one that gives the most "hangs". Typically rebooting the VM (quick solution) fixes things, but I'll need to look again at whether I should replace the OPNsense with a FortiGate-VM, or whether that would have the same troubles.

root@proxsb01:~# pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-8
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-54
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-6
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
pve-zsync: 1.7-4
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
 
Are you using virtio network cards or Intel (e1000)?
Did you disable all the offloading in pfSense/OPNsense?

See also https://pfstore.com.au/blogs/guides/run-opnsense-in-proxmox
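On a pfSense/OPNsense guest the offloads can also be cleared ad hoc from the shell with FreeBSD's ifconfig; a minimal sketch (the vtnet0 interface name is an assumption for a virtio NIC; use the equivalent GUI checkboxes to make the change persistent across reboots):

```shell
#!/bin/sh
# Hypothetical sketch for a FreeBSD-based firewall guest: clear the
# common hardware offloads on one interface. vtnet0 is an assumption.
disable_offloads() {
    iface=${1:-vtnet0}
    # -rxcsum/-txcsum: checksum offload, -tso: TCP segmentation
    # offload, -lro: large receive offload
    ifconfig "$iface" -rxcsum -txcsum -tso -lro
}
```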

 
I can't think of any reason not to use virtio, but be sure to disable all the different offloading options in the firewall.
 
I believe in the beginning there were some issues with the virtio net driver (or it wasn't supported) in pfSense, but that was maybe 3-4 years ago... There actually was a pfSense fork (nice interface) that took out the hardware-related stuff and built a FreeBSD kernel etc. with the virtio drivers included; I can't find it now. In those days e1000 was "needed" on pfSense, so having come from and upgraded those older pfSense installs to today, we just stuck with it.
 
Hello!
I have the same problem. When the network adapter is under high load (we are using a 1 Gbit network), after some time the server stops responding to network requests. I noticed it on two Proxmox hosts (v5.x and v6.x) with Windows and Linux guest systems. When I disable/enable the NIC from the guest console, the server is back on the network again. As a temporary workaround I made a simple script, run from the scheduler, which disables/enables the NIC if it can't ping the network router's IP. My solution works, but it is not a serious solution.
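The workaround described above might look roughly like this as a cron-driven sketch for a Linux guest (the gateway IP and interface name are assumptions; a Windows guest would do the equivalent with netsh or PowerShell):

```shell
#!/bin/sh
# Hypothetical NIC watchdog, run e.g. every minute from cron:
# if the router stops answering pings, bounce the interface.
# GATEWAY and IFACE are assumptions -- adjust for your network.
GATEWAY=192.168.1.1
IFACE=eth0

bounce_if_unreachable() {
    if ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1; then
        echo "link ok"
    else
        echo "no reply from $GATEWAY, bouncing $IFACE"
        ip link set "$IFACE" down
        sleep 2
        ip link set "$IFACE" up
    fi
}

# Invoked from cron, e.g.:
# * * * * * /usr/local/bin/nic-watchdog.sh
# bounce_if_unreachable
```

As the poster says, this only papers over the symptom; the underlying kernel/driver issue still needs a real fix.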
 