VM network freeze

mladen popov · Oct 24, 2018

After running kernel 4.15.18-7-pve for a couple of weeks, we HAVEN'T encountered any of the problems mentioned earlier in this thread by other users. The node is running fine and hosts a mix of LXC containers and KVM VMs.

fstrankowski · Oct 25, 2018

We plan to upgrade our test cluster to -7 on Monday and will report our findings regarding the mentioned problems afterwards.

fstrankowski · Oct 26, 2018

We've just tested the latest kernel and see the same problems:

SCP transfers (and other kinds of communication) from one LXC container to another stall after a certain amount of data has been transferred. We strongly suggest looking into this, as it affects all of our current cluster setups and we're forced to stick with an older kernel for now.
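One way to put a number on "stalls after a certain amount of data" is to sample the transfer's cumulative byte count at a fixed interval (for example from the interface counters in /proc/net/dev) and report where throughput first drops below a threshold. This is a hypothetical helper, not something from Proxmox or from this thread; the function name, the sampling interval and the `min_rate` threshold are all assumptions for illustration.

```python
from typing import List, Optional


def find_stall(samples: List[int], interval_s: float, min_rate: float) -> Optional[int]:
    """Return the cumulative byte count at which throughput first falls below min_rate.

    samples:    cumulative bytes transferred, one reading every interval_s seconds
    min_rate:   bytes per second below which the transfer counts as stalled
                (an assumed threshold, tune it to the link)
    Returns None if no stall is found in the samples.
    """
    for prev, cur in zip(samples, samples[1:]):
        rate = (cur - prev) / interval_s
        if rate < min_rate:
            return prev  # bytes already transferred when the stall began
    return None
```

For instance, `find_stall([0, 50_000_000, 100_000_000, 100_100_000], 1.0, 1_000_000)` points at the 100 MB mark, which would make it easier to check whether the stall always happens after roughly the same amount of data.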

Appreciated.

fstrankowski · Jan 9, 2019

Hello again,

we really need some help now. We're paying a huge amount of money to Proxmox per year and would like to have this issue solved. We cannot upgrade _any_ machine and are stuck with an old version. Please help us.

Appreciated.

tom · Jan 9, 2019

Do you run latest kernel? If not, please upgrade and test again.

If you have a valid subscription that includes support tickets, please get in touch with our enterprise support team via https://my.proxmox.com

fstrankowski · Jan 9, 2019

The problem is that it only exists in newer kernel versions. We're running version 4.15.18-4-pve #1 SMP PVE 4.15.18-23 (Thu, 30 Aug 2018 13:04:08 +0200) because all kernels released since then are faulty, although we could not find any change in your git repository related to the bug we observed.

We'll try to use the ticket system now. Appreciated.

PotterNick · Jan 14, 2019

Have you tried resetting the 'mate' settings? Or check the logs.

mladen popov · Apr 19, 2019

Quick update and reality check!

I don't know if this is still the same issue, but it definitely looks like it, so I'm going to post it here. After a very smooth run for a while, I updated one of our servers to 5.4, running 4.15.18-12-pve #1 SMP PVE 4.15.18-35, and since then I'm getting very weird behavior from some, if not most, of the VMs running on that node. The VMs lose network connectivity for no apparent reason: you can't ping anything on the network for short periods of time, then it works again until it doesn't. It drops connections completely at random. For example, when I use one of the VMs to connect to a customer's VPN, it starts dropping packets like crazy, or once it connects it keeps reconnecting every few seconds. I haven't made any changes to the setup other than updating the installed packages. I saw a few networking-related topics in the forum that look awfully familiar to what I'm experiencing. Is it just me, or can any of the participants in this thread relate?
Thanks,
Mladen

mac.linux.free · Apr 20, 2019

I had the same symptoms. It only applied to Windows VMs with virtio NICs. I fixed it by editing the VM's NIC settings (in Windows) and disabling all offloading features, or by switching to e1000.
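The post describes doing this through the NIC properties in Windows. For a Linux guest, the rough equivalent is switching offload features off with `ethtool -K`. The sketch below only builds that command line; the interface name and the particular feature list are assumptions, and whether disabling offloads helps in a given setup is not established by this thread.

```python
import shlex
from typing import List, Sequence

# Offload features that `ethtool -K` accepts by short name; this selection is an assumption.
DEFAULT_FEATURES = ("rx", "tx", "sg", "tso", "gso", "gro")


def offload_off_cmd(iface: str, features: Sequence[str] = DEFAULT_FEATURES) -> List[str]:
    """Build an argv list that turns each given offload feature off on iface."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


# Show the command for a hypothetical guest interface "eth0" (run it as root in the guest).
print(shlex.join(offload_off_cmd("eth0")))
```

Settings applied with `ethtool` do not survive a reboot, so a persistent setup would need to reapply them from the guest's network configuration.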

mladen popov · Apr 22, 2019

I wish your solution had worked for me. I have problems with both Windows and Linux VMs, and the interruptions are really weird. From the looks of it, those VMs just lose their knowledge of the network and need some time to rediscover everything, while also dropping connectivity. The old trick of running a ping from within the VM seems to work, but it's not ideal. My Windows VMs are running e1000, but I'll switch them to virtio to test it for a while before I revert to the one kernel I know works just fine. I'll keep you all posted.

UPDATE: After switching to virtio, things got even worse. The VM would lose connectivity every 30 seconds or so, but if I start a ping to anything, it runs uninterrupted and the VM works just fine. I'm reverting to e1000, and maybe to an older kernel version later today.
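The "keep a ping running" workaround can be scripted as a small keepalive loop that pings more often than the roughly 30-second drop interval reported here. This is a hedged sketch: the function names, the interval and the Linux iputils `ping -c 1 -W 1` flags are assumptions, and the pinger and sleep functions are injectable so the loop can be tried without a network.

```python
import subprocess
import time
from typing import Callable, List


def ping_cmd(host: str) -> List[str]:
    # Linux iputils ping: send one echo request, wait at most 1 second for the reply.
    return ["ping", "-c", "1", "-W", "1", host]


def run_ping(host: str) -> bool:
    """Return True if a single ping to host got a reply (exit code 0)."""
    result = subprocess.run(ping_cmd(host),
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def keepalive(host: str, interval_s: float, rounds: int,
              pinger: Callable[[str], bool] = run_ping,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Ping host `rounds` times, interval_s seconds apart; return how many succeeded."""
    ok = 0
    for i in range(rounds):
        if pinger(host):
            ok += 1
        if i < rounds - 1:  # no need to wait after the last ping
            sleep(interval_s)
    return ok
```

Something like `keepalive("192.0.2.1", interval_s=10, rounds=360)` (a placeholder gateway address) would keep traffic flowing for about an hour. Like the manual ping, it only masks the symptom.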

UPDATE2: Reverted to kernel 4.15.18-9-pve and it's all back to normal.

Thanks,
Mladen

mladen popov · May 28, 2019

Quick update: I've been running the 4.15.18-14-pve #1 SMP PVE 4.15.18-39 kernel on all 3 of our nodes for the past 10 days and have not noticed any of the problems mentioned before.
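Several posts here come down to pinning a specific 4.15.18-N-pve kernel. Sorting those version strings as plain text gets the order wrong ("-14" sorts before "-9"), so a tiny parser helps when picking the newest kernel you consider good. The good/bad labels below are only mladen popov's reports from this thread for his own nodes, not an official compatibility list, and the helper names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Matches kernel release strings like "4.15.18-14-pve".
_PVE_KERNEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-pve$")


def kernel_key(version: str) -> Tuple[int, int, int, int]:
    """Turn '4.15.18-14-pve' into (4, 15, 18, 14) so versions sort numerically."""
    m = _PVE_KERNEL.match(version)
    if m is None:
        raise ValueError(f"unexpected kernel version: {version!r}")
    major, minor, patch, abi = (int(part) for part in m.groups())
    return (major, minor, patch, abi)


def newest_good(reports: Dict[str, bool]) -> Optional[str]:
    """Return the highest kernel version marked good (True), or None if there is none."""
    good = [version for version, ok in reports.items() if ok]
    return max(good, key=kernel_key) if good else None


# Reports from mladen popov's posts in this thread (one user's nodes only).
MLADEN_REPORTS = {
    "4.15.18-9-pve": True,
    "4.15.18-12-pve": False,
    "4.15.18-14-pve": True,
}
```

With these reports, `newest_good(MLADEN_REPORTS)` picks 4.15.18-14-pve, while a plain `max()` over the strings would have picked 4.15.18-9-pve.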

hvisage · Oct 6, 2019

I've been experiencing a similar issue, but *always* with pfSense and OPNsense VMs. The only cause I could think of was "high" bandwidth, but I have Debian-based VMs doing much, much more network I/O (though those aren't internet-facing like the OPNsense/pfSense VMs)... and there's another one (okay, not "google-able", though internet-facing) that never had the same problems.

I initially thought it was the cluster setup between the two OVH DCs over the vRack interfaces, but the OPNsense VM isn't clustered, yet it is now the one with the most "hangs". Rebooting the VM (the quick fix) typically resolves things, but I'll need to take another look at the OPNsense VM and decide whether to replace it with FortiGate-VM, or whether that would have the same troubles.

root@proxsb01:~# pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-8
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-54
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-6
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
pve-zsync: 1.7-4
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

djzort · Oct 7, 2019

Are you using virtio or Intel (e1000) network cards?
Did you disable all offloading in pfSense/OPNsense?

See also https://pfstore.com.au/blogs/guides/run-opnsense-in-proxmox

hvisage · Oct 9, 2019

Hmmm... now that you mention it, the ones that give "problems" are e1000-based, while the cluster without trouble is virtio... let me change those and see...

djzort · Oct 9, 2019

I can't think of any reason not to use virtio, but be sure to disable all the different offloading options in the firewall.

hvisage · Oct 10, 2019

I believe that in the beginning the VirtIO net driver had some issues with pfSense (or wasn't supported at all), but that was maybe 3-4 years ago... There actually was a pfSense fork (with a nice interface) that stripped out the hardware-related stuff and built a FreeBSD kernel etc. with virtio support... I can't find it now, but in those days e1000 was "needed" on pfSense, so yes, having started on those older pfSense installs and upgraded them to today's version, we just "stuck" with it.

rakaris · Nov 7, 2019

Hello!
I have the same problem. When the network adapter is under high load (we are using a 1 Gbit network), after some time the server stops responding to network requests. We noticed it on two Proxmox hosts (v5.x and 6.x) with both Windows and Linux guests. When I disable and re-enable the NIC from the guest console, the server is back on the network. As a temporary workaround, I made a simple script, run by a scheduler, that disables and re-enables the NIC if it can't ping the network router's IP. It works, but it can't be considered a serious solution.
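The scheduled workaround described above (bounce the NIC when the router stops answering) can be sketched as a single watchdog check meant to be run from a scheduler such as cron. Everything here is an assumption for illustration: the gateway address, the interface name, requiring several consecutive failed pings before acting, and using iproute2's `ip link set dev <iface> down/up` inside a Linux guest (a Windows guest would need a different mechanism). The ping and command-runner functions are injectable so the decision logic can be exercised without touching a real interface.

```python
import subprocess
from typing import Callable, List


def link_cmds(iface: str) -> List[List[str]]:
    """Commands that take iface down and bring it back up (Linux iproute2, needs root)."""
    return [["ip", "link", "set", "dev", iface, "down"],
            ["ip", "link", "set", "dev", iface, "up"]]


def ping_ok(host: str) -> bool:
    """Send one ping with a 1-second timeout (Linux iputils); True on a reply."""
    return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def run_cmd(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def watchdog_check(gateway: str, iface: str, attempts: int = 3,
                   pinger: Callable[[str], bool] = ping_ok,
                   runner: Callable[[List[str]], None] = run_cmd) -> bool:
    """Bounce iface only if all `attempts` pings to gateway fail.

    Returns True if the interface was bounced, False if the gateway answered.
    """
    for _ in range(attempts):
        if pinger(gateway):
            return False  # gateway reachable, nothing to do
    for cmd in link_cmds(iface):
        runner(cmd)
    return True
```

A script calling `watchdog_check("192.0.2.1", "eth0")` (placeholder address and interface) once a minute would mirror the scheduled approach. As the post says, this hides the symptom rather than fixing the cause.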

hvisage · Nov 7, 2019

VirtIO or E1000 adapters for the guests??

rakaris · Nov 7, 2019

Proxmox 5: Linux guest, virtio
Proxmox 6: Windows guest, E1000

rakaris · Nov 21, 2019

Any ideas?
