VM network freeze

Sep 26, 2017
The node has a mixed set of VMs: a Windows VM, a bunch of Ubuntus, Red Hats, and CentOS. I noticed that I'm running the 4.15.17-1 kernel on my other node, so I tried downgrading this node to 4.15.17-1 and I'm watching it now; it looks better, still no drop. I'll post an update after some time. Thanks for the suggestion, Markus!


Update:
After reverting to kernel 4.15.17-1, the problem disappeared.
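
For anyone wanting to try the same downgrade, a rough sketch, assuming the older kernel package follows the usual pve-kernel naming and is still available in the repository:

apt-get install pve-kernel-4.15.17-1-pve   # install the older kernel alongside the current one
# reboot and select it from the GRUB menu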
 
Jan 29, 2017
I want to do that, too. How could I remove kernel 4.15.17-3 on a headless server? Edit GRUB? If I run apt-get remove pve-kernel-4.15.17-3-pve, then proxmox-ve, pve-kernel-4.15 and pve-kernel-4.15.17-3-pve would be removed, too.

Are there problems with downgrading on ZFS systems (0.7.9 vs. 0.7.8)?
 

markusd

Active Member
Apr 20, 2015
There is a GRUB parameter to choose the default kernel that should boot. I used this some time ago... hm, I think it was in /etc/default/grub:
GRUB_DEFAULT=
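
For a headless box, a minimal sketch of pinning the older kernel (the menu entry title below is an example; check /boot/grub/grub.cfg for the exact string on your system):

# list the available menu entry titles
grep menuentry /boot/grub/grub.cfg

# then, in /etc/default/grub, point GRUB_DEFAULT at the wanted entry, e.g.:
# GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 4.15.17-1-pve"

update-grub   # regenerate grub.cfg, then reboot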
 
Jan 29, 2017
Just tested. The issue is still there with pve-kernel-4.15.18-2, but it seems to appear only on Windows 10 VMs, not on Windows 2012 R2 or Windows 7. I have also updated all VirtIO drivers to 0.1.160.
 
Jan 29, 2017
Tried pve-kernel-4.15.18-3 from pvetest... it looks better on this kernel, but there is another issue coming up: traffic from the host through a virtual pfSense is not working any more, although it worked with the older kernel.

Perhaps there is a problem with balance-slb on OVS.
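
For context, a typical balance-slb bond on a Proxmox host with Open vSwitch looks roughly like this in /etc/network/interfaces (eno1/eno2 and vmbr0 are example names, not taken from this setup):

allow-vmbr0 bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eno1 eno2
    ovs_options bond_mode=balance-slb

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0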
 

talyan

New Member
Aug 29, 2018
Same problem here after updating Proxmox 4 to 5.1 (now 5.2-6). It only affects the E1000 interface and happens under high load, on CentOS and FreeBSD VMs. Changing the interface to VirtIO where possible helped. On two VMs, however, no interface model (VirtIO, Realtek, VMware) is visible in the guest except the E1000 currently in use. Any ideas?
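
If it helps, switching a NIC to VirtIO can be done from the host CLI; a minimal sketch, assuming VM ID 100, bridge vmbr0 and a placeholder MAC:

# change net0 to the VirtIO model while keeping the existing MAC address
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
# the guest needs VirtIO drivers installed before the new NIC will work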
 

TwiX

Active Member
Feb 3, 2015
Hi,

From the Proxmox documentation:

You also need to set in the VM the number of multi-purpose channels on each VirtIO NIC with the ethtool command:

ethtool -L ens1 combined X

where X is the number of vCPUs of the VM.

Maybe setting up multiqueue is not enough for a Windows guest OS.
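
As a usage sketch inside a Linux guest, assuming an example interface name ens18 and a 4-vCPU VM:

ethtool -l ens18              # show current and maximum channel counts
ethtool -L ens18 combined 4   # set combined channels to match the vCPU count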
 
Sep 26, 2017
One of our problematic servers got moved to new hardware (not related to this problem), so I'm running a fresh Proxmox install with the Linux 4.15.18-7-pve kernel and am testing to see if the problem still persists. I'll update this post in a day or two.
 

norderstedt

Member
Nov 28, 2016
Hello everyone, we're encountering a massive performance issue since we upgraded all our hosts to the latest kernel, 4.15.18-5. The prior kernel (4.15.18-4) was working fine. The problem persists and is reproducible once we boot into the -5 kernel.

Symptoms:

- The VM becomes unresponsive network-wise; all connections stall and data transfer grinds to a halt.
- The network interface stalls once an unspecified amount of inbound data hits the VM's network interface.

Reproduce:

- Create one VM on a host with the -4 kernel
- Create one VM on a host with the -5 kernel
- Transfer data from the "-4-host" VM to the "-5-host" VM (see the sketch after the result below)

Result:

- VM on 4.15.18-4 ==> VM on 4.15.18-5 (not working)
- VM on 4.15.18-5 ==> VM on 4.15.18-4 (working)

So we conclude the 4.15.18-4 kernel is fine, while after updating to 4.15.18-5 at least incoming data transfers stall after an unspecified amount of data (anywhere from 10 MB to 3 GB in our tests).
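
A minimal reproduction sketch, assuming one VM per kernel, an example hostname (vm-on-5) and an example test file:

# run on the VM hosted on the 4.15.18-4 kernel
dd if=/dev/urandom of=/tmp/testfile bs=1M count=1024   # create a 1 GiB test file
scp /tmp/testfile root@vm-on-5:/tmp/                   # push it to the VM on the -5 host
# on the affected kernel, throughput collapses to a few KB/s after 10 MB to 3 GB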

@mladen popov Do you encounter the same symptoms?

As far as we were able to reproduce this, we think it happens only with LXC, not KVM.
 

kroki0815

New Member
Sep 11, 2018
Clarification for the last post:
Only LXC is affected, OS-independent (we have Debian Stretch and CentOS 7.5 here).
Only the inbound direction is affected.
Only LXC containers on different hosts are affected.

The problem is reproducible with scp, rsync, and Galera cluster sync (Xtrabackup).

Bandwidth is normal for a short time, then it decreases to a few KB/s. Sometimes it stalls for a couple of seconds.
 
