problem - network slow kvm

cvandeplas

I'm facing a difficult networking problem for the second time.
Here's the situation:
- proxmox with mostly openvz machines, but also a kvm machine (playing the role of router/firewall)
- I have 'slow reaction' on the network to my kvm machine. But bandwidth is not a problem. pings vary from 2ms to 1000ms (see below)
- Slow reaction is not present when connecting to openvz machines (ping stable from 1 to 4ms)
- problem present on both IPv4 and IPv6

Sat 06 Jun 12:03 $ ping vm01 => from laptop to hypervisor = OK
PING vm01.home.xxxxx (10.107.2.240): 56 data bytes
64 bytes from 10.107.2.240: icmp_seq=0 ttl=64 time=4.267 ms
64 bytes from 10.107.2.240: icmp_seq=1 ttl=64 time=0.947 ms
64 bytes from 10.107.2.240: icmp_seq=2 ttl=64 time=0.933 ms

Sat 06 Jun 12:05 $ ping nas01 => from laptop to openvz virtual machine = OK
PING nas01.home.xxxxx (10.107.2.8): 56 data bytes
64 bytes from 10.107.2.8: icmp_seq=0 ttl=64 time=3.919 ms
64 bytes from 10.107.2.8: icmp_seq=1 ttl=64 time=0.952 ms
64 bytes from 10.107.2.8: icmp_seq=2 ttl=64 time=0.844 ms

Sat 06 Jun 12:05 $ ping gw01 => from laptop to KVM firewall = NOK
PING gw01.home.xxxxx (10.107.2.254): 56 data bytes
64 bytes from 10.107.2.254: icmp_seq=0 ttl=64 time=1000.727 ms
64 bytes from 10.107.2.254: icmp_seq=1 ttl=64 time=1000.851 ms
64 bytes from 10.107.2.254: icmp_seq=2 ttl=64 time=1.910 ms
64 bytes from 10.107.2.254: icmp_seq=3 ttl=64 time=1001.805 ms
64 bytes from 10.107.2.254: icmp_seq=4 ttl=64 time=999.146 ms
64 bytes from 10.107.2.254: icmp_seq=5 ttl=64 time=82.865 ms
64 bytes from 10.107.2.254: icmp_seq=6 ttl=64 time=15.473 ms

From a cabling perspective, everything passes over the same switches and cables.

I did some network sniffing on my laptop, on the bridge of the hypervisor/vm01 and on the gw01 (kvm machine).
The results are:
- my laptop sends a packet on the network at T
- my laptop sees reply at T + ~1000ms (long delay)
- vm01 sees packet arrive on vmbr2 at T
- vm01 sees reply at T + ~1000ms (long delay)

- gw01 sees packet arrive on machine at T + ~900ms
- gw01 sends reply immediately (no delay)

=> Conclusion: Delay is between the hypervisor and the KVM virtual machine.
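For reference, the captures behind those timings look roughly like this (a sketch; the laptop interface name is illustrative, vmbr2 and the gw01 address are taken from above):

Code:
# on the laptop (interface name is illustrative)
tcpdump -ni eth0 icmp and host 10.107.2.254

# on the hypervisor vm01, on the bridge the guest is attached to
tcpdump -ni vmbr2 icmp and host 10.107.2.254

# inside gw01 (the KVM guest)
tcpdump -ni eth0 icmp and host 10.107.2.254
Comparing the echo-request timestamps at the three capture points shows where the ~1000ms gap appears.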

A month ago I had exactly the same behavior. I fixed it by reinstalling my VM from scratch (I was performing a network migration at that time).
For 1 or 2 weeks there were no problems (or at least none that I noticed), but now it's starting again...


Something about the gw01 configuration:
- 5 virtio network interfaces (see the driver check sketched below)
- virtio disk
- Ubuntu Linux 9.04, kernel 2.6.28-11-server, up to date
- the slow behavior is present on every interface of the machine (from and to every DMZ)
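To double-check that the guest really is using the virtio driver for those NICs, something like this inside gw01 (eth0 is just an example interface):

Code:
# list virtio PCI devices seen by the guest
lspci | grep -i virtio
# show which driver a given NIC is bound to ("driver: virtio_net" for a virtio NIC)
ethtool -i eth0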

The hypervisor/vm01:
- standard proxmox install, no extra repo, up2date


I'm reaching the limit of my knowledge and don't know where to continue debugging...
I could recreate the VM from scratch, but that's again only a temporary fix and still doesn't explain the cause.


Thanks for your help.
Any ideas for possible debugging are welcome
 
Deleting the virtio interfaces and replacing them with rtl8139 ones seems to be a valid workaround (for now).

But it still doesn't explain what the cause is... :'(
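To make the workaround concrete, this is roughly what changes on the VM's network line (key format as in the config quoted later in this thread; the MAC address is illustrative):

Code:
# before: NIC presented to the guest as virtio
vlan0: virtio=DE:AD:BE:EF:00:01
# after: NIC presented to the guest as rtl8139
vlan0: rtl8139=DE:AD:BE:EF:00:01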

Unfortunately, with the workaround in place, there's no way left to troubleshoot the original problem.

thanks
 
I've been trying to figure out what causes this problem for a long time now.

I'm not 100% sure yet (it usually takes a few hours up to several days until the problem happens), but the cause seems to be the "fairsched.diff" patch that is applied to the vanilla KVM sources.

I compiled KVM without this patch and I haven't observed any slowness for 2 days now with the virtio network drivers. If the network doesn't slow down for a week, I'll post an update here.
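For anyone who wants to repeat that experiment, a rough sketch of building a vanilla KVM userspace (i.e. without Proxmox patches such as fairsched.diff) for comparison; the release number and install prefix are illustrative, and wiring the resulting binary into qemu-server is left out here:

Code:
# unpack a vanilla kvm userspace release from the KVM project (version is illustrative)
tar xzf kvm-88.tar.gz
cd kvm-88
./configure --prefix=/usr/local/kvm-vanilla
make
make install
# the unpatched qemu-system-x86_64 under /usr/local/kvm-vanilla can then be used for testing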
 
Mangoo, did the slowness return, or did removing the patch fix it?
 
No, the "slowness" for guests using virtio_net is not solved in Proxmox VE 1.3 (and the problem is still there in the latest beta of 1.4, too).

I got the same problem with Proxmox VE 1.4.
The machine is a Dell 2950 III, and the NIC is a Broadcom 5708.
 
Same problem here with pve 1.7:
pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

The extremely high latency with virtio makes this unusable.
 
I have exactly the same trouble. A VM with 2 vCPUs and virtio-net has extremely bad network performance:
Code:
root@server1:~# iperf -i 10 -m -t 120 -c server2.tobru.ch
------------------------------------------------------------
Client connecting to james.tobru.ch, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.10 port 36767 connected with 10.0.0.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  47.4 MBytes  39.7 Mbits/sec
[  3] 10.0-20.0 sec  46.3 MBytes  38.9 Mbits/sec
[  3] 20.0-30.0 sec    164 MBytes    138 Mbits/sec
[  3] 30.0-40.0 sec  98.4 MBytes  82.5 Mbits/sec
[  3] 40.0-50.0 sec    102 MBytes  85.4 Mbits/sec
[  3] 50.0-60.0 sec    115 MBytes  96.3 Mbits/sec
[  3] 60.0-70.0 sec  86.2 MBytes  72.3 Mbits/sec
[  3] 70.0-80.0 sec    112 MBytes  93.7 Mbits/sec
[  3] 80.0-90.0 sec    127 MBytes    107 Mbits/sec
[  3] 90.0-100.0 sec  22.2 MBytes  18.6 Mbits/sec                                                                                                   
[  3] 100.0-110.0 sec  13.2 MBytes  11.0 Mbits/sec                                                                                                  
[  3] 110.0-120.0 sec  57.3 MBytes  48.1 Mbits/sec                                                                                                  
[  3]  0.0-120.0 sec    990 MBytes  69.2 Mbits/sec                                                                                                  
[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
After reconfiguring the VM to use only 1 vCPU, the network performance is much better:
Code:
root@server1:~# iperf -i 10 -m -t 120 -c server2.tobru.ch
------------------------------------------------------------
Client connecting to james.tobru.ch, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.10 port 56966 connected with 10.0.0.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    510 MBytes    428 Mbits/sec
[  3] 10.0-20.0 sec    911 MBytes    765 Mbits/sec
[  3] 20.0-30.0 sec    921 MBytes    773 Mbits/sec
[  3] 30.0-40.0 sec    709 MBytes    594 Mbits/sec
[  3] 40.0-50.0 sec    654 MBytes    548 Mbits/sec
[  3] 50.0-60.0 sec    920 MBytes    772 Mbits/sec
[  3] 60.0-70.0 sec    886 MBytes    744 Mbits/sec
[  3] 70.0-80.0 sec    977 MBytes    819 Mbits/sec
[  3] 80.0-90.0 sec  1016 MBytes    852 Mbits/sec
[  3] 90.0-100.0 sec    737 MBytes    618 Mbits/sec
[  3] 100.0-110.0 sec    890 MBytes    747 Mbits/sec
[  3] 110.0-120.0 sec    913 MBytes    766 Mbits/sec
[  3]  0.0-120.0 sec  9.81 GBytes    702 Mbits/sec
[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
That's too bad, as I'd like to give the VM 2 vCPUs.
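For reference, dropping a guest to a single vCPU is just a config change; a sketch (the VMID is illustrative, and the qm syntax / config path are assumptions based on the config format quoted later in this thread):

Code:
# via qm (101 = your VMID)
qm set 101 -sockets 1 -cores 1
# or edit /etc/qemu-server/101.conf directly and restart the VM:
#   sockets: 1
#   cores: 1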

Code:
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.35-1-pve: 2.6.35-9
pve-kernel-2.6.24-5-pve: 2.6.24-6
pve-kernel-2.6.24-1-pve: 2.6.24-4
pve-kernel-2.6.24-2-pve: 2.6.24-5
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4
 
Also post your VMID.conf (for both) and let us know which OS and which drivers you use inside. Windows?
 
hello,

It happens on Linux and Windows. The Linux guest is Ubuntu 10.04.1 x64 Server with a default installation. The Windows guest is 2003 x86 Standard Edition with the driver from "virtio-win-1.1.16.iso".

We didn't run any throughput tests, because a latency of 1 s (1000 ms) was already unusable for our software.

For example, the configuration of one of the Linux guests:
name: <removed>
ide2: local:iso/ubuntu-10.04.1-server-amd64.iso,media=cdrom
vlan0: virtio=4A:7F:B3:6A:29:6D
bootdisk: virtio0
virtio0: local:10044/vm-10044-disk-1.raw,cache=writeback
ostype: l26
memory: 3072
sockets: 1
onboot: 0
description: <removed>
cores: 2

On one Windows guest (2003 with Active Directory), a few services (Net Logon, Computer Browser, DFS) even failed to start on reboot with the virtio network driver.

esco
 
regarding your iperf setup: what is server1 and what is server2? (host or guest, running on which host)
 
Sorry for providing so little information =(
* server1 is a physical server
* server2 is a kvm-vm running on an IBM x3400 Server (NOT server1!)
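So the measurement setup is basically (assuming default iperf options on the server side):

Code:
# on server2 (the KVM guest): run iperf in server mode
iperf -s
# on server1 (the physical box): run the client, as in the outputs above
iperf -i 10 -m -t 120 -c server2.tobru.ch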
 
I did some testing here and got similar results using two servers (about 3 years old) with Intel entry-level server boards and Xeon X3210 CPUs.

But I also ran these tests on a more modern Intel Modular Server - no performance loss there when using more than one vCPU in the guest - I even tested a KVM guest with 2 x 4 (8) vCPUs.
 
My IBM server is also not the newest model; it's 3 years old. The processor is an Intel Quad-Core Xeon E5320 at 1.86 GHz (8 MB L2 cache) and I have 2 x 250 GB SATA disks (mirrored). So perhaps I have to upgrade my hardware for more performance =(
 
