KVM network performance: latency vs throughput, do I have to decide?

I used pve-kernel-2.6.32-4-pve: 2.6.32-31 and the latest KVM 0.14
(all from pvetest).
 
I tested the guest on the same host, so the same NIC.
I will try with kernel 2.6.32 from pvetest; currently I'm using 2.6.35.
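For reference, the running kernel and the installed package versions can be checked on the host like this (the output will of course differ between setups):
Code:
uname -r
pveversion -v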
 
OK, with kernel version 2.6.32 from pvetest I can't reproduce the bug.

Throughput is lower than with kernel 2.6.35 (1.3 Gbit/s vs 2 Gbit/s), but no crash :)

Code:
Thread  Realtime(s)  Throughput(KB/s)  Throughput(Mbit/s)  Avg Bytes per Completion
======  ===========  ================  ==================  ========================
0       24.022       167618.510        1340.948            65533.867

Total Bytes(MEG)  Realtime(s)  Average Frame Size  Total Throughput(Mbit/s)
================  ===========  ==================  ========================
4026.531840       24.022       1456.352            1340.948

Total Buffers  Throughput(Buffers/s)  Pkts(sent/intr)  Intr(count/s)  Cycles/Byte
=============  =====================  ===============  =============  ===========
61440.000      2557.655               12               8862.42        10.9

Packets Sent  Packets Received  Total Retransmits  Total Errors  Avg. CPU %
============  ================  =================  ============  ==========
2764807       142441            0                  0             20.20

I hope the next release of the drivers will fix this issue; that would be nice, because 2.6.35 really improves the results but has no KSM support :(
Maybe someone can corroborate these results, and the Proxmox team could put a note on the wiki about virtio / 2008 R2 / 2.6.35?
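In case someone wants to double-check, whether the running kernel actually has KSM active can be seen in sysfs (if the directory is missing, the kernel was built without KSM):
Code:
# 1 means the KSM daemon is running, 0 means it is stopped
cat /sys/kernel/mm/ksm/run
# shared pages currently in use (stays 0 if nothing is being merged)
cat /sys/kernel/mm/ksm/pages_shared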
 
I just did more tests with my KVM guests, just to make sure that I have the same setup.

I have two servers (IMS blades) running 2.6.32 (pvetest with KVM 0.14) with KVM virtio networking. On the second blade I run a KVM guest, e.g. Ubuntu 10.10. Now I am doing iperf tests from this guest to the first blade.

Results in both directions: 940 Mbit/s (I have only Gbit LAN in my IMS, so that's just perfect).
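For completeness, the iperf runs were nothing special, roughly like this (the IP address is just a placeholder for the first blade):
Code:
# on the first blade
iperf -s
# inside the Ubuntu 10.10 guest on the second blade; swap the roles for the reverse direction
iperf -c 192.168.1.10 -t 30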

So where is the difference in our tests?
Hi Tom,
now the issue (only 200 kbit/s with virtio) is solved. It's a problem with TSO and the 10Gb NIC. With a 10Gb NIC from a different manufacturer this effect doesn't happen, and it also doesn't happen with other kernels (mail from Solarflare support):
Code:
I ran some tests using a RHEL6 KVM host and did NOT observe any performance issue, with or without VLAN.

But I am able to replicate the issue in-house with the virtual environment PROXMOX (you have provided).
I had a discussion with the Engineering Team about this behaviour, and it seems the issue is related to the kernel. The kernel does not seem to work correctly with TCP Segmentation Offload ("tso") ON when a VLAN and a bridge interface are configured.

And to confirm this, I ran some tests on PROXMOX VE with "tso off" and observed fine performance even with VLAN (802.1q).

So as a workaround we recommend that you turn off "tso" on the host interface.

Please use following command to turn off the tso:
# ethtool -K <interface> tso off
With tso off I get 1.13 Gbit/s between two VMs!
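To keep the workaround across reboots, one option (assuming the usual Debian-style /etc/network/interfaces on the host; eth0 is only an example name for the Solarflare port) is a post-up hook on the physical interface:
Code:
auto eth0
iface eth0 inet manual
        # turn TSO off every time the NIC comes up
        post-up ethtool -K eth0 tso off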

Is it possible to enable this setting inside the kernel? Or why is there a difference from the RHEL kernel?

Udo
 
Which driver are we talking about, exactly?
Hi Dietmar,
it's the sfc-driver:
Code:
modinfo sfc
filename:       /lib/modules/2.6.35-1-pve/kernel/drivers/net/sfc/sfc.ko
license:        GPL
description:    Solarflare Communications network driver
author:         Solarflare Communications and Michael Brown <mbrown@fensystems.co.uk>
srcversion:     B4831765C939D27E214D619
alias:          pci:v00001924d00000813sv*sd*bc*sc*i*
alias:          pci:v00001924d00000803sv*sd*bc*sc*i*
alias:          pci:v00001924d00000710sv*sd*bc*sc*i*
alias:          pci:v00001924d00000703sv*sd*bc*sc*i*
depends:        mtd,mdio,i2c-algo-bit
vermagic:       2.6.35-1-pve SMP mod_unload modversions 
parm:           rx_alloc_method:Allocation method used for RX buffers (int)
parm:           rx_refill_threshold:RX descriptor ring fast/slow fill threshold (%) (uint)
parm:           rx_xoff_thresh_bytes:RX fifo XOFF threshold (int)
parm:           rx_xon_thresh_bytes:RX fifo XON threshold (int)
parm:           separate_tx_channels:Use separate channels for TX and RX (uint)
parm:           rss_cpus:Number of CPUs to use for Receive-Side Scaling (uint)
parm:           phy_flash_cfg:Set PHYs into reflash mode initially (int)
parm:           irq_adapt_low_thresh:Threshold score for reducing IRQ moderation (uint)
parm:           irq_adapt_high_thresh:Threshold score for increasing IRQ moderation (uint)
parm:           interrupt_mode:Interrupt mode (0=>MSIX 1=>MSI 2=>legacy) (uint)
There is a much newer driver, but it shows the same behavior ( https://support.solarflare.com/inde...ormat=raw&id=165&option=com_cognidox&Itemid=2 )
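If someone wants to compare, the loaded driver version and the current offload settings can be checked like this (eth2 is only an example name for the Solarflare port):
Code:
# driver name, version and firmware of the NIC
ethtool -i eth2
# current offload settings, including tcp-segmentation-offload
ethtool -k eth2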

Udo
 
Hi Dietmar,
right, the problem is with the kernel. I don't know where exactly; I only know the workaround with tso off.
Perhaps you know which kernel changes (compared to the RHEL kernel) could cause different behavior with regard to TSO?

I guess the problem is inside the Solarflare driver, so that question should go to Solarflare support?
 
