Ingress network performance issues + host errors on HP/Broadcom gear

ucosty

Nov 3, 2010

I am running Proxmox 1.6 on an HP BL495c blade with the following kernel:
Code:
Linux se-syd-vm3 2.6.24-12-pve #1 SMP PREEMPT Mon Sep 20 13:02:41 CEST 2010 x86_64 GNU/Linux

We have removed all but one virtual machine from this host: a single Windows Server 2008 VM running the VirtIO network drivers. The choice of virtual network adapter makes no difference; we have tried the RTL, E1000 and VirtIO adapters, all with the same result.

Egress network performance from the virtual machine is a little slow for gigabit but not too bad (~600 Mbit/s). Ingress performance, on the other hand, is extremely slow, maxing out at about 70 kbit/s.

This affects all ingress traffic; TCP and UDP are equally affected. To measure ingress we run iperf in server mode on the virtual machine. By default iperf sends data from the client to the server, so this measures traffic flowing into the VM. From a known-good Linux client, this is what we see:

Code:
se-syd-backup:~# iperf -c 121.52.199.102
------------------------------------------------------------
Client connecting to 121.52.199.102, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 121.52.199.130 port 58182 connected with 121.52.199.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.9 sec  80.0 KBytes  60.3 Kbits/sec
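For reference, iperf's default direction can be illustrated with a minimal Python sketch over loopback. This is not iperf: the function names, the 64 KiB chunk size and the transfer sizes are arbitrary choices for illustration. The client only writes and the server only reads and times what arrives, which is why running the server on the VM measures ingress to the VM.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary for this sketch)


def run_server(listener: socket.socket, result: dict) -> None:
    """Accept one connection and count received bytes (the 'iperf -s' role)."""
    conn, _ = listener.accept()
    with conn:
        start = time.perf_counter()  # timing starts at accept, so it is approximate
        received = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:  # client closed its side: transfer finished
                break
            received += len(data)
        elapsed = time.perf_counter() - start
    result["bytes"] = received
    result["seconds"] = elapsed


def run_client(host: str, port: int, total_bytes: int) -> None:
    """Connect and send total_bytes (the 'iperf -c' role: client -> server)."""
    payload = b"\0" * CHUNK
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n


def measure(total_bytes: int = 8 * 1024 * 1024) -> dict:
    """Run server and client on loopback; the server side does the measuring."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result: dict = {}
    server = threading.Thread(target=run_server, args=(listener, result))
    server.start()
    run_client("127.0.0.1", port, total_bytes)
    server.join()
    listener.close()
    result["bits_per_sec"] = result["bytes"] * 8 / result["seconds"]
    return result


if __name__ == "__main__":
    r = measure()
    print(f"received {r['bytes']} bytes at {r['bits_per_sec'] / 1e6:.1f} Mbit/s")
```

Swapping which machine plays each role flips the direction being measured, which is what the reversed test below does.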

The iperf server on the Windows VM reports matching figures. If we reverse the flow, uploading from the Windows VM to a known-good Linux host (running on a different physical server), throughput is much more reasonable:

Code:
se-syd-backup:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 121.52.199.130 port 5001 connected with 121.52.199.102 port 49196
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec    581 MBytes    487 Mbits/sec
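As a sanity check on the two summary lines, the bandwidth can be recomputed from the interval and transfer columns. The sketch below assumes iperf's "Transfer" column uses binary byte prefixes (1 KByte = 1024 bytes) and its "Bandwidth" column decimal bit prefixes (1 Kbit = 1000 bits); both results above are consistent with that. The regex and function name are illustrative only.

```python
import re

# Assumed unit convention: binary prefixes for bytes, decimal for bits.
BYTE_UNITS = {"Bytes": 1, "KBytes": 1024, "MBytes": 1024**2, "GBytes": 1024**3}

# Matches e.g. "0.0-10.9 sec  80.0 KBytes" inside an iperf summary line.
SUMMARY = re.compile(
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+) sec\s+(?P<amount>[\d.]+) (?P<unit>[KMG]?Bytes)"
)


def bits_per_second(summary_line: str) -> float:
    """Recompute bandwidth from an iperf summary line's interval and transfer."""
    m = SUMMARY.search(summary_line)
    if m is None:
        raise ValueError(f"not an iperf summary line: {summary_line!r}")
    seconds = float(m["end"]) - float(m["start"])
    total_bits = float(m["amount"]) * BYTE_UNITS[m["unit"]] * 8
    return total_bits / seconds


# The two summary lines quoted in this post.
ingress = bits_per_second("[  3]  0.0-10.9 sec  80.0 KBytes  60.3 Kbits/sec")
egress = bits_per_second("[  4]  0.0-10.0 sec    581 MBytes    487 Mbits/sec")
print(f"ingress {ingress / 1e3:.1f} Kbit/s, egress {egress / 1e6:.1f} Mbit/s, "
      f"ratio {egress / ingress:,.0f}x")
```

This gives about 60.1 Kbit/s in and 487.4 Mbit/s out, so the VM receives roughly 8,100 times slower than it sends. The small gap from iperf's 60.3 is presumably because the printed interval is rounded to one decimal place.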

The host has a pair of Broadcom NetXtreme II 10GbE network interfaces, as shown by this lspci excerpt:
Code:
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe

The system messages log gets absolutely hammered with warnings like this one:

Code:
WARNING: at net/core/dev.c:1407 skb_gso_segment()
Pid: 0, comm: swapper Not tainted 2.6.24-12-pve #1

Call Trace:
 <IRQ>  [<ffffffff8043a706>] skb_gso_segment+0x196/0x250
 [<ffffffff8043a97e>] dev_hard_start_xmit+0x1be/0x340
 [<ffffffff804506fe>] __qdisc_run+0x9e/0x2a0
 [<ffffffff8043aead>] dev_queue_xmit+0x28d/0x370
 [<ffffffff8833434e>] :bridge:br_dev_queue_push_xmit+0x6e/0xc0
 [<ffffffff883398cb>] :bridge:br_nf_post_routing+0x1ab/0x240
 [<ffffffff8045c398>] nf_iterate+0x68/0xa0
 [<ffffffff883342e0>] :bridge:br_dev_queue_push_xmit+0x0/0xc0
 [<ffffffff8045c4b2>] nf_hook_slow+0xe2/0x150
 [<ffffffff883342e0>] :bridge:br_dev_queue_push_xmit+0x0/0xc0
 [<ffffffff883343f3>] :bridge:br_forward_finish+0x53/0x70
 [<ffffffff88339cf8>] :bridge:br_nf_forward_finish+0x178/0x190
 [<ffffffff88339f2d>] :bridge:br_nf_forward_ip+0x21d/0x2a0
 [<ffffffff8045c398>] nf_iterate+0x68/0xa0
 [<ffffffff883343a0>] :bridge:br_forward_finish+0x0/0x70
 [<ffffffff8045c4b2>] nf_hook_slow+0xe2/0x150
 [<ffffffff883343a0>] :bridge:br_forward_finish+0x0/0x70
 [<ffffffff88334473>] :bridge:__br_forward+0x63/0x90
 [<ffffffff8833525e>] :bridge:br_handle_frame_finish+0x15e/0x200
 [<ffffffff8833a758>] :bridge:br_nf_pre_routing_finish+0x2e8/0x4c0
 [<ffffffff8045c398>] nf_iterate+0x68/0xa0
 [<ffffffff8833a470>] :bridge:br_nf_pre_routing_finish+0x0/0x4c0
 [<ffffffff8045c4b2>] nf_hook_slow+0xe2/0x150
 [<ffffffff8833a470>] :bridge:br_nf_pre_routing_finish+0x0/0x4c0
 [<ffffffff8833ae7a>] :bridge:br_nf_pre_routing+0x54a/0x830
 [<ffffffff8045c398>] nf_iterate+0x68/0xa0
 [<ffffffff88335100>] :bridge:br_handle_frame_finish+0x0/0x200
 [<ffffffff8045c4b2>] nf_hook_slow+0xe2/0x150
 [<ffffffff88335100>] :bridge:br_handle_frame_finish+0x0/0x200
 [<ffffffff883354b6>] :bridge:br_handle_frame+0x1b6/0x270
 [<ffffffff80439cb8>] netif_receive_skb+0x1a8/0x5e0
 [<ffffffff881b7349>] :bnx2x:bnx2x_rx_int+0x1929/0x1cf0
 [<ffffffff80236b66>] try_to_wake_up+0x66/0x490
 [<ffffffff8833a470>] :bridge:br_nf_pre_routing_finish+0x0/0x4c0
 [<ffffffff8023025f>] load_elf32_binary+0xdbf/0x2110
 [<ffffffff80427b3b>] pci_conf1_read+0xcb/0x100
 [<ffffffff80390013>] pci_bus_read_config_dword+0x83/0xb0
 [<ffffffff8038fe33>] pci_bus_write_config_dword+0x73/0xa0
 [<ffffffff881b8976>] :bnx2x:bnx2x_poll+0x76/0x200
 [<ffffffff802260c6>] dma_map_area+0x36/0x110
 [<ffffffff804384ff>] net_rx_action+0x12f/0x290
 [<ffffffff8024882a>] __do_softirq+0xba/0x150
 [<ffffffff8020d84c>] call_softirq+0x1c/0x30
 [<ffffffff8020eb55>] do_softirq+0x35/0x90
 [<ffffffff80248525>] irq_exit+0xf5/0x100
 [<ffffffff8020edb1>] do_IRQ+0x81/0x100
 [<ffffffff8020a000>] default_idle+0x0/0x50
 [<ffffffff8020a000>] default_idle+0x0/0x50
 [<ffffffff8020cba1>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8020a042>] default_idle+0x42/0x50
 [<ffffffff8020b538>] cpu_idle+0x88/0x110
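Read from the bottom up, the trace shows the path a frame takes on the host before the warning fires. The sketch below parses an excerpt of it (copied from the trace above, not every frame) into (module, function) pairs, oldest caller first; "kernel" is just a label chosen here for frames with no module prefix. It also skips frames at offset +0x0 as an assumed heuristic: a genuine return address points just past a call instruction, so +0x0 entries are most likely stale function pointers left on the stack, which the 2.6.24 trace does not mark.

```python
import re

# One frame of the call trace, e.g.
#   [<ffffffff88335100>] :bridge:br_handle_frame_finish+0x0/0x200
# Module frames look like ":module:function"; core kernel frames have no prefix.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(?P<module>\w+):)?(?P<func>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+"
)


def parse_frames(trace: str) -> list:
    """Return (module, function) pairs, oldest caller first, skipping +0x0 frames."""
    frames = []
    for line in trace.splitlines():
        m = FRAME.search(line)
        if m and int(m["off"], 16) != 0:
            frames.append((m["module"] or "kernel", m["func"]))
    return frames[::-1]  # the trace lists the innermost frame first


def module_order(frames: list) -> list:
    """Modules in the order the frame first passes through them."""
    seen = []
    for module, _ in frames:
        if module not in seen:
            seen.append(module)
    return seen


EXCERPT = """
 <IRQ>  [<ffffffff8043a706>] skb_gso_segment+0x196/0x250
 [<ffffffff8043a97e>] dev_hard_start_xmit+0x1be/0x340
 [<ffffffff8043aead>] dev_queue_xmit+0x28d/0x370
 [<ffffffff8833434e>] :bridge:br_dev_queue_push_xmit+0x6e/0xc0
 [<ffffffff883343f3>] :bridge:br_forward_finish+0x53/0x70
 [<ffffffff883354b6>] :bridge:br_handle_frame+0x1b6/0x270
 [<ffffffff80439cb8>] netif_receive_skb+0x1a8/0x5e0
 [<ffffffff881b7349>] :bnx2x:bnx2x_rx_int+0x1929/0x1cf0
 [<ffffffff8833a470>] :bridge:br_nf_pre_routing_finish+0x0/0x4c0
 [<ffffffff881b8976>] :bnx2x:bnx2x_poll+0x76/0x200
 [<ffffffff804384ff>] net_rx_action+0x12f/0x290
"""

frames = parse_frames(EXCERPT)
print(" -> ".join(f"{mod}:{fn}" for mod, fn in frames))
print(module_order(frames))
```

The resulting path runs from the bnx2x receive handler, through the bridge's forwarding code, into dev_queue_xmit and finally skb_gso_segment. In other words, the warning is raised while the host forwards a frame it received on the Broadcom NIC out through the bridge, which fits the ingress-only symptom.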

We have tried the official HP/Broadcom bnx2x network drivers, compiled from source, with no effect.

If we migrate the virtual machine to another host, the problem goes away. This leads me to believe the problem lies with the host rather than the virtual machine, and the messages log seems to back this up.

Any ideas?