Suppose there is a Proxmox server with four network interfaces (eth0 - eth3) in a bonded LACP configuration (bond0) and nine virtual machines spread evenly across three VLANs (1 - 3).
What manner of network configuration will give the best performance and functionality?
Currently it is configured in /etc/network/interfaces like so:
Code:
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual

auto bond0
iface bond0 inet manual
        slaves eth0 eth1 eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad

auto vlan1
iface vlan1 inet manual
        vlan_raw_device bond0

auto vlan2
iface vlan2 inet manual
        vlan_raw_device bond0

auto vlan3
iface vlan3 inet manual
        vlan_raw_device bond0

auto vmbr1
iface vmbr1 inet manual
        bridge_ports vlan1
        bridge_stp off
        bridge_fd 0

auto vmbr2
iface vmbr2 inet manual
        bridge_ports vlan2
        bridge_stp off
        bridge_fd 0

auto vmbr3
iface vmbr3 inet manual
        bridge_ports vlan3
        bridge_stp off
        bridge_fd 0
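In case it helps with diagnosis, the bond, VLAN, and bridge state on the host can be checked with the standard Linux tools (nothing Proxmox-specific):
Code:
# LACP negotiation and slave status of the bond
cat /proc/net/bonding/bond0
# VLAN-to-device mapping set up by the vlan_raw_device stanzas (8021q module)
cat /proc/net/vlan/config
# which interfaces are attached to each vmbr bridge
brctl show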
For the virtual machines, we use KVM paravirtualized (virtio) network adapters.
So for a packet to go from the wire to a virtual machine, it has to traverse:
eth0 -> bond0 -> vlan1 -> vmbr1 -> tap123i0 -> pv-eth0
This works pretty well, which is almost surprising given the length of the chain. But is this really the best way to do it for high performance, or have we complicated it unnecessarily?
An attempt to create a single vmbr0 on top of bond0 and use the VLAN tag in the KVM settings (bridge=vmbr0,tag=1) looked like it would knock one link out of the chain, but it was not successful; the pve-bridge startup script failed with:
Code:
Cannot find device "bond0.1"
can't up interface bond0.1
/var/lib/qemu-server/pve-bridge: could not launch network script
kvm: -netdev type=tap,id=net0,ifname=tap123i0,script=/var/lib/qemu-server/pve-bridge,vhost=on: Device 'tap' could not be initialized
Judging from those errors, pve-bridge apparently tries to create and bring up a new VLAN interface (bond0.1) on the fly, so even if it had worked, it probably wouldn't really have simplified things.
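For reference, this is roughly what the attempted single-bridge layout looked like in /etc/network/interfaces (reconstructed from memory, so treat it as a sketch rather than the exact file), with each VM then pointed at bridge=vmbr0,tag=1:
Code:
# eth0 - eth3 stanzas unchanged from above
auto bond0
iface bond0 inet manual
        slaves eth0 eth1 eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet manual
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0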
The big problem is that if the guest OS enables TSO, it spams the Proxmox logs with the following:
Code:
Aug 22 07:13:11 v8 kernel: ------------[ cut here ]------------
Aug 22 07:13:11 v8 kernel: WARNING: at net/core/dev.c:1758 skb_gso_segment+0x220/0x310() (Tainted: G W --------------- )
Aug 22 07:13:11 v8 kernel: Hardware name: X8DTT-H
Aug 22 07:13:11 v8 kernel: 802.1Q VLAN Support: caps=(0x110829, 0x0) len=2984 data_len=0 ip_summed=0
Aug 22 07:13:11 v8 kernel: Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp vhost_net tun macvtap ipt_REJECT macvlan kvm_intel ip_tables kvm dlm configfs fuse vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc bonding 8021q garp ipv6 snd_pcsp snd_pcm snd_page_alloc snd_timer snd iTCO_wdt i2c_i801 serio_raw soundcore i7core_edac ioatdma i2c_core iTCO_vendor_support shpchp edac_core dca ext3 mbcache jbd sg ahci e1000e [last unloaded: scsi_wait_scan]
Aug 22 07:13:11 v8 kernel: Pid: 3524, comm: vhost-3511 veid: 0 Tainted: G W --------------- 2.6.32-23-pve #1
Aug 22 07:13:11 v8 kernel: Call Trace:
Aug 22 07:13:11 v8 kernel: <IRQ> [<ffffffff8106f667>] ? warn_slowpath_common+0x87/0xe0
Aug 22 07:13:11 v8 kernel: [<ffffffff8106f776>] ? warn_slowpath_fmt+0x46/0x50
Aug 22 07:13:11 v8 kernel: [<ffffffff814694f0>] ? skb_gso_segment+0x220/0x310
Aug 22 07:13:11 v8 kernel: [<ffffffffa01dbaeb>] ? bond_start_xmit+0xbb/0x5d0 [bonding]
Aug 22 07:13:11 v8 kernel: [<ffffffff8146bc99>] ? dev_hard_start_xmit+0x1a9/0x610
Aug 22 07:13:11 v8 kernel: [<ffffffff81505cc0>] ? br_dev_queue_push_xmit+0x0/0xc0
Aug 22 07:13:11 v8 kernel: [<ffffffff8146c408>] ? dev_queue_xmit+0x308/0x530
Aug 22 07:13:11 v8 kernel: [<ffffffff81505d20>] ? br_dev_queue_push_xmit+0x60/0xc0
Aug 22 07:13:11 v8 kernel: [<ffffffff81505dd8>] ? br_forward_finish+0x58/0x60
Aug 22 07:13:11 v8 kernel: [<ffffffff815060db>] ? __br_forward+0xab/0xd0
Aug 22 07:13:11 v8 kernel: [<ffffffff81497aac>] ? nf_hook_slow+0xac/0x120
Aug 22 07:13:11 v8 kernel: [<ffffffff81506fd0>] ? br_handle_frame_finish+0x0/0x2f0
Aug 22 07:13:11 v8 kernel: [<ffffffff8150626d>] ? br_forward+0x5d/0x70
Aug 22 07:13:11 v8 kernel: [<ffffffff815071c7>] ? br_handle_frame_finish+0x1f7/0x2f0
Aug 22 07:13:11 v8 kernel: [<ffffffff8150746a>] ? br_handle_frame+0x1aa/0x250
Aug 22 07:13:11 v8 kernel: [<ffffffff8146c934>] ? __netif_receive_skb+0x244/0x750
Aug 22 07:13:11 v8 kernel: [<ffffffff8146ced4>] ? process_backlog+0x94/0xf0
Aug 22 07:13:11 v8 kernel: [<ffffffff8146d901>] ? net_rx_action+0x1a1/0x3b0
Aug 22 07:13:11 v8 kernel: [<ffffffff810793cb>] ? __do_softirq+0x11b/0x260
Aug 22 07:13:11 v8 kernel: [<ffffffff8100c32c>] ? call_softirq+0x1c/0x30
Aug 22 07:13:11 v8 kernel: <EOI> [<ffffffff8100de95>] ? do_softirq+0x75/0xb0
Aug 22 07:13:11 v8 kernel: [<ffffffff8146b358>] ? netif_rx_ni+0x28/0x30
Aug 22 07:13:11 v8 kernel: [<ffffffffa051ac1f>] ? tun_sendmsg+0x29f/0x4d0 [tun]
Aug 22 07:13:11 v8 kernel: [<ffffffffa0524bd1>] ? handle_tx+0x241/0x5e0 [vhost_net]
Aug 22 07:13:11 v8 kernel: [<ffffffffa0524fa5>] ? handle_tx_kick+0x15/0x20 [vhost_net]
Aug 22 07:13:11 v8 kernel: [<ffffffffa0521815>] ? vhost_worker+0xb5/0x130 [vhost_net]
Aug 22 07:13:11 v8 kernel: [<ffffffffa0521760>] ? vhost_worker+0x0/0x130 [vhost_net]
Aug 22 07:13:11 v8 kernel: [<ffffffff8109adb8>] ? kthread+0x88/0x90
Aug 22 07:13:11 v8 kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Aug 22 07:13:11 v8 kernel: [<ffffffff8100c22a>] ? child_rip+0xa/0x20
Aug 22 07:13:11 v8 kernel: [<ffffffff8109ad30>] ? kthread+0x0/0x90
Aug 22 07:13:11 v8 kernel: [<ffffffff8100c220>] ? child_rip+0x0/0x20
Aug 22 07:13:11 v8 kernel: ---[ end trace 3e1c0a479ee9f54c ]---
That whole trace shows up once per TSO'd packet in each of three log files, so it cripples throughput unless TSO is disabled in the guest OS, and it fills the hard drive in no time flat.
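For completeness, the workaround at the moment is simply to turn TSO off inside each guest, along these lines (eth0 here is just an example interface name):
Code:
# run inside the guest; interface name will vary
ethtool -K eth0 tso off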
Is there a better way to configure the network? Preferably one where TSO works.
Thanks for any advice!