KVM and multi-queue NICs

stefws

Got a Check Point firewall VM which, at relatively high workload, maxes out its vCore 0 at 95%+ kernel-land usage, so I am looking to possibly turn on multi-queued NICs.

Only wondering if we need to do something on the KVM backend for this to work. Hints appreciated!
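
What I have in mind so far is roughly this (just a sketch of the knobs as I understand them, using my VM 401 as the example; corrections welcome):

# on the PVE node: ask for 8 queue pairs on the vNIC of VM 401
qm set 401 -net0 virtio=0A:D3:6F:DF:9D:EE,bridge=vmbr1,tag=40,queues=8
# inside the guest: the extra queues still have to be enabled explicitly
ethtool -L eth0 combined 8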

TIA
 
Hm, it partly seems to work; I activated MQ on a CentOS 6.7 VM (401):

May 3 09:16:16 n1 pvedaemon[25905]: <root@pam> update VM 401: -net0 virtio=0A:D3:6F:DF:9D:EE,bridge=vmbr1,tag=40,queues=8
May 3 09:16:20 n1 pvedaemon[24524]: <root@pam> starting task UPID:n1:0000672F:026CC5AF:57285044:qmstart:401:root@pam:
May 3 09:16:21 n1 pvedaemon[24524]: <root@pam> starting task UPID:n1:00006735:026CC637:57285045:vncproxy:401:root@pam:
May 3 09:16:25 n1 kernel: [406811.890326] device tap401i0 entered promiscuous mode
May 3 09:16:25 n1 kernel: [406812.223486] device tap401i1 entered promiscuous mode
May 3 09:16:26 n1 kernel: [406812.564474] device tap401i2 entered promiscuous mode
May 3 09:16:26 n1 kernel: [406812.901682] device tap401i3 entered promiscuous mode
May 3 09:16:26 n1 kernel: [406813.267826] device tap401i4 entered promiscuous mode
May 3 09:16:27 n1 kernel: [406813.637096] device tap401i5 entered promiscuous mode

[root@hapB ~]# uname -r
4.5.1-1.el6.elrepo.x86_64
# the following worked on all NICs
[root@hapB ~]# ethtool -L eth0 combined 4
[root@hapB ~]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 8
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 4
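
For reference, the queues can be verified, and the setting made persistent, roughly like this inside the guest (rc.local is just one crude option, and the NIC name is an example):

# each enabled queue pair shows up as its own MSI-X vector
grep virtio /proc/interrupts
# ethtool -L does not survive a reboot, so re-apply it at boot, e.g. via /etc/rc.local
echo '/sbin/ethtool -L eth0 combined 4' >> /etc/rc.local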


But with a GAiA Check Point FW VM 500 (running an older kernel, 2.6.18-92cpx86_64) this fails:

[Expert@fw01b:0]# cpmq set rx_num ixgbe 4 -f
There are no multi-queue supported interfaces in the computer
[Expert@fw01b:0]# cpmq get -a
There are no multi-queue supported interfaces in the computer

and I see the following in the hypervisor log from time to time:

May 3 09:16:44 n1 kernel: [406830.761041] ------------[ cut here ]------------
May 3 09:16:44 n1 kernel: [406830.761049] WARNING: CPU: 1 PID: 48162 at net/core/dev.c:2422 skb_warn_bad_offload+0xd3/0x120()
May 3 09:16:44 n1 kernel: [406830.761051] tap205i1: caps=(0x00000080001b48c9, 0x0000000000000000) len=2323 data_len=0 gso_size=1448 gso_type=5 ip_summed=0
May 3 09:16:44 n1 kernel: [406830.761052] Modules linked in: rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables scsi_dh_alua dm_round_robin softdog iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c nfnetlink_log nfnetlink dm_multipath nls_iso8859_1 zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_pcm aesni_intel snd_timer aes_x86_64 snd lrw soundcore gf128mul sb_edac pcspkr joydev input_leds glue_helper hpilo i2c_i801 edac_core ioatdma ablk_helper cryptd lpc_ich shpchp ipmi_si ipmi_msghandler vhost_net vhost wmi macvtap 8250_fintek mac_hid macvlan acpi_power_meter bonding 8021q garp mrp autofs4 hid_generic usbkbd usbmouse usbhid ixgbe(O) hid dca vxlan ip6_udp_tunnel tg3 udp_tunnel ptp hpsa pps_core scsi_transport_sas fjes
May 3 09:16:44 n1 kernel: [406830.761110] CPU: 1 PID: 48162 Comm: vhost-48097 Tainted: P W O 4.4.6-1-pve #1
May 3 09:16:44 n1 kernel: [406830.761111] Hardware name: HP ProLiant DL360 Gen9, BIOS P89 11/10/2015
May 3 09:16:44 n1 kernel: [406830.761112] 0000000000000286 0000000088742a26 ffff88103f843858 ffffffff813eb0e3
May 3 09:16:44 n1 kernel: [406830.761114] ffff88103f8438a0 ffffffff81d62906 ffff88103f843890 ffffffff81081516
May 3 09:16:44 n1 kernel: [406830.761115] ffff8812525d7900 ffff880a7fa44000 0000000000000005 ffff8812525d7900
May 3 09:16:44 n1 kernel: [406830.761118] Call Trace:
May 3 09:16:44 n1 kernel: [406830.761119] <IRQ> [<ffffffff813eb0e3>] dump_stack+0x63/0x90
May 3 09:16:44 n1 kernel: [406830.761127] [<ffffffff81081516>] warn_slowpath_common+0x86/0xc0
May 3 09:16:44 n1 kernel: [406830.761129] [<ffffffff810815ac>] warn_slowpath_fmt+0x5c/0x80
May 3 09:16:44 n1 kernel: [406830.761131] [<ffffffff813f10b6>] ? ___ratelimit+0x86/0xe0
May 3 09:16:44 n1 kernel: [406830.761133] [<ffffffff81725553>] skb_warn_bad_offload+0xd3/0x120
May 3 09:16:44 n1 kernel: [406830.761135] [<ffffffff81729a1e>] __skb_gso_segment+0x7e/0xd0
May 3 09:16:44 n1 kernel: [406830.761137] [<ffffffff81729dbf>] validate_xmit_skb.isra.99.part.100+0x12f/0x2b0
May 3 09:16:44 n1 kernel: [406830.761139] [<ffffffff8172a9f9>] __dev_queue_xmit+0x579/0x590
May 3 09:16:44 n1 kernel: [406830.761141] [<ffffffff817125ba>] ? kfree_skbmem+0x5a/0x60
May 3 09:16:44 n1 kernel: [406830.761143] [<ffffffff8172aa20>] dev_queue_xmit+0x10/0x20
May 3 09:16:44 n1 kernel: [406830.761149] [<ffffffffc03a5b7a>] ovs_vport_send+0x4a/0xc0 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761151] [<ffffffffc0398743>] do_output.isra.28+0x43/0x170 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761154] [<ffffffffc0399154>] do_execute_actions+0x734/0x1320 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761156] [<ffffffffc0399d73>] ovs_execute_actions+0x33/0xd0 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761158] [<ffffffffc039d3c4>] ovs_dp_process_packet+0x84/0x130 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761161] [<ffffffffc039e212>] ? key_extract+0x952/0xc30 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761164] [<ffffffffc03a546c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761167] [<ffffffff810bd1a7>] ? find_busiest_group+0x47/0x4e0
May 3 09:16:44 n1 kernel: [406830.761169] [<ffffffff810b2c6c>] ? __enqueue_entity+0x6c/0x70
May 3 09:16:44 n1 kernel: [406830.761171] [<ffffffff810b965c>] ? enqueue_entity+0x36c/0xc20
May 3 09:16:44 n1 kernel: [406830.761174] [<ffffffff813ffe55>] ? find_next_bit+0x15/0x20
May 3 09:16:44 n1 kernel: [406830.761176] [<ffffffff810b370a>] ? select_idle_sibling+0x2a/0x120
May 3 09:16:44 n1 kernel: [406830.761178] [<ffffffff810ab6f9>] ? ttwu_do_wakeup+0x19/0xe0
May 3 09:16:44 n1 kernel: [406830.761180] [<ffffffffc03a66c2>] netdev_frame_hook+0x122/0x160 [openvswitch]
May 3 09:16:44 n1 kernel: [406830.761182] [<ffffffff81727be0>] __netif_receive_skb_core+0x370/0xa60
May 3 09:16:44 n1 kernel: [406830.761184] [<ffffffff810ac600>] ? try_to_wake_up+0x3b0/0x400
May 3 09:16:44 n1 kernel: [406830.761186] [<ffffffff810c3d42>] ? autoremove_wake_function+0x12/0x40
May 3 09:16:44 n1 kernel: [406830.761188] [<ffffffff810c36b2>] ? __wake_up_common+0x52/0x90
May 3 09:16:44 n1 kernel: [406830.761190] [<ffffffff817282e6>] __netif_receive_skb+0x16/0x70
May 3 09:16:44 n1 kernel: [406830.761192] [<ffffffff817290d8>] process_backlog+0xa8/0x150
May 3 09:16:44 n1 kernel: [406830.761193] [<ffffffff81728835>] net_rx_action+0x215/0x350
May 3 09:16:44 n1 kernel: [406830.761196] [<ffffffff8108602e>] __do_softirq+0x10e/0x2a0
May 3 09:16:44 n1 kernel: [406830.761199] [<ffffffff8184940c>] do_softirq_own_stack+0x1c/0x30
May 3 09:16:44 n1 kernel: [406830.761199] <EOI> [<ffffffff81085878>] do_softirq.part.20+0x38/0x40
May 3 09:16:44 n1 kernel: [406830.761203] [<ffffffff8108622d>] do_softirq+0x1d/0x20
May 3 09:16:44 n1 kernel: [406830.761204] [<ffffffff817275c3>] netif_rx_ni+0x33/0x80
May 3 09:16:44 n1 kernel: [406830.761208] [<ffffffff815fb361>] tun_get_user+0x521/0x930
May 3 09:16:44 n1 kernel: [406830.761209] [<ffffffff810b795e>] ? dequeue_entity+0x40e/0x9e0
May 3 09:16:44 n1 kernel: [406830.761210] [<ffffffff815fb7c1>] tun_sendmsg+0x51/0x70
May 3 09:16:44 n1 kernel: [406830.761213] [<ffffffffc0104e40>] handle_tx+0x2f0/0x500 [vhost_net]
May 3 09:16:44 n1 kernel: [406830.761215] [<ffffffffc0105085>] handle_tx_kick+0x15/0x20 [vhost_net]
May 3 09:16:44 n1 kernel: [406830.761218] [<ffffffffc00f670e>] vhost_worker+0x10e/0x1b0 [vhost]
May 3 09:16:44 n1 kernel: [406830.761220] [<ffffffffc00f6600>] ? vhost_dev_reset_owner+0x50/0x50 [vhost]
May 3 09:16:44 n1 kernel: [406830.761222] [<ffffffff810a0baa>] kthread+0xea/0x100
May 3 09:16:44 n1 kernel: [406830.761223] [<ffffffff810a0ac0>] ? kthread_park+0x60/0x60
May 3 09:16:44 n1 kernel: [406830.761226] [<ffffffff81847a8f>] ret_from_fork+0x3f/0x70
May 3 09:16:44 n1 kernel: [406830.761227] [<ffffffff810a0ac0>] ? kthread_park+0x60/0x60
May 3 09:16:44 n1 kernel: [406830.761231] ---[ end trace 608d52c5378a4111 ]---
May 3 09:16:45 n1 kernel: [406831.688441] ------------[ cut here ]------------

root 48097 1 35 May02 ? 03:30:53 /usr/bin/kvm -id 500 -chardev so
215-9825-87cac7b330d4 -name fwA -smp 8,sockets=2,cores=4,maxcpus=8 -nodefaults -
6 -k da -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci
ator-name=iqn.1993-08.org.debian:01:cb6448d09d7a -drive if=none,id=drive-ide2,me
-500-disk-1,if=none,id=drive-sata0,cache=writeback,format=raw,aio=threads,detect
var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BE:54:0F:
n,vhost=on,queues=4 -device virtio-net-pci,mac=DA:E1:70:A6:67:A2,netdev=net1,bus
=on,queues=4 -device virtio-net-pci,mac=E6:28:05:D2:6E:E2,netdev=net2,bus=pci.0,
root 48120 2 0 May02 ? 00:01:25 [vhost-48097]
root 48139 2 0 May02 ? 00:00:24 [vhost-48097]
root 48140 2 0 May02 ? 00:00:00 [vhost-48097]
root 48141 2 0 May02 ? 00:00:00 [vhost-48097]
root 48142 2 0 May02 ? 00:00:00 [vhost-48097]
root 48162 2 0 May02 ? 00:00:31 [vhost-48097]
root 48163 2 0 May02 ? 00:00:00 [vhost-48097]
root 48164 2 0 May02 ? 00:00:00 [vhost-48097]
root 48165 2 0 May02 ? 00:00:00 [vhost-48097]
root 48174 2 0 May02 ? 00:00:00 [kvm-pit/48097]

I assume this is because VM 500 isn't servicing its MQ, and thus the hypervisor/OVS complains that a queue is being serviced slowly or not at all?
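
The trace comes from skb_warn_bad_offload, so it might also be a GSO/offload mismatch rather than a slow queue. One thing that could be worth trying (an untested assumption on my part) is switching the offloads off, either inside the guest or on the tap device named in the warning:

# inside the guest, if the old ethtool there supports it
ethtool -K eth1 tso off gso off
# or on the PVE node, on the tap device from the warning
ethtool -K tap205i1 tso off gso off gro off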
 
I have a home lab that has been really stable for the past couple of years, and it's running Broadcom NICs with the tg3 driver. I always considered myself lucky it never gave me much grief, unlike the cluster I have at work that was constantly an issue until we put in Intel NICs. Nice and stable now.

Recently I have started using iSCSI targets for my VMs, and now my home lab has become very unstable (with Broadcom tg3), usually running for less than 24 hours before hitting some sort of error. So I'm thinking the tg3 driver can't handle iSCSI traffic?

So I'm back to looking around, and I'm guessing I'm facing the same issue again.

ESX faced the same issue, I think:

https://kb.vmware.com/selfservice/m...nguage=en_US&cmd=displayKC&externalId=2035701

Sorry, just thinking out loud; I wanted to add my thoughts, and I'm wondering if you resolved your issue?
 
I believe your ESX link is talking about using the tg3 driver in the hypervisor node, not in a VM.

Anyway, are you talking about using iSCSI from inside a VM, or as the VMs' underlying shared storage on your hypervisor nodes?

We've dropped the GAiA FW and another big-name FW, as neither could do multi-queued NICs with their old Linux kernels from inside a VM. Now we're using the PVE built-in iptables firewall; performance is so much better as it's distributed across all hypervisor nodes.

We're using multi-queued NICs inside some load balancer VMs (as these centralize high network bandwidth through a single VM) running CentOS 6.8 + ELRepo kernel-ml (currently at 4.8.6) and virtio vNICs without issues.
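
For what it's worth, the PVE firewall setup behind that is roughly this (a sketch from memory on PVE 4.x; addresses and ports are placeholders):

# /etc/pve/firewall/cluster.fw -- enables the firewall cluster-wide
[OPTIONS]
enable: 1

# /etc/pve/firewall/<vmid>.fw -- per-VM rules, enforced on whichever node runs the VM
[OPTIONS]
enable: 1

[RULES]
IN SSH(ACCEPT) -source 10.0.40.0/24
IN ACCEPT -p tcp -dport 443

# plus firewall=1 on the vNIC, e.g. net0: virtio=...,bridge=vmbr1,tag=40,firewall=1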
 

I think I read your statement correctly.

On the Proxmox node I'm using Broadcom NICs with the tg3 driver. Each VM uses virtio, but each VM itself resides on an iSCSI/LVM Proxmox share.

Sorry, I did not mean to hijack the thread, but I have known about the issues surrounding the tg3 driver and always considered myself lucky not to have issues with my home lab. But I recently started using iSCSI shares on it, hence the piqued interest again. There aren't many options for me, since it is an old Dell SC1435 with one slot that is currently occupied by a SAS card. I guess I might have to go back to the NFS route, see if that clears things up, and purge the iSCSI.
 
Any benchmarks you can share?
Is this only useful for routers, firewalls etc., or could normal servers benefit from this as well?
I haven't really got any benchmarks, but multi-queued NIC(s) are useful any time, on any [Linux] OS instance, where you want to be able to process more packets/sec from the NIC(s) than a single CPU core can handle; this is most often the case for central network boxes like routers, FWs, load balancers etc.
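
If you want to get a feel for it yourself, a rough test (just a sketch; IPs and stream counts are placeholders) is to push many parallel flows at the VM and watch whether the softirq load spreads across its cores:

# on the VM under test
iperf3 -s
# from another host: 8 parallel TCP streams, so RSS can hash them onto different queues
iperf3 -c 10.0.0.41 -P 8 -t 60
# meanwhile, inside the VM, watch per-core %soft (it should spread with queues > 1)
mpstat -P ALL 2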
 
I seem to remember reading some in-depth tests which concluded that, so as not to interfere with other OS-related work, it was important to exclude one CPU core from multi-queue. Can you confirm this?
 
I assume this is true insofar as you don't want all your CPU cores to be tied up (DoS'ed) by outside-generated packets alone (if your pipe is bigger than what the cores can handle), but would rather leave at least one core free to handle other stuff, like managing an SSH connection/CLI for yourself :)
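
One way to do that (only a sketch; the IRQ names, the mask and the reserved core will differ per setup) is to steer the virtio queue IRQs away from one core inside the guest:

# stop irqbalance first, or it may rewrite the affinities
service irqbalance stop
# pin every virtio-net queue IRQ to CPUs 1-7 (mask 0xfe on a hypothetical 8-vCPU VM), keeping CPU 0 free
for irq in $(awk '/virtio/ {sub(":","",$1); print $1}' /proc/interrupts); do
    echo fe > /proc/irq/$irq/smp_affinity
done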
 
