Kernel warning dumps during a multi-hundred-Mbps download on a VM

ispirto

Renowned Member
Oct 20, 2012
Hello,

We are seeing the following warning a few times a minute while a VM is downloading at several hundred Mbit/s. I'm not sure whether this is related to the bridge or to the NIC.

Any insight?

Code:
[7960695.303208] WARNING: CPU: 1 PID: 9426 at net/core/dev.c:2422 skb_warn_bad_offload+0xd3/0x120()
[7960695.303211] ixgbe: caps=(0x00000802202043a1, 0x0000000000000000) len=1598 data_len=1544 gso_size=1460 gso_type=5 ip_summed=0
[7960695.303212] Modules linked in: ipmi_devintf binfmt_misc xt_tcpudp act_police cls_basic sch_ingress sch_htb ip_set ip6table_filter ip6_tables iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_pcm ablk_helper snd_timer cryptd snd joydev input_leds soundcore shpchp pcspkr sb_edac ioatdma edac_core mei_me ipmi_si mei 8250_fintek
[7960695.303271]  i2c_i801 lpc_ich ipmi_msghandler wmi mac_hid vhost_net vhost macvtap macvlan autofs4 btrfs xor raid6_pq ses enclosure hid_generic ixgbe(O) vxlan ip6_udp_tunnel udp_tunnel usbmouse usbkbd igb(O) usbhid ahci isci dca hid libahci ptp libsas pps_core scsi_transport_sas megaraid_sas fjes
[7960695.303300] CPU: 1 PID: 9426 Comm: vhost-9400 Tainted: P        W  O    4.4.19-1-pve #1
[7960695.303302] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[7960695.303304]  0000000000000286 00000000ba9c4c24 ffff88181fa43948 ffffffff813f3de3
[7960695.303306]  ffff88181fa43990 ffffffff81d6782e ffff88181fa43980 ffffffff81081796
[7960695.303309]  ffff8812e3046100 ffff88300c960000 0000000000000005 ffff88300c960000
[7960695.303311] Call Trace:
[7960695.303313]  <IRQ>  [<ffffffff813f3de3>] dump_stack+0x63/0x90
[7960695.303323]  [<ffffffff81081796>] warn_slowpath_common+0x86/0xc0
[7960695.303325]  [<ffffffff8108182c>] warn_slowpath_fmt+0x5c/0x80
[7960695.303328]  [<ffffffff813f9db6>] ? ___ratelimit+0x86/0xe0
[7960695.303331]  [<ffffffff81730da3>] skb_warn_bad_offload+0xd3/0x120
[7960695.303334]  [<ffffffff8173526e>] __skb_gso_segment+0x7e/0xd0
[7960695.303336]  [<ffffffff8173560f>] validate_xmit_skb.isra.99.part.100+0x12f/0x2b0
[7960695.303338]  [<ffffffff81735bcb>] validate_xmit_skb_list+0x3b/0x60
[7960695.303342]  [<ffffffff8175b448>] sch_direct_xmit+0x138/0x220
[7960695.303344]  [<ffffffff81735f23>] __dev_queue_xmit+0x253/0x590
[7960695.303346]  [<ffffffff81736270>] dev_queue_xmit+0x10/0x20
[7960695.303350]  [<ffffffff818269d8>] br_dev_queue_push_xmit+0x88/0x150
[7960695.303353]  [<ffffffff81826ae1>] br_forward_finish+0x41/0xb0
[7960695.303355]  [<ffffffff81826950>] ? deliver_clone+0x50/0x50
[7960695.303358]  [<ffffffff81826d46>] __br_forward+0xa6/0x140
[7960695.303361]  [<ffffffff810abb07>] ? ttwu_do_wakeup+0x87/0xe0
[7960695.303363]  [<ffffffff81826aa0>] ? br_dev_queue_push_xmit+0x150/0x150
[7960695.303366]  [<ffffffff81827167>] br_forward+0x87/0x90
[7960695.303369]  [<ffffffff81828208>] br_handle_frame_finish+0x338/0x610
[7960695.303372]  [<ffffffff81222563>] ? pollwake+0x73/0x90
[7960695.303375]  [<ffffffff8176b72d>] ? nf_iterate+0x5d/0x70
[7960695.303378]  [<ffffffff8182865f>] br_handle_frame+0x17f/0x2c0
[7960695.303380]  [<ffffffff81827ed0>] ? br_handle_local_finish+0xa0/0xa0
[7960695.303383]  [<ffffffff81733430>] __netif_receive_skb_core+0x370/0xa60
[7960695.303384]  [<ffffffff810aba99>] ? ttwu_do_wakeup+0x19/0xe0
[7960695.303387]  [<ffffffff810abbfd>] ? ttwu_do_activate.constprop.89+0x5d/0x70
[7960695.303388]  [<ffffffff81733b36>] __netif_receive_skb+0x16/0x70
[7960695.303390]  [<ffffffff81734928>] process_backlog+0xa8/0x150
[7960695.303392]  [<ffffffff81734085>] net_rx_action+0x215/0x350
[7960695.303395]  [<ffffffff8108629e>] __do_softirq+0x10e/0x2a0
[7960695.303399]  [<ffffffff818561cc>] do_softirq_own_stack+0x1c/0x30
[7960695.303400]  <EOI>  [<ffffffff81085ae8>] do_softirq.part.20+0x38/0x40
[7960695.303404]  [<ffffffff8108649d>] do_softirq+0x1d/0x20
[7960695.303406]  [<ffffffff81732e13>] netif_rx_ni+0x33/0x80
[7960695.303409]  [<ffffffff816044f1>] tun_get_user+0x521/0x930
[7960695.303412]  [<ffffffff81604951>] tun_sendmsg+0x51/0x70
[7960695.303416]  [<ffffffffc00dbe40>] handle_tx+0x2f0/0x500 [vhost_net]
[7960695.303418]  [<ffffffffc00dc085>] handle_tx_kick+0x15/0x20 [vhost_net]
[7960695.303422]  [<ffffffffc010870e>] vhost_worker+0x10e/0x1b0 [vhost]
[7960695.303425]  [<ffffffffc0108600>] ? vhost_dev_reset_owner+0x50/0x50 [vhost]
[7960695.303428]  [<ffffffff810a0f4a>] kthread+0xea/0x100
[7960695.303430]  [<ffffffff810a0e60>] ? kthread_park+0x60/0x60
[7960695.303432]  [<ffffffff8185484f>] ret_from_fork+0x3f/0x70
[7960695.303434]  [<ffffffff810a0e60>] ? kthread_park+0x60/0x60
[7960695.303436] ---[ end trace 49ab5ded8d73dfe9 ]---
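The second line of the warning packs its details into numeric fields. A minimal bash sketch that decodes two of them, assuming the `SKB_GSO_*` bit layout and `CHECKSUM_*` values from `include/linux/skbuff.h` in the 4.4 kernel series (other kernel versions may number them differently); `decode_gso_type` and `decode_ip_summed` are hypothetical helper names, not part of any tool:

```shell
#!/usr/bin/env bash
# Decode the gso_type bitmask and the ip_summed value printed by
# skb_warn_bad_offload(). Names and values follow include/linux/skbuff.h
# of the 4.4 kernel series (assumption: other kernels may differ).

decode_gso_type() {
  local t=$1
  # Index i in this array corresponds to bit (1 << i) of gso_type.
  local names=(TCPV4 UDP DODGY TCP_ECN TCPV6 FCOE GRE GRE_CSUM IPIP SIT
               UDP_TUNNEL UDP_TUNNEL_CSUM TUNNEL_REMCSUM)
  local out=() i
  for i in "${!names[@]}"; do
    if (( t & (1 << i) )); then
      out+=("SKB_GSO_${names[i]}")
    fi
  done
  echo "${out[*]}"
}

decode_ip_summed() {
  case $1 in
    0) echo CHECKSUM_NONE ;;
    1) echo CHECKSUM_UNNECESSARY ;;
    2) echo CHECKSUM_COMPLETE ;;
    3) echo CHECKSUM_PARTIAL ;;
    *) echo "unknown ($1)" ;;
  esac
}

# The second line of the warning from the trace above.
line='ixgbe: caps=(0x00000802202043a1, 0x0000000000000000) len=1598 data_len=1544 gso_size=1460 gso_type=5 ip_summed=0'
gso_type=$(sed -n 's/.*gso_type=\([0-9]*\).*/\1/p' <<<"$line")
ip_summed=$(sed -n 's/.*ip_summed=\([0-9]*\).*/\1/p' <<<"$line")
echo "gso_type=$gso_type -> $(decode_gso_type "$gso_type")"    # SKB_GSO_TCPV4 SKB_GSO_DODGY
echo "ip_summed=$ip_summed -> $(decode_ip_summed "$ip_summed")" # CHECKSUM_NONE
```

Under those assumptions the packet is a TCP/IPv4 GSO packet carrying the `DODGY` flag (which the kernel sets on GSO packets from untrusted sources such as a tap device; note `tun_get_user` in the trace) with `ip_summed` set to `CHECKSUM_NONE`.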

Code:
pveversion -v
proxmox-ve: 4.4-79 (running kernel: 4.4.19-1-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-108
pve-firmware: 1.1-10
libpve-common-perl: 4.0-91
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-73
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-93
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-1
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve14~bpo80
 
Hi,
"...bad_offload"?? Perhaps you should try disabling TCP offloading for this NIC?

And Supermicro... look for a new BIOS (they don't document which fixes a BIOS release includes; crapware).

Udo

I think I did, but it didn't help. Here is what I ran:

Code:
ethtool -K eth5 rx off
ethtool -K eth5 tx off
ethtool -K eth5 sg off
ethtool -K eth5 tso off
ethtool -K eth5 ufo off
ethtool -K eth5 gso off
ethtool -K eth5 gro off
ethtool -K eth5 lro off
ethtool -K eth5 rxvlan off
ethtool -K eth5 txvlan off
ethtool -K eth5 rxhash off

Code:
Features for eth5:
rx-checksumming: off
tx-checksumming: off
    tx-checksum-ipv4: off
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: off
    tx-checksum-fcoe-crc: on [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: off
    tx-scatter-gather: off
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]
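To confirm that no toggleable offload is left enabled, the `ethtool -k` listing can be filtered. A small sketch, assuming the `name: state` line format shown above; `still_on` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print every feature still reported as "on" in `ethtool -k` output,
# tagged "fixed" (driver does not allow changing it) or "changeable".
# Usage: ethtool -k eth5 | still_on
still_on() {
  awk -F': ' '
    /^Features for/ { next }              # skip the header line
    {
      name = $1
      sub(/^[ \t]+/, "", name)            # strip sub-feature indentation
      if ($2 ~ /^on \[fixed\]/)  print "fixed", name
      else if ($2 ~ /^on/)       print "changeable", name
    }'
}

# Example with an excerpt of the listing above:
still_on <<'EOF'
Features for eth5:
tx-checksumming: off
    tx-checksum-fcoe-crc: on [fixed]
highdma: on [fixed]
EOF
# prints:
# fixed tx-checksum-fcoe-crc
# fixed highdma
```

On the full listing above, every feature still reported as `on` is marked `[fixed]` (tx-checksum-fcoe-crc, highdma, rx-vlan-filter, tx-fcoe-segmentation, busy-poll), so none of them can be switched off with `ethtool -K`.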

Also, the NIC is not onboard; it's an X520-DA2 add-on card. Would a BIOS update still help?

Oktay