skb_warn_bad_offload on Mellanox card

past

Hello!
I'm getting a kernel oops with:
02:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

```
[Ср сен 20 11:25:57 2017] WARNING: CPU: 0 PID: 14106 at net/core/dev.c:2445 skb_warn_bad_offload+0xd3/0x120()
[Ср сен 20 11:25:57 2017] mlx4_core: caps=(0x00000b0600114bb3, 0x0000000000000000) len=1490 data_len=1448 gso_size=1280 gso_type=2 ip_summed=0
[Ср сен 20 11:25:57 2017] Modules linked in: xfs rpcsec_gss_krb5 nfsv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_comment xt_set xt_addrtype xt_conntrack xt_mark ip_set_hash_net ipt_REJECT nf_reject_ipv4 xt_multiport bcache ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c nfnetlink_log nfnetlink ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_pcm aes_x86_64 lrw snd_timer gf128mul ast glue_helper snd ablk_helper ttm cryptd soundcore drm_kms_helper pcspkr sb_edac edac_core drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
[Ср сен 20 11:25:57 2017] mei_me input_leds joydev sysimgblt lpc_ich mei ioatdma shpchp wmi ipmi_si ipmi_msghandler 8250_fintek acpi_power_meter mac_hid acpi_pad vhost_net vhost macvtap macvlan autofs4 raid10 mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbmouse usbkbd usbhid hid igb(O) i2c_i801 ahci dca libahci ptp pps_core mlx4_core fjes
[Ср сен 20 11:25:57 2017] CPU: 0 PID: 14106 Comm: corosync Tainted: G W O 4.4.79-1-pve #1
[Ср сен 20 11:25:57 2017] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0b 05/02/2017
[Ср сен 20 11:25:57 2017] 0000000000000286 8b29cbfbd6c7602b ffff881fecd63358 ffffffff813fc033
[Ср сен 20 11:25:57 2017] ffff881fecd633a0 ffffffff81d71a66 ffff881fecd63390 ffffffff810818f6
[Ср сен 20 11:25:57 2017] ffff881fedd0d000 ffff881fde7e0000 0000000000000002 0000000000000081
[Ср сен 20 11:25:57 2017] Call Trace:
[Ср сен 20 11:25:57 2017] [<ffffffff813fc033>] dump_stack+0x63/0x90
[Ср сен 20 11:25:57 2017] [<ffffffff810818f6>] warn_slowpath_common+0x86/0xc0
[Ср сен 20 11:25:57 2017] [<ffffffff8108198c>] warn_slowpath_fmt+0x5c/0x80
[Ср сен 20 11:25:57 2017] [<ffffffff81402086>] ? ___ratelimit+0x86/0xe0
[Ср сен 20 11:25:57 2017] [<ffffffff817420e3>] skb_warn_bad_offload+0xd3/0x120
[Ср сен 20 11:25:57 2017] [<ffffffff8174662d>] __skb_gso_segment+0xfd/0x110
[Ср сен 20 11:25:57 2017] [<ffffffff8174696f>] validate_xmit_skb.isra.99.part.100+0x12f/0x2b0
[Ср сен 20 11:25:57 2017] [<ffffffff81746b2b>] validate_xmit_skb_list+0x3b/0x60
[Ср сен 20 11:25:57 2017] [<ffffffff8176d0d8>] sch_direct_xmit+0x138/0x220
[Ср сен 20 11:25:57 2017] [<ffffffff81747283>] __dev_queue_xmit+0x253/0x590
[Ср сен 20 11:25:57 2017] [<ffffffff817475d0>] dev_queue_xmit+0x10/0x20
[Ср сен 20 11:25:57 2017] [<ffffffffc03abc1a>] ovs_vport_send+0x4a/0xc0 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc039e743>] do_output.isra.28+0x43/0x170 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc039f144>] ? do_execute_actions+0x734/0x1320 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc039f144>] do_execute_actions+0x734/0x1320 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc039fd63>] ovs_execute_actions+0x33/0xd0 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc03a33b4>] ovs_dp_process_packet+0x84/0x130 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc03a3b12>] ? key_extract+0x262/0xc30 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffffc03ab50c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffff811ed6ac>] ? ___slab_alloc+0x1cc/0x430
[Ср сен 20 11:25:57 2017] [<ffffffff81730499>] ? __alloc_skb+0x89/0x1f0
[Ср сен 20 11:25:57 2017] [<ffffffff8172f0b1>] ? __kmalloc_reserve.isra.32+0x31/0x90
[Ср сен 20 11:25:57 2017] [<ffffffff8173046b>] ? __alloc_skb+0x5b/0x1f0
[Ср сен 20 11:25:57 2017] [<ffffffffc03abdf8>] internal_dev_xmit+0x28/0x60 [openvswitch]
[Ср сен 20 11:25:57 2017] [<ffffffff81746da8>] dev_hard_start_xmit+0x258/0x400
[Ср сен 20 11:25:57 2017] [<ffffffff8174685a>] ? validate_xmit_skb.isra.99.part.100+0x1a/0x2b0
[Ср сен 20 11:25:57 2017] [<ffffffff8174754d>] __dev_queue_xmit+0x51d/0x590
[Ср сен 20 11:25:57 2017] [<ffffffff817475d0>] dev_queue_xmit+0x10/0x20
[Ср сен 20 11:25:57 2017] [<ffffffff817874a3>] ip_finish_output2+0x283/0x350
[Ср сен 20 11:25:57 2017] [<ffffffff81788119>] ip_finish_output+0x139/0x1f0
[Ср сен 20 11:25:57 2017] [<ffffffff8177c373>] ? nf_hook_slow+0x73/0xd0
[Ср сен 20 11:25:57 2017] [<ffffffff8178979e>] ip_output+0x6e/0xe0
[Ср сен 20 11:25:57 2017] [<ffffffff81788f02>] ? __ip_local_out+0xf2/0x110
[Ср сен 20 11:25:57 2017] [<ffffffff81787fe0>] ? ip_fragment.constprop.52+0x80/0x80
[Ср сен 20 11:25:57 2017] [<ffffffff81788f55>] ip_local_out+0x35/0x40
[Ср сен 20 11:25:57 2017] [<ffffffff8178a179>] ip_send_skb+0x19/0x40
[Ср сен 20 11:25:57 2017] [<ffffffff817b15a2>] udp_send_skb+0xb2/0x2a0
[Ср сен 20 11:25:57 2017] [<ffffffff817b24c6>] udp_sendmsg+0x306/0xae0
[Ср сен 20 11:25:57 2017] [<ffffffff81786fb0>] ? ip_reply_glue_bits+0x60/0x60
[Ср сен 20 11:25:57 2017] [<ffffffff813a25a1>] ? aa_sock_msg_perm+0x61/0x150
[Ср сен 20 11:25:57 2017] [<ffffffff817c01c5>] inet_sendmsg+0x65/0xa0
[Ср сен 20 11:25:57 2017] [<ffffffff81727bd8>] sock_sendmsg+0x38/0x50
[Ср сен 20 11:25:57 2017] [<ffffffff817287cf>] ___sys_sendmsg+0x27f/0x290
[Ср сен 20 11:25:57 2017] [<ffffffff81258df0>] ? ep_send_events_proc+0xb0/0x1c0
[Ср сен 20 11:25:57 2017] [<ffffffff8125971f>] ? ep_poll+0x20f/0x3f0
[Ср сен 20 11:25:57 2017] [<ffffffff8122da25>] ? __fget_light+0x25/0x60
[Ср сен 20 11:25:57 2017] [<ffffffff81728f71>] __sys_sendmsg+0x51/0x90
[Ср сен 20 11:25:57 2017] [<ffffffff81728fc2>] SyS_sendmsg+0x12/0x20
[Ср сен 20 11:25:57 2017] [<ffffffff81865eb6>] entry_SYSCALL_64_fastpath+0x16/0x75
[Ср сен 20 11:25:57 2017] ---[ end trace 6aff8f366e3b94bc ]---
```
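For anyone decoding the warning: the `mlx4_core:` line in the trace carries the packet fields the kernel prints along with it. The sketch below pulls individual fields out of that line. The helper names `offload_field` and `describe_gso_type` are made up for illustration, and the meaning of the numeric values (gso_type 2 as `SKB_GSO_UDP`, ip_summed 0 as `CHECKSUM_NONE`) is my reading of the 4.4-era `include/linux/skbuff.h`, so treat it as an assumption and check it against your kernel source.

```shell
# Extract one key=value field from a skb_warn_bad_offload log line.
# Usage: offload_field <name> <line>   (hypothetical helper)
offload_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Map a gso_type value to a name. ASSUMPTION: bit values as in the
# 4.4-era enum in include/linux/skbuff.h; only two are listed here.
describe_gso_type() {
    case "$1" in
        1) echo "SKB_GSO_TCPV4" ;;
        2) echo "SKB_GSO_UDP" ;;
        *) echo "other (see include/linux/skbuff.h)" ;;
    esac
}

# Sample line copied from the trace above (timestamp dropped).
line='mlx4_core: caps=(0x00000b0600114bb3, 0x0000000000000000) len=1490 data_len=1448 gso_size=1280 gso_type=2 ip_summed=0'

gso=$(offload_field gso_type "$line")
echo "gso_type=$gso ($(describe_gso_type "$gso")), ip_summed=$(offload_field ip_summed "$line")"
```

If that reading is right, the packet is a UDP datagram marked for fragmentation offload with no checksum offload requested, which would fit corosync (UDP) being the sending process in the trace.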
This happens on two of my new servers:

```
root@pve-app02e:~# pveversion --verbose
proxmox-ve: 4.4-95 (running kernel: 4.4.79-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.79-1-pve: 4.4.79-95
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-112
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
openvswitch-switch: 2.6.0-2
ceph: 9.2.1-1~bpo80+1
```


Any advice?
 


Try switching off any of these offloads: TSO, GRO, GSO, and LRO.
Example:
Code:
 ethtool -K <ethx> lro off
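To cover all four offloads in one go, something like the following may help. It is only a sketch: `offload_off_cmds` is a made-up helper that prints the `ethtool -K` commands rather than running them, and `eth2` in the comment is just an example interface name.

```shell
# Print the ethtool commands that would switch off each offload
# named above on the given interface (hypothetical helper).
offload_off_cmds() {
    iface=$1
    for feature in tso gso gro lro; do
        printf 'ethtool -K %s %s off\n' "$iface" "$feature"
    done
}

# Review the output first, then apply it as root, for example:
#   offload_off_cmds eth2 | sh
```

Printing instead of executing keeps the change reviewable. Settings made with `ethtool -K` do not survive a reboot unless they are also added to the network configuration.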
 
Thanks!

root@pve-app02e:~# ethtool -k eth2 | grep offload
tcp-segmentation-offload: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off
tx-vlan-offload: off
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]

It looks like all offloads are off, but I'm still getting the oops every ~10 seconds.
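One caveat with `grep offload`: it hides features whose names do not contain that word, such as `tx-checksumming` or `scatter-gather`. A rough way to list everything that is still enabled is sketched below. `features_on` is a hypothetical helper that reads `ethtool -k` output on stdin.

```shell
# List every feature that `ethtool -k` reports as "on", including
# indented sub-features and entries marked "on [fixed]".
features_on() {
    awk -F': ' '$2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Example:
#   ethtool -k eth2 | features_on
```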
 
It might be an issue with flow control on the card and on the switch; please check those settings.
I have no problems on PVE 5 and Stretch with a ConnectX-3 Pro running at 56 Gbit/s on a Mellanox switch; see my signature.
I updated the ConnectX-3 Pro firmware to the latest version and everything runs smoothly :)
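On the host side, the flow-control (pause frame) settings can be read with `ethtool -a <iface>`. A small sketch for condensing that output follows. `pause_summary` is a made-up helper name, and the switch side still has to be checked in the switch's own management interface.

```shell
# Condense `ethtool -a` output (read from stdin) into key=value lines
# for the three pause parameters.
pause_summary() {
    awk -F':[ \t]*' '$1 ~ /^(Autonegotiate|RX|TX)$/ { print $1 "=" $2 }'
}

# Example:
#   ethtool -a eth2 | pause_summary
```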
 
My firmware version:
Code:
 mstflint -d 81:00.0 q
Image type:          FS2
FW Version:          2.40.7000
FW Release Date:     22.3.2017
Product Version:     02.40.70.00
Rom Info:            type=PXE version=3.4.746 devid=4103
Device ID:           4103
Description:         Node             Port1            Port2            Sys image
GUIDs:               ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
MACs:                                     248a07e26070     248a07e26071
VSD:
PSID:                MT_1090111023
 
Consider upgrading to PVE 5! Also, BlueStore in Ceph is exciting!
By the way, I run port 1 on the public network, connected to an SX1012 switch, and Ceph on port 2 on the same switch. These ports are tagged as VLANs on the switch, so there is no need for a VLAN on port 2 for Ceph. I found that multicast can be a problem if the public and Ceph networks are not isolated on the switch.
The cluster manager will interfere with Ceph broadcasts/multicasts if the networks are not isolated, even if you define a distinct IPv4 address space for each.
 
This should be fixed in the 4.4.83-1-pve kernel, currently available on pvetest. Note that the messages should be harmless (they are not OOPS/BUGS, but WARNINGS ;)), but they are annoying since there are so many of them.
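To check whether a node already runs a kernel at or above that version, a version-aware comparison along these lines may be handy. It is a sketch only: `kernel_at_least` is a made-up helper, and it relies on `sort -V` from GNU coreutils, which is an assumption about the host.

```shell
# Succeeds when the checked kernel version is >= the wanted one.
# $1 = minimum version, $2 = version to check (default: running kernel).
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    # With a version sort, the wanted version comes first (or ties)
    # exactly when have >= want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example:
#   kernel_at_least 4.4.83-1-pve && echo "fixed kernel is running"
```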
 
