PVE6 bnx2x vlan-aware failing

czechsys

Hi,

I have an HP DL380p G8 with 2x HP 10G module. When trying the VLAN-aware network setup as in the documentation (vmbr0 without IP, vmbr0.vlan with IP), the module fails:

Jul 31 10:41:19 pve-01 kernel: [ 5.186699] bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
Jul 31 10:41:19 pve-01 kernel: [ 5.194047] bnx2x 0000:03:00.0: msix capability found
Jul 31 10:41:19 pve-01 kernel: [ 5.194493] bnx2x 0000:03:00.0: part number 394D4342-31383735-31543030-47303030
Jul 31 10:41:19 pve-01 kernel: [ 5.316834] bnx2x 0000:03:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
Jul 31 10:41:19 pve-01 kernel: [ 5.319361] bnx2x 0000:03:00.1: msix capability found
Jul 31 10:41:19 pve-01 kernel: [ 5.319770] bnx2x 0000:03:00.1: part number 394D4342-31383735-31543030-47303030
Jul 31 10:41:19 pve-01 kernel: [ 5.424835] bnx2x 0000:03:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
Jul 31 10:41:19 pve-01 kernel: [ 5.427437] bnx2x 0000:03:00.0 eno1: renamed from eth0
Jul 31 10:41:19 pve-01 kernel: [ 5.462787] bnx2x 0000:03:00.1 eno2: renamed from eth1
Jul 31 10:41:19 pve-01 kernel: [ 10.750818] bnx2x 0000:03:00.0 eno1: using MSI-X IRQs: sp 160 fp[0] 162 ... fp[7] 169
Jul 31 10:41:19 pve-01 kernel: [ 10.934332] bnx2x 0000:03:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Jul 31 10:41:20 pve-01 kernel: [ 11.668620] bnx2x: [bnx2x_attn_int_deasserted3:4338(eno1)]MC assert!
Jul 31 10:41:20 pve-01 kernel: [ 11.668696] bnx2x: [bnx2x_mc_assert:720(eno1)]XSTORM_ASSERT_LIST_INDEX 0x2
Jul 31 10:41:20 pve-01 kernel: [ 11.668750] bnx2x: [bnx2x_mc_assert:736(eno1)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x00000100 0x00020017 0x0001005f
Jul 31 10:41:20 pve-01 kernel: [ 11.668836] bnx2x: [bnx2x_mc_assert:750(eno1)]Chip Revision: everest3, FW Version: 7_13_1
Jul 31 10:41:20 pve-01 kernel: [ 11.668894] bnx2x: [bnx2x_attn_int_deasserted3:4344(eno1)]driver assert
Jul 31 10:41:20 pve-01 kernel: [ 11.668943] bnx2x: [bnx2x_panic_dump:923(eno1)]begin crash dump -----------------
Jul 31 10:41:20 pve-01 kernel: [ 11.668997] bnx2x: [bnx2x_panic_dump:933(eno1)]def_idx(0x118) def_att_idx(0x4) attn_state(0x1) spq_prod_idx(0x31) next_stats_cnt(0x2)
Jul 31 10:41:20 pve-01 kernel: [ 11.669080] bnx2x: [bnx2x_panic_dump:938(eno1)]DSB: attn bits(0x0) ack(0x1) id(0x0) idx(0x4)
Jul 31 10:41:20 pve-01 kernel: [ 11.669140] bnx2x: [bnx2x_panic_dump:939(eno1)] def (0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x119 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0) igu_sb_id(0x0) igu_seg_id(0x1) pf_id(0x0) vnic_id(0x0) vf_id(0xff) vf_valid (0x0) state(0x1)
Jul 31 10:41:20 pve-01 kernel: [ 11.669285] bnx2x: [bnx2x_panic_dump:990(eno1)]fp0: rx_bd_prod(0x1c6) rx_bd_cons(0x1) rx_comp_prod(0x1d0) rx_comp_cons(0x4) *rx_cons_sb(0x4)
Jul 31 10:41:20 pve-01 kernel: [ 11.669373] bnx2x: [bnx2x_panic_dump:993(eno1)] rx_sge_prod(0x0) last_max_sge(0x0) fp_hc_idx(0x4)
Jul 31 10:41:20 pve-01 kernel: [ 11.669439] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.669524] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.669609] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.669693] bnx2x: [bnx2x_panic_dump:1021(eno1)] run indexes (0x4 0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.669695] bnx2x: [bnx2x_panic_dump:1027(eno1)] indexes (0x0 0x4 0x0 0x0 0x0 0x0 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
Jul 31 10:41:20 pve-01 kernel: [ 11.669881] bnx2x: [bnx2x_panic_dump:990(eno1)]fp1: rx_bd_prod(0x1c5) rx_bd_cons(0x0) rx_comp_prod(0x1cf) rx_comp_cons(0x3) *rx_cons_sb(0x3)
Jul 31 10:41:20 pve-01 kernel: [ 11.669968] bnx2x: [bnx2x_panic_dump:993(eno1)] rx_sge_prod(0x0) last_max_sge(0x0) fp_hc_idx(0x3)
Jul 31 10:41:20 pve-01 kernel: [ 11.670033] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp1: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.670118] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp1: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.670202] bnx2x: [bnx2x_panic_dump:1010(eno1)]fp1: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.670252] bnx2x: [bnx2x_set_vlan_one:8498(eno1)]Set VLAN failed
Jul 31 10:41:20 pve-01 kernel: [ 11.670312] bnx2x: [bnx2x_panic_dump:1021(eno1)] run indexes (0x3 0x0)
Jul 31 10:41:20 pve-01 kernel: [ 11.670315] bnx2x: [bnx2x_panic_dump:1027(eno1)] indexes (
Jul 31 10:41:20 pve-01 kernel: [ 11.670340] bnx2x: [bnx2x_vlan_configure_vid_list:13042(eno1)]Unable to config VLAN 272
Jul 31 10:41:20 pve-01 kernel: [ 11.737808] bnx2x 0000:03:00.0 eno1: bc 7.13.75
Jul 31 10:41:20 pve-01 kernel: [ 11.746868] bnx2x: [bnx2x_mc_assert:720(eno1)]XSTORM_ASSERT_LIST_INDEX 0x2
Jul 31 10:41:20 pve-01 kernel: [ 11.749783] bnx2x: [bnx2x_mc_assert:736(eno1)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x00000100 0x00020017 0x0001005f
Jul 31 10:41:20 pve-01 kernel: [ 11.752714] bnx2x: [bnx2x_mc_assert:750(eno1)]Chip Revision: everest3, FW Version: 7_13_1
Jul 31 10:41:20 pve-01 kernel: [ 11.755582] bnx2x: [bnx2x_panic_dump:1186(eno1)]end crash dump -----------------
Jul 31 10:41:26 pve-01 kernel: [ 17.522326] NETDEV WATCHDOG: eno1 (bnx2x): transmit queue 5 timed out
Jul 31 10:41:26 pve-01 kernel: [ 17.522366] Modules linked in: ebtable_filter ebtables ip_set ip6table_filter ip6_tables iptable_filter bpfilter 8021q garp mrp softdog nfnetlink_log nfnetlink intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel zfs(PO) aes_x86_64 crypto_simd cryptd glue_helper intel_cstate joydev input_leds zunicode(PO) snd_pcm snd_timer intel_rapl_perf zlua(PO) mgag200 snd soundcore ttm serio_raw pcspkr drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt hpilo ioatdma ipmi_si ipmi_devintf mac_hid ipmi_msghandler acpi_power_meter zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic psmouse usbmouse usbkbd usbhid hid lpc_ich pata_acpi ixgbe bnx2x hpsa xfrm_algo dca scsi_transport_sas libcrc32c
Jul 31 10:41:50 pve-01 kernel: [ 41.522259] bnx2x: [bnx2x_clean_tx_queue:1207(eno1)]timeout waiting for queue[7]: txdata->tx_pkt_prod(1) != txdata->tx_pkt_cons(0)
Jul 31 10:41:50 pve-01 kernel: [ 41.526272] bnx2x: [bnx2x_del_all_macs:8541(eno1)]Failed to delete MACs: -5
Jul 31 10:41:50 pve-01 kernel: [ 41.526331] bnx2x: [bnx2x_chip_cleanup:9361(eno1)]Failed to schedule DEL commands for UC MACs list: -5
Jul 31 10:41:50 pve-01 kernel: [ 41.529289] bnx2x: [bnx2x_chip_cleanup:9371(eno1)]Failed to delete all VLANs
Jul 31 10:41:50 pve-01 kernel: [ 41.550277] bnx2x: [bnx2x_func_stop:9120(eno1)]FUNC_STOP ramrod failed. Running a dry transaction
Jul 31 10:41:51 pve-01 kernel: [ 42.290739] bnx2x 0000:03:00.0 eno1: using MSI-X IRQs: sp 160 fp[0] 162 ... fp[7] 169
Jul 31 10:41:51 pve-01 kernel: [ 42.403841] bnx2x: [bnx2x_nic_load:2760(eno1)]Function start failed!

root@pve-01:/tmp# ethtool -i eno1
driver: bnx2x
version: 1.712.30-0 storm 7.13.1.0
firmware-version: mbi 7.14.79 bc 7.13.75
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

root@pve-01:/tmp# pveversion
pve-manager/6.0-5/f8a710d7 (running kernel: 5.0.18-1-pve)

Any hint?
 
Hmm, a quick search shows a few bug reports for these NICs and drivers, all related to certain offloading features:
* https://lkml.org/lkml/2019/2/18/763
* https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c05376530
* http://www.vbootstrap.com/esxi-5-5-hosts-randomly-lose-network-connectivity/
* https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1715519

Maybe try to disable most offloading features (with ethtool) on the NICs.
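
A rough sketch of what that could look like, assuming the affected interface is eno1 as in the log above. The feature list is only a guess at which offloads might matter here, not a confirmed fix, and the helper names (offload_cmd, disable_offloads) are made up for this example. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: switch off common offload features on one NIC via `ethtool -K`.
# IFACE defaults to eno1 (taken from the log above); adjust as needed.
IFACE="${IFACE:-eno1}"
# DRY_RUN=1 (default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Feature keywords accepted by `ethtool -K`. Which of them are relevant
# to this problem is an assumption, not something the thread confirms.
FEATURES="tso gso gro lro rxvlan txvlan"

# Hypothetical helper: build the command line for a single feature.
offload_cmd() {
    printf 'ethtool -K %s %s off\n' "$1" "$2"
}

# Hypothetical helper: print or run the command for every feature.
disable_offloads() {
    for feature in $FEATURES; do
        cmd=$(offload_cmd "$1" "$feature")
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into arguments.
            # Some features may be fixed by the driver and refuse to change.
            $cmd || echo "could not change $feature on $1" >&2
        fi
    done
}

disable_offloads "$IFACE"
```

Settings changed with `ethtool -K` do not survive a reboot, and `ethtool -k eno1` shows the current state of each feature, so it is easy to test one feature at a time and revert.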

Aside from that, make sure that you have the latest firmware patches installed for your system (this also helps in quite a few situations).

Hope this helps!
 
Firmware patches are a problem: firmware-linux, firmware-bnx2x, etc. all conflict with pve-firmware.
 
I meant the BIOS updates etc. from your server's vendor.
For most devices, 'pve-firmware' ships the appropriate files (mostly in newer versions), and there is no need to install firmware-bnx2x or others.
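
One way to check this on the host is to list what the installed pve-firmware package ships for the driver. This is only a sketch: the function name bnx2x_firmware_files is made up, and the path pattern assumes the firmware files sit in a bnx2x directory and end in .fw.

```shell
#!/bin/sh
# Hypothetical helper: keep only bnx2x firmware files from a list of
# paths read on stdin (the path pattern is an assumption).
bnx2x_firmware_files() {
    grep '/bnx2x/.*\.fw$'
}

# `dpkg -L <package>` lists the files an installed package owns.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -L pve-firmware 2>/dev/null | bnx2x_firmware_files \
        || echo "no bnx2x firmware files found in pve-firmware" >&2
fi
```

The storm firmware version the driver loaded is reported by `ethtool -i eno1`, as in the output earlier in the thread, so the two can be compared.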
 
