[SOLVED] Brocade 1020 on PX4.1 host + VirtIO net on guest == TX Checksum Badness

joshin

Howdy all,

I recently upgraded my Proxmox 3.4 install to 4.1 and since the upgrade, I can't use the VirtIO network driver on Linux clients. (We don't run Windows, so I can't test.)

The setup is fairly basic: a Quanta LB4M switch with 2x 10G SFP+ ports and two servers, each with a Brocade 1020 CNA 10G NIC. We're only using the cards for networking, not FC or FCoE. This worked perfectly on 3.4.

If the VirtIO network interface is selected on the guest, the OS can get a DHCP address, but it can't ping or be pinged, and nothing TCP works. Changing to e1000 lets the guest reach the network, but at the cost of extremely high "Hardware Interrupts", which hurts performance badly on both host and guest.

The 1020s have been updated to the latest firmware I could find: 3.2.3.2.

dmesg has some interesting content, which I'll paste below. (Edited to add: Fixed this. There's a bug in the kernel and a missing firmware file.)

I'll also post my interfaces file below. One of the 1020 interfaces and one of the onboard interfaces are in an active-passive bond.

Any help or tips would be greatly appreciated.

Dmesg: (Edited to add - fixed this - bug in the kernel and a missing firmware file - TCP not working is still happening)

root@tsproxmox10:/var/lib/vz# dmesg |grep b.a
[ 0.000000] Linux version 4.2.8-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Wed Feb 3 16:33:06 CET 2016 ()
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20150619/tbfadt-623)
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000041] Calibrating delay loop (skipped), value calculated using timer frequency.. 4267.18 BogoMIPS (lpj=8534360)
[ 1.241198] bna: QLogic BR-series 10G Ethernet driver - version: 3.2.25.1
[ 1.241766] bna 0000:04:00.2: bar0 mapped to ffffc90006e80000, len 262144
[ 1.700979] bna 0000:04:00.3: bar0 mapped to ffffc90006c00000, len 262144
[ 1.910913] bfa 0000:04:00.0: Running firmware version is incompatible with the driver version
[ 1.952773] usb 4-1: Product: Virtual Keyboard and Mouse
[ 2.003673] tsc: Refined TSC clocksource calibration: 2133.409 MHz
[ 2.040846] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1a.1/usb4/4-1/4-1:1.0/0003:046B:FF10.0001/input/input4
[ 2.096168] hid-generic 0003:046B:FF10.0001: input,hidraw0: USB HID v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1a.1-1/input0
[ 2.096342] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1a.1/usb4/4-1/4-1:1.1/0003:046B:FF10.0002/input/input5
[ 2.096981] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1a.1-1/input1
[ 2.107722] bfa 0000:04:00.0: bfa init failed
[ 2.110629] bfa 0000:04:00.1: Running firmware version is incompatible with the driver version
[ 2.307716] bfa 0000:04:00.1: bfa init failed
[ 10.788984] bna 0000:04:00.3 eth3: link up
[ 240.464261] INFO: task bfad_worker:340 blocked for more than 120 seconds.
[ 240.464409] bfad_worker D ffff880c3f696a00 0 340 2 0x00000000
[ 240.464439] [<ffffffffc00fd390>] ? bfad_read_firmware+0xe0/0xe0 [bfa]
[ 240.464465] INFO: task bfad_worker:344 blocked for more than 120 seconds.
[ 240.464599] bfad_worker D ffff880c3f716a00 0 344 2 0x00000000
[ 240.464615] [<ffffffffc00fd390>] ? bfad_read_firmware+0xe0/0xe0 [bfa]
[ 360.468830] INFO: task bfad_worker:340 blocked for more than 120 seconds.
[ 360.468989] bfad_worker D ffff880c3f696a00 0 340 2 0x00000000
[ 360.469021] [<ffffffffc00fd390>] ? bfad_read_firmware+0xe0/0xe0 [bfa]
[ 360.469046] INFO: task bfad_worker:344 blocked for more than 120 seconds.
[ 360.469235] bfad_worker D ffff880c3f716a00 0 344 2 0x00000000
[ 360.469251] [<ffffffffc00fd390>] ? bfad_read_firmware+0xe0/0xe0 [bfa]

I cut more of the blocking errors - there were 10 in all

Interfaces file:

auto lo
iface lo inet loopback

iface eth0 inet manual
#onboard Intel NIC

iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual
#1020 Interface


auto bond0
iface bond0 inet manual
slaves eth3 eth0
bond_miimon 100
bond_mode active-backup
#slaves eth0 eth3

auto bond0.1
iface bond0.1 inet manual
vlan-raw-device bond0

auto bond0.2
iface bond0.2 inet manual
vlan-raw-device bond0

auto bond0.3
iface bond0.3 inet manual
vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet static
address 10.1.2.67
netmask 255.255.255.0
gateway 10.1.2.9
bridge_ports bond0
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes
vlan_raw_device bond0

auto vmbr1
iface vmbr1 inet manual
bridge_ports bond0.1
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes
vlan_raw_device bond0.1

auto vmbr2
iface vmbr2 inet manual
bridge_ports bond0.2
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes
vlan_raw_device bond0.2

auto vmbr3
iface vmbr3 inet manual
bridge_ports bond0.3
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes
vlan_raw_device bond0.3
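
Since bond0 is an active-backup bond, it may be worth confirming which member is actually carrying traffic before blaming the 1020. A minimal sketch, assuming the usual status file that the Linux bonding driver exposes under /proc/net/bonding; the helper name active_slave is made up for illustration:

```shell
# Hypothetical helper: print the currently active member of a bond,
# given the text of /proc/net/bonding/<bond> on stdin.
active_slave() {
  awk -F': *' '/^Currently Active Slave:/ {print $2; exit}'
}

# Usage on the host (bond0 as in the interfaces file above):
#   active_slave < /proc/net/bonding/bond0
```

If this prints eth0 rather than eth3, guest traffic is leaving through the onboard Intel NIC and not the 1020.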
 
Okay, some progress on the errors in dmesg.

Turns out there's a bug in the kernel where the wrong firmware files are being specified.
https://bugzilla.kernel.org/show_bug.cgi?id=104191

I found and copied over the missing firmware file (cbfw-3.2.5.1.bin), then got around the driver calling for the wrong firmware files by setting the old 3.2.3.0 ones aside and renaming the 3.2.5.1 ones to them. One update of the initramfs, and the drivers are loaded.
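
A rough sketch of that workaround, for anyone following along. Assumptions: the firmware files follow a cbfw/ctfw/ct2fw-<version>.bin naming pattern, the backup directory name bfa-orig is arbitrary, and swap_bfa_firmware is a made-up helper. It copies rather than renames so the 3.2.5.1 files stay available under their own names, which is a small deviation from the description above.

```shell
# Sketch: make the firmware version the driver wrongly asks for (old)
# resolve to the contents of the version it actually needs (new).
# swap_bfa_firmware is a hypothetical helper, not part of any package.
swap_bfa_firmware() {
  fwdir=$1; old=$2; new=$3
  mkdir -p "$fwdir/bfa-orig" || return 1
  for prefix in cbfw ctfw ct2fw; do
    # Skip any firmware type for which no new file is present
    [ -f "$fwdir/$prefix-$new.bin" ] || continue
    # Set the old file aside instead of deleting it
    if [ -f "$fwdir/$prefix-$old.bin" ]; then
      mv "$fwdir/$prefix-$old.bin" "$fwdir/bfa-orig/" || return 1
    fi
    # Place the new firmware under the old file name
    cp "$fwdir/$prefix-$new.bin" "$fwdir/$prefix-$old.bin" || return 1
  done
}

# On the host, as root:
#   swap_bfa_firmware /lib/firmware 3.2.3.0 3.2.5.1
#   update-initramfs -u
```

The originals end up in /lib/firmware/bfa-orig, so the change can be undone by moving them back and rebuilding the initramfs again.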

However, I still can't make any TCP connections. DHCP addresses come up fine and so do pings & traceroutes, but I can't ssh or even pull files with wget/curl.

Any ideas?
Thanks!
 
I seem to be having related issues.

I use my Brocade adapters primarily for NFS transfers. I can mount the shares, but after that everything fails.

I found and copied over the missing firmware file (cbfw-3.2.5.1.bin), then got around the driver calling for the wrong firmware files by setting the old 3.2.3.0 ones aside and renaming the 3.2.5.1 ones to them. One update of the initramfs, and the drivers are loaded.

I have the same errors in dmesg. Would you mind sharing where you found the correct firmware?

The funny thing is that I vaguely remember having the same problem on the shipping kernel with the initial release of Ubuntu 14.04 LTS. The packaging of the wrong firmware for the bfa module appears to be a recurring issue.

Damn you "tx checksumming" - you ate 2 days of my life.

Turning it off on the guest seems to solve it.

Would you mind sharing how you go about disabling this TX checksumming as well?

Much obliged,
Matt
 
Hey Matt!

Thankfully I documented everything to some degree or another.

Command:
ethtool --offload eth0 tx off
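
To check whether the offload is enabled before and after running that command, the `ethtool -k` output can be filtered. A small sketch; the helper name tx_csum_state is made up, and eth0 stands for whatever the VirtIO interface is called inside the guest:

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and
# print the state of the top-level tx-checksumming feature.
tx_csum_state() {
  awk -F': *' '/^tx-checksumming:/ {print $2; exit}'
}

# Inside the guest:
#   ethtool -k eth0 | tx_csum_state      # state before
#   ethtool --offload eth0 tx off
#   ethtool -k eth0 | tx_csum_state      # state after
```

The ethtool setting does not survive a reboot. On a Debian-style guest, one common approach is a line such as `post-up ethtool --offload eth0 tx off` in the interface's stanza in /etc/network/interfaces, assuming ethtool is installed in the guest.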

Firmware location: https://driverdownloads.qlogic.com/...aspx?productid=1238&oemid=410&oemcatid=135967

To upgrade the firmware:
Download the ISO and boot from it.
Once booted, this command should update the firmware (substitute your own firmware version):
bcu boot --update 1 brocade-_adapter_boot_fw_v3-2-3-2
The general form is:
bcu boot --update <card id> file-name


In your kernel's /lib/firmware, the firmware file names include most or all of:
cbfw
ct2fw
ctfw
ct2f


Latest drivers location: http://driverdownloads.qlogic.com/Q...t.aspx?ProductCategory=322&Product=1214&Os=65

Good luck!
 
I appreciate your help, but I am a little bit confused.

Firstly, from the looks of your dmesg (and mine looks the same), you were having issues with the bfa module. Isn't this for FCoE only? Are you using the FCoE feature of the BR1020, or only the Ethernet side, like me? Did you actually flash the latest firmware to your BR1020, or did you try to match the firmware version to the driver version in Proxmox?

Secondly, why did you use the firmware from a different adapter (BR1741M-K)? Is there something wrong with the firmware on the BR1020 download page (where you linked the driver)?

I'm guessing you used the firmware files from the downloaded drivers to replace and rename the firmware files in /lib/firmware for everything to work? Which drivers did you grab, given that there are no Debian ones? Did the Red Hat ones do the trick?

Right now my iperf runs over the interface work, but they are very slow (2-3 Gbit/s send, ~1 Gbit/s receive). If I mount an NFS share over the interface, it works in spurts and then freezes for long periods. I previously had this working under ESXi before I migrated to Proxmox, so I don't think it is hardware related, but I am also a huge novice when it comes to fiber Ethernet. Could a damaged or dirty fiber cause similar symptoms?

I really appreciate the help!

Thanks,
Matt
 
Yeah, so I spent several hours on this yesterday.

I fixed the firmware issue like you mentioned above, but in retrospect that was only impacting the bfa module (Fibre Channel), not the bna module (Ethernet), so it didn't make any difference for me.

I tried disabling TX checksumming like you recommended, but it did not help. Something is seriously wrong. I can ping across the interface, and iperf works, but it tops out at under 2 Gbit/s, which is highly disappointing, and when I try to put any real traffic on it (NFS, SSH, anything really) it just chokes and stops working.

I even tried switching the cards to different PCIe slots to see if that was the problem, and tried a different LC-LC cable in case I had damaged the fiber, but still nothing.
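
When comparing slots, a PCIe link that negotiated narrower or slower than the card supports can also cap throughput. A sketch that pulls the link capability and link status lines out of `lspci -vv`; the helper name is made up, and the 04:00.2 address is taken from the dmesg output earlier in the thread, so it will differ per machine:

```shell
# Hypothetical helper: keep only the PCIe link capability (LnkCap) and
# negotiated link status (LnkSta) lines from `lspci -vv` output on stdin.
pcie_link_lines() {
  grep -E 'LnkCap:|LnkSta:'
}

# On the host, as root (substitute the card's own PCI address):
#   lspci -vv -s 04:00.2 | pcie_link_lines
```

If the speed or width on the LnkSta line is lower than on the LnkCap line, the slot rather than the card or fiber may be the limit.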

I wonder if one of my cards or one of my transceivers has died. I'm ready to give up on this one. I've wasted enough time trying to get the Brocade adapters to work properly.

If only Intel's copper 10Gig adapters weren't so expensive.
 