Unable to login to FCoE fabric on Dell FC640 (BCM57840), PVE 8.1-2 (same on debian & ubuntu, Almalinux works)

VirtAngus · Apr 5, 2024

We have a number of Dell FC640 servers with Broadcom BCM57840 NetXtreme II 10G CNICs. They are connected to an IBM V5100 NVMe SAN via a Cisco Nexus 5672UP. We have tested Proxmox VE 8.1-2, Debian releases from buster to trixie, and Ubuntu Server 22.04. The firmware used is bnx2x-e2-7.13.21 for PVE and Debian 11-13, and bnx2x-e2-7.13.1.0 for Debian buster.

In no case did the fabric login succeed, so no LUNs appeared.

With AlmaLinux 9.3 installed on the same server, fabric login works flawlessly. All that was needed was the firmware for the CNAs (the same bnx2x-e2-7.13.21) and the fcoe-utils package.

What was done (on PVE, Debian, Ubuntu):

- apt install fcoe-utils
- systemctl start fcoe-utils (with or without editing /etc/fcoe/cfg-ethX: no effect; both "bnx2fc fcoe" and "bnx2fc qedf" were tested as SUPPORTED_DRIVERS in /etc/fcoe/config, since qedf was used in Alma.)

- when no /etc/fcoe/cfg-<iface> was created:
fipvlan -c -s -d -u <iface>
or
ip link set dev <iface> up && fcoeadm -c <iface>
or
ip link set dev <iface> up && echo <iface> > /sys/bus/fcoe/ctlr_create
(where <iface> is the root iface or iface.vlan, with the vlan discovered via fipvlan)
- when /etc/fcoe/cfg-<iface> was created:
ip link set dev <iface> up && systemctl start fcoe-utils
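
For reference, a minimal sketch of what /etc/fcoe/cfg-<iface> might contain for an offloaded bnx2fc port. The variable names follow the cfg-ethx template shipped with fcoe-utils; the values are assumptions to adapt, not a confirmed fix. In particular, DCB_REQUIRED="no" assumes the adapter firmware runs DCBX itself, so that lldpad does not manage DCB on that port.

```shell
# /etc/fcoe/cfg-<iface>  (sketch only; all values below are assumptions)
FCOE_ENABLE="yes"    # bring up FCoE on this interface
DCB_REQUIRED="no"    # assumed: DCBX is handled by the adapter firmware
AUTO_VLAN="yes"      # let fcoemon perform FIP VLAN discovery
MODE="fabric"        # fabric login, as opposed to "vn2vn"
```

With AUTO_VLAN="yes" the file is normally named after the root interface rather than iface.vlan, since the VLAN sub-interface is created after discovery.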

With the debug_logging and debug options enabled for the bnx2fc,
libfcoe and fcoe (or qedf) modules, one sees something like the following:

[ 575.759399] host18: lport 000000: Entered FLOGI state from FLOGI state
[ 575.759405] host18: xid 889: Exchange timer armed : 2000 msecs
[ 575.759408] host18: fip: sending FLOGI
[ 575.759410] host18: fip: sending FLOGI - reselect
[ 575.759411] host18: fip: consider FCF fab 2003003a9c851a01 VFID 3 mac 00:3a:9c:a0:6e:20 map efc00 val 1 sent 1 pri 128
[ 575.759416] host18: fip: using FCF mac 00:3a:9c:a0:6e:20
[ 575.759418] host18: fip: sending FLOGI - clearing
[ 575.759420] host18: fip: consider FCF fab 2003003a9c851a01 VFID 3 mac 00:3a:9c:a0:6e:20 map efc00 val 1 sent 0 pri 128
[ 575.759424] host18: fip: using FCF mac 00:3a:9c:a0:6e:20
[ 575.759426] host18: fip: FLOGI/FDISC sent with FPMA
[ 577.775399] host18: xid 889: Exchange timed out state 0
[ 577.775403] host18: lport 000000: Received a FLOGI response timeout
[ 577.775405] host18: lport 000000: Error 1 in state FLOGI, retries 1
[ 577.775409] host18: xid 889: exch: abort, time 20000 msecs
[ 577.775411] host18: xid 889: f_ctl 90000 seq 1
[ 577.775414] host18: xid 889: Exchange timer armed : 20000 msecs
[ 579.791400] host18: lport 000000: Entered FLOGI state from FLOGI state
[ 579.791405] host18: xid 909: Exchange timer armed : 2000 msecs
[ 579.791408] host18: fip: sending FLOGI
[ 579.791410] host18: fip: sending FLOGI - reselect

and so on.
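
To compare runs across distributions, the retry loop above can be condensed into two counters. This is a small sketch, assuming the debug messages keep the exact wording shown in the log; the function name flogi_summary is made up for illustration.

```shell
#!/bin/sh
# Summarize FLOGI attempts vs. timeouts from kernel log text on stdin.
# The match strings are copied from the libfc/libfcoe debug output above.
flogi_summary() {
    awk '
        /fip: sending FLOGI$/               { sent++ }      # skips "- reselect" / "- clearing"
        /Received a FLOGI response timeout/ { timeouts++ }
        END { printf "sent=%d timeouts=%d\n", sent, timeouts }
    '
}

# Usage: dmesg | flogi_summary
```

If every attempt ends in a timeout, the two counters stay roughly equal, which is the pattern in the excerpt.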

What is weird: when network traffic was sniffed with tcpdump, not a
single FLOGI packet was actually sent. I thought this might be because of offloading, but with Alma the FCoE packets can be seen.
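
A capture filter along these lines narrows tcpdump to the relevant frames. It is a sketch: the EtherTypes are the standard ones for FIP (0x8914) and FCoE (0x8906), while the helper name fcoe_filter is invented here, and whether offloaded frames are visible to tcpdump at all depends on the driver.

```shell
#!/bin/sh
# Print a pcap filter matching FIP (0x8914) and FCoE (0x8906) frames,
# first untagged, then inside an 802.1Q tag. The order matters: in pcap
# filters, "vlan" shifts the offsets of everything that follows it.
fcoe_filter() {
    printf '%s\n' 'ether proto 0x8914 or ether proto 0x8906 or (vlan and (ether proto 0x8914 or ether proto 0x8906))'
}

# Usage (as root, on the physical port rather than the VLAN sub-interface):
#   tcpdump -i <iface> -e -nn "$(fcoe_filter)"
```

The -e flag prints the link-level header, so the source MAC and VLAN tag of any FIP solicitation or FLOGI are visible in the output.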

At the same time, we have no problems with FCoE on HP Gen9 blade servers with the same BCM57840 CNAs connected to an HPE EVA 6530 via HPE VirtualConnect. Fabric login there is as simple as loading the kernel modules and running echo <iface> > /sys/bus/fcoe/ctlr_create. Maybe the difference is the absence of a VLAN in FCoE from VirtualConnect...

I've heard that these CNAs are incompatible or buggy when the IOMMU is turned on, but the iommu_groups directory is empty and the only IOMMU messages in PVE, Debian and AlmaLinux are "iommu: Default domain type: Translated" and "iommu: DMA TLB invalidation policy: lazy mode".
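
The iommu_groups check can be made repeatable with a small helper, sketched below. The function name iommu_group_count is hypothetical; the default path is the standard sysfs location, and an empty directory usually means no devices are being translated by an IOMMU.

```shell
#!/bin/sh
# Count IOMMU groups under a sysfs-style directory.
# Defaults to the live sysfs path; a missing directory counts as zero.
iommu_group_count() {
    dir=${1:-/sys/kernel/iommu_groups}
    [ -d "$dir" ] || { echo 0; return; }
    # Each group is one directory entry directly below $dir.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Usage: iommu_group_count
```

Running it on both PVE and AlmaLinux on the same hardware would show whether the IOMMU state actually differs between the two.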

So, we're totally stuck now.

Thanks in advance for any help!
