[SOLVED] Troubleshooting: 10GbE NIC (Configuration Issues)

krish

New Member
Aug 14, 2021
Hi, I am having some trouble getting a newly installed Supermicro AOC-STG-i2T 10-Gigabit NIC up and working. I have tried quite a few solutions suggested on this forum and elsewhere to activate this network card, but none of them have worked. Can somebody please help resolve this issue?

System:
PVE version: proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve); pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
Motherboard - Supermicro X11SCL-F
Hard disk controllers (passthrough to TrueNAS) - AOC-S3008L-L8e (Low profile Gen 3 PCI-E x8)
Network card - AOC-STG-i2T / Rev. 2.01 / PCI-E x8 2.1 (2.5GT/s or 5GT/s)

Here are the things that I have done so far:
  • Installed the latest BIOS, Proxmox (no subscription repositories), Intel driver (ixgbe version 5.18.6 - X540AT2)
  • The network page didn't list the newly installed 10G ports, so I added them by hand to /etc/network/interfaces. Thanks to the discussions on this forum; otherwise, I wouldn't have known that the device names can be found in the dmesg | grep ixgbe output.
dmesg | grep ixgbe

[ 1.588462] ixgbe: loading out-of-tree module taints kernel.
[ 1.588462] ixgbe: loading out-of-tree module taints kernel.
[ 1.594080] ixgbe 0000:02:00.0 0000:02:00.0 (uninitialized): ixgbe_check_options: FCoE Offload feature enabled
[ 1.758534] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 12, Tx Queue count = 12 XDP Queue count = 0
[ 1.823012] ixgbe 0000:02:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[ 1.830312] ixgbe 0000:02:00.0 eth0: MAC: 3, PHY: 3, PBA No: 030B05-0AC
[ 1.830319] ixgbe 0000:02:00.0: ac:1f:xx:xx:xx:xx
[ 1.830323] ixgbe 0000:02:00.0 eth0: Enabled Features: RxQ: 12 TxQ: 12 FdirHash
[ 1.836884] ixgbe 0000:02:00.0 eth0: Intel(R) 10 Gigabit Network Connection
[ 1.841420] ixgbe 0000:02:00.1 0000:02:00.1 (uninitialized): ixgbe_check_options: FCoE Offload feature enabled
[ 2.004327] ixgbe 0000:02:00.1: Multiqueue Enabled: Rx Queue count = 12, Tx Queue count = 12 XDP Queue count = 0
[ 2.070748] ixgbe 0000:02:00.1: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[ 2.078051] ixgbe 0000:02:00.1 eth1: MAC: 3, PHY: 3, PBA No: 030B05-0AC
[ 2.078055] ixgbe 0000:02:00.1: ac:1f:xx:xx:xx:xx
[ 2.078059] ixgbe 0000:02:00.1 eth1: Enabled Features: RxQ: 12 TxQ: 12 FdirHash
[ 2.082055] ixgbe 0000:02:00.1 eth1: Intel(R) 10 Gigabit Network Connection
[ 2.082989] ixgbe 0000:02:00.1 enp2s0f1: renamed from eth1
[ 2.118402] ixgbe 0000:02:00.0 enp2s0f0: renamed from eth0

cat /etc/network/interfaces

auto lo
iface lo inet loopback

iface eno2 inet manual

iface eno1 inet manual

iface enp2s0f0 inet manual

iface enp2s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.1.147/24
gateway 192.168.1.1
bridge-ports eno2
bridge-stp off
bridge-fd 0

auto vmbr1
iface vmbr1 inet static
address 192.168.1.200/24
bridge-ports enp2s0f1
bridge-stp off
bridge-fd 0

ethtool enp2s0f1
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
No data available
lspci -nnk
02:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
Subsystem: Super Micro Computer Inc AOC-STG-I2T [15d9:0734]
Kernel driver in use: vfio-pci
Kernel modules: ixgbe
02:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
Subsystem: Super Micro Computer Inc AOC-STG-I2T [15d9:0734]
Kernel driver in use: vfio-pci
Kernel modules: ixgbe
03:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
Subsystem: Super Micro Computer Inc AOC-S3008L-L8e [15d9:0808]
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
DeviceName: Intel Ethernet I210 #1
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection [15d9:1533]
Kernel driver in use: igb
Kernel modules: igb
05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
DeviceName: Intel Ethernet I210 #2
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection [15d9:1533]
Kernel driver in use: igb
Kernel modules: igb
  • After creating a new Linux bridge, I got an "ifreload error", even though ifupdown2 is installed. Here is the error output, followed by the package check and the bridge list:
'ifreload -a' failed: exit code 1
vmbr1 : error: vmbr1: bridge port enp2s0f1 does not exist
dpkg -l|grep ifupdown2
ii ifupdown2 3.1.0-1+pmx3 all Network Interface Management tool similar to ifupdown
brctl show
bridge name     bridge id               STP enabled     interfaces
fwbr100i0       8000.6ecf07273011       no              fwln100i0
                                                        tap100i0
fwbr102i0       8000.ca39a6bf1380       no              fwln102i0
                                                        tap102i0
vmbr0           8000.ac1f6bfe897d       no              eno2
                                                        fwpr100p0
                                                        fwpr102p0
vmbr1           8000.0adce181c14f       no
  • Upon further research, I noticed that my 10GbE adapter is in the same IOMMU group as the SAS controller. I read that sharing an IOMMU group can cause conflicts, so I added the kernel parameter pcie_acs_override=downstream (with and without multifunction), but it didn't change the grouping. Moving the card to another physical PCI-E slot didn't change the group either. I would prefer to use the card as a virtual port on the host rather than pass it through to TrueNAS.
for a in /sys/kernel/iommu_groups/1; do find $a -type l; done | sort --version-sort
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.2
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:00.1
/sys/kernel/iommu_groups/1/devices/0000:03:00.0
  • To rule out any potential failure of the adapter, I plugged it into a Windows machine and installed Intel PRO drivers. The device worked perfectly.
  • The other thing that is very confusing is the device name. I am not sure why "enp2s0f1" doesn't show up in the output of several commands, for example networkctl (the list looks very different):
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback n/a unmanaged
2 eno1 ether n/a unmanaged
3 eno2 ether n/a unmanaged
6 vmbr0 bridge n/a unmanaged
11 tap102i0 ether n/a unmanaged
12 fwbr102i0 bridge n/a unmanaged
13 fwpr102p0 ether n/a unmanaged
14 fwln102i0 ether n/a unmanaged
15 tap100i0 ether n/a unmanaged
16 fwbr100i0 bridge n/a unmanaged
17 fwpr100p0 ether n/a unmanaged
18 fwln100i0 ether n/a unmanaged
19 vmbr1 bridge n/a unmanaged

Please let me know if you need any further information. Thanks a lot for your time and help.
 
You haven't told your machine to start the interface - you need 'auto enp2s0f0' and 'auto enp2s0f1'.

Try 'ip link set enp2s0f0 up' and then run ethtool or mii-tool.

Personally, I detest the 'meant-to-be-persistent-but-changes-more-than-anything-else' enxxxx naming. If you want to stop that from happening, and go back to the eth0/eth1/eth2 etc naming, you can add net.ifnames=0 biosdevname=0 to your grub configuration. Note that it will break your existing configuration, so make sure you have console access to the machine to rename eno1/eno2 after it's rebooted.
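For reference, a minimal sketch of what that change could look like on a host that boots through GRUB. The "quiet" entry is only the usual Debian default and is an assumption here; keep whatever parameters are already on that line (for example any IOMMU options) and append the two new ones.

```shell
# /etc/default/grub (fragment; this file is sourced as shell by update-grub)
# "quiet" is assumed to be the existing default; keep your own parameters
# and append the two options that disable predictable interface naming.
GRUB_CMDLINE_LINUX_DEFAULT="quiet net.ifnames=0 biosdevname=0"
```

After editing, run update-grub and reboot. On hosts that boot through systemd-boot instead of GRUB, the kernel command line lives in /etc/kernel/cmdline and is applied with proxmox-boot-tool refresh. Either way, the names in /etc/network/interfaces (including the bridge-ports line of vmbr0) have to be updated to the new ethN names afterwards, so console access is needed.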
 
Hi @xrobau, thanks for helping out. Unfortunately, it didn't work. Please see the output below and let me know if I missed anything:

/etc/network/interfaces
auto lo
iface lo inet loopback

iface eno2 inet manual

iface eno1 inet manual

iface enp2s0f0 inet manual

auto enp2s0f1
iface enp2s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.1.147/24
gateway 192.168.1.1
bridge-ports eno2
bridge-stp off
bridge-fd 0

auto vmbr1
iface vmbr1 inet static
address 192.168.1.200/24
bridge-ports enp2s0f1
bridge-stp off
bridge-fd 0
ethtool enp2s0f1
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
netlink error: no device matches name (offset 24)
netlink error: No such device
No data available

ip link set enp2s0f1 up
Cannot find device "enp2s0f1"

Here is the output of dmesg | grep ixgbe again after updating the network interfaces; the new lines are the two PHC messages at the end.

dmesg | grep ixgbe
[ 1.611584] ixgbe: loading out-of-tree module taints kernel.
[ 1.611585] ixgbe: loading out-of-tree module taints kernel.
[ 1.619173] ixgbe 0000:02:00.0 0000:02:00.0 (uninitialized): ixgbe_check_options: FCoE Offload feature enabled

[ 1.783611] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 12, Tx Queue count = 12 XDP Queue count = 0
[ 1.848065] ixgbe 0000:02:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[ 1.855223] ixgbe 0000:02:00.0 eth0: MAC: 3, PHY: 3, PBA No: 030B05-0AC
[ 1.855241] ixgbe 0000:02:00.0: ac:1f:xx:xx:xx:xx
[ 1.855244] ixgbe 0000:02:00.0 eth0: Enabled Features: RxQ: 12 TxQ: 12 FdirHash
[ 1.861584] ixgbe 0000:02:00.0 eth0: Intel(R) 10 Gigabit Network Connection
[ 1.865962] ixgbe 0000:02:00.1 0000:02:00.1 (uninitialized): ixgbe_check_options: FCoE Offload feature enabled
[ 2.031374] ixgbe 0000:02:00.1: Multiqueue Enabled: Rx Queue count = 12, Tx Queue count = 12 XDP Queue count = 0
[ 2.096262] ixgbe 0000:02:00.1: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[ 2.103172] ixgbe 0000:02:00.1 eth1: MAC: 3, PHY: 3, PBA No: 030B05-0AC
[ 2.103177] ixgbe 0000:02:00.1: ac:1f:xx:xx:xx:xx
[ 2.103181] ixgbe 0000:02:00.1 eth1: Enabled Features: RxQ: 12 TxQ: 12 FdirHash
[ 2.109308] ixgbe 0000:02:00.1 eth1: Intel(R) 10 Gigabit Network Connection
[ 2.144410] ixgbe 0000:02:00.0 enp2s0f0: renamed from eth0
[ 2.187578] ixgbe 0000:02:00.1 enp2s0f1: renamed from eth1
[ 6.128983] ixgbe 0000:02:00.1: registered PHC device on enp2s0f1
[ 11.420289] ixgbe 0000:02:00.1: removed PHC on enp2s0f1

Regarding the naming convention, thanks for the tip. In the current state, how do I map enp2s0f1 to the devices listed by networkctl?
 
Did you check from the truenas side if the card shows up there? Suspect the network card is caught up with the passthrough.
 
Yes, it does, if I pass it through as a PCIe device. However, even after updating the IP configuration, the link did not come up, so I reverted to the original state and started setting it up on PVE instead.

I do suspect the passthrough is the issue, since the card is in the same IOMMU group as the SAS controller. As you can see above, the kernel parameter meant to split the group didn't work. How do I isolate the 10G adapter into its own group?
 
Subsystem: Super Micro Computer Inc AOC-STG-I2T [15d9:0734]
Kernel driver in use: vfio-pci
The card is passed through to a VM (vfio-pci), so it is detached from the host.
It could indeed be IOMMU group sharing, if you only pass through the SAS controller.

Maybe try not starting the VM with PCI passthrough at boot, to be sure.
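One way to confirm this from the host, sketched here under a few assumptions, is to ask sysfs which driver is bound to the PCI address and whether the host has a network interface for it. The function names and the SYSFS_ROOT override are hypothetical helpers for illustration (the override only exists so the functions can be tried against a fake directory tree); the sysfs paths themselves are the standard ones.

```shell
#!/usr/bin/env bash
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
pci_driver() {
    local addr="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink to the bound driver's directory.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Print the network interface name(s) the host has created for a PCI
# device, or "none" when the device is not driven as a NIC by the host.
pci_netdev() {
    local addr="$1"
    local dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}/net"
    if [ -d "$dir" ]; then
        ls "$dir"
    else
        echo "none"
    fi
}
```

On the host, pci_driver 0000:02:00.1 reporting vfio-pci together with pci_netdev 0000:02:00.1 reporting none would match the "No such device" errors from ethtool and ip link: the interface only exists while ixgbe holds the device. The same lookup also answers the naming question, since it maps a PCI address to whatever interface name the host currently uses.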
 
Thanks all for your support. The issue was clearly due to the IOMMU group. I got excited for a moment when I saw both 10G ports listed on the network page, only to realize that TrueNAS was already down. There simply isn't any way to resolve this conflict (without the processor supporting the ACS feature), not even by changing the physical slot. However, the good news is that TrueNAS can access the 10G ports (as a direct passthrough). My only gripe is that the second port of the 10G adapter has no use now; I had initially planned to dedicate it to another VM (Windows).
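For anyone who wants to check the grouping on their own board, here is a rough sketch that lists every device with its IOMMU group instead of a single group. The function name and the SYSFS_ROOT override are hypothetical (the override is only there so the function can be run against a fake tree); it relies on GNU sort for version ordering, like the command in the first post.

```shell
#!/usr/bin/env bash
# List every PCI device together with its IOMMU group number.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_iommu_groups() {
    local root="${SYSFS_ROOT:-/sys}/kernel/iommu_groups"
    local dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: no IOMMU groups present
        group="${dev#"$root"/}"        # strip the leading path up to the group
        group="${group%%/*}"           # keep only the group number
        echo "group ${group}: ${dev##*/}"
    done | sort --version-sort
}
```

Devices that print with the same group number can only be handed to a VM together, which is the situation described above with 0000:02:00.0, 0000:02:00.1 and 0000:03:00.0 all sitting in group 1.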
 
I am about to purchase a Supermicro blade server with this NIC (AOC-ATG-I2T2SM-O). In the end, is it working and showing up in the network list on your Proxmox server? I just want to make sure before I send the PO to the supplier.

thanks !
 
