[SOLVED] [Proxmox 8] [Kernel 6.2.16-4-pve]: ixgbe driver fails to load due to PCI device probing failure

manseb

New Member
Jul 25, 2023
Hello,
I have an HP ProLiant ML30 Gen9 server in which I have installed an Intel X520-DA2 NIC. With the default Proxmox 8 kernel, 6.2.16-4-pve, the ixgbe driver fails to load with the following errors:

[ 1.709221] ixgbe 0000:06:00.0: enabling device (0140 -> 0142)
[ 1.709498] ixgbe 0000:06:00.0: BAR 0: can't reserve [mem 0x80100000-0x8017ffff 64bit]
[ 1.709500] ixgbe 0000:06:00.0: pci_request_selected_regions failed 0xfffffff0
[ 1.709895] ixgbe: probe of 0000:06:00.0 failed with error -16

[ 1.710247] ixgbe 0000:06:00.1: enabling device (0140 -> 0142)
[ 1.710306] ixgbe 0000:06:00.1: BAR 0: can't reserve [mem 0x80180000-0x801fffff 64bit]
[ 1.710308] ixgbe 0000:06:00.1: pci_request_selected_regions failed 0xfffffff0
[ 1.710384] ixgbe: probe of 0000:06:00.1 failed with error -16

I have tried many combinations of the pci kernel parameters, but none of them helped.
As a workaround I can remove the PCIe bridge the NIC sits behind and then rescan the PCI bus, but I intend to make this NIC my primary one, so this workaround is cumbersome.

# remove the PCIe bridge the NIC is behind, then rescan so the devices are re-enumerated and re-probed
echo 1 > /sys/bus/pci/devices/0000\:00\:1c.4/remove
echo 1 > /sys/bus/pci/rescan
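
For anyone who does want to rely on this workaround, here is a minimal sketch of a systemd oneshot unit that runs the same remove/rescan early at boot; the unit name is made up and the bridge address 0000:00:1c.4 is the one from my system, so adjust both:

Code:
# /etc/systemd/system/pcie-bridge-rescan.service  (example name)
[Unit]
Description=Remove and rescan the PCIe bridge so ixgbe can reserve its BARs
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000:00:1c.4/remove'
ExecStart=/bin/sh -c 'echo 1 > /sys/bus/pci/rescan'

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable pcie-bridge-rescan.service.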

Currently I have downgraded the Linux kernel to 6.1.10 and it works just fine; the same goes for kernel 5.15.108 on Proxmox 7.
Attached are the lspci dumps and the PCI-related dmesg messages for both kernel 6.1.10 and 6.2.16-4.

Thank you.
 

Attachments

  • proxmox8_ixgbe_pcie_failure.txt
I'm also having these same issues. Removing the bridge and rescanning doesn't work for me, but downgrading the kernel did; thanks for that!

For future troubleshooting:

Code:
root@pve01:~/drivers# lshw -class network
  *-network UNCLAIMED
       description: Ethernet controller
       product: 82599ES 10-Gigabit SFI/SFP+ Network Connection
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 01
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd cap_list
       configuration: latency=0
       resources: memory:e0000000-e007ffff ioport:d000(size=32) memory:e0100000-e0103fff memory:e0080000-e00fffff memory:e0104000-e0203fff memory:e0204000-e0303fff

Code:
root@pve01:~# uname -a
Linux pve01 6.2.16-5-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z) x86_64 GNU/Linux

Code:
root@pve01:~# lspci -knn | grep Net -A3
00:19.0 Ethernet controller [0200]: Intel Corporation 82579V Gigabit Network Connection [8086:1503] (rev 05)
        DeviceName:  Onboard LAN
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:849c]
        Kernel driver in use: e1000e
--
02:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter X520-1 [8086:0006]
        Kernel modules: ixgbe
03:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc 2450 NVMe SSD (DRAM-less) [1344:5411] (rev 01)

Code:
root@pve01:~# dmesg | grep ixgbe
[    1.745595] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[    1.745602] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[    1.745667] ixgbe 0000:02:00.0: enabling device (0000 -> 0002)
[    1.745726] ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0xe0000000-0xe007ffff 64bit]
[    1.745734] ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
[    1.745764] ixgbe: probe of 0000:02:00.0 failed with error -16

Code:
root@pve01:~# lsmod | grep ixgb
ixgbe                 475136  0
xfrm_algo              20480  1 ixgbe
dca                    20480  1 ixgbe
mdio                   16384  1 ixgbe

Code:
root@pve01:~# modinfo ixgbe
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
license:        GPL v2
description:    Intel(R) 10 Gigabit PCI Express Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     F845394A5EF37F466D5DF16
alias:          pci:v00008086d000015E5sv*sd*bc*sc*i*
alias:          pci:v00008086d000015E4sv*sd*bc*sc*i*
alias:          pci:v00008086d000015CEsv*sd*bc*sc*i*
alias:          pci:v00008086d000015C8sv*sd*bc*sc*i*
alias:          pci:v00008086d000015C7sv*sd*bc*sc*i*
alias:          pci:v00008086d000015C6sv*sd*bc*sc*i*
alias:          pci:v00008086d000015C4sv*sd*bc*sc*i*
alias:          pci:v00008086d000015C3sv*sd*bc*sc*i*
alias:          pci:v00008086d000015C2sv*sd*bc*sc*i*
alias:          pci:v00008086d000015AEsv*sd*bc*sc*i*
alias:          pci:v00008086d000015ACsv*sd*bc*sc*i*
alias:          pci:v00008086d000015ADsv*sd*bc*sc*i*
alias:          pci:v00008086d000015ABsv*sd*bc*sc*i*
alias:          pci:v00008086d000015B0sv*sd*bc*sc*i*
alias:          pci:v00008086d000015AAsv*sd*bc*sc*i*
alias:          pci:v00008086d000015D1sv*sd*bc*sc*i*
alias:          pci:v00008086d00001563sv*sd*bc*sc*i*
alias:          pci:v00008086d00001560sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001557sv*sd*bc*sc*i*
alias:          pci:v00008086d00001558sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Fsv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Dsv*sd*bc*sc*i*
alias:          pci:v00008086d00001528sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F8sv*sd*bc*sc*i*
alias:          pci:v00008086d0000151Csv*sd*bc*sc*i*
alias:          pci:v00008086d00001529sv*sd*bc*sc*i*
alias:          pci:v00008086d0000152Asv*sd*bc*sc*i*
alias:          pci:v00008086d000010F9sv*sd*bc*sc*i*
alias:          pci:v00008086d00001514sv*sd*bc*sc*i*
alias:          pci:v00008086d00001507sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FBsv*sd*bc*sc*i*
alias:          pci:v00008086d00001517sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FCsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F7sv*sd*bc*sc*i*
alias:          pci:v00008086d00001508sv*sd*bc*sc*i*
alias:          pci:v00008086d000010DBsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F4sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010ECsv*sd*bc*sc*i*
alias:          pci:v00008086d000010DDsv*sd*bc*sc*i*
alias:          pci:v00008086d0000150Bsv*sd*bc*sc*i*
alias:          pci:v00008086d000010C8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C6sv*sd*bc*sc*i*
alias:          pci:v00008086d000010B6sv*sd*bc*sc*i*
depends:        dca,xfrm_algo,mdio
retpoline:      Y
intree:         Y
name:           ixgbe
vermagic:       6.2.16-5-pve SMP preempt mod_unload modversions
parm:           max_vfs:Maximum number of virtual functions to allocate per physical function - default is zero and maximum value is 63. (Deprecated) (uint)
parm:           allow_unsupported_sfp:Allow unsupported and untested SFP+ modules on 82599-based adapters (bool)
parm:           debug:Debug level (0=none,...,16=all) (int)
 
Maybe just some additional info:

This seems to me like an issue related only to X520 cards, because my X550-T2 works without any issues on 6.2.16-5.

@manseb
Since you have an Intel X520-DA2, have you tried updating that card to the newest firmware? The tool is called the "NVM Update Utility" and is available directly from Intel on their driver page.

Chances are high that you only need newer firmware: from what I have seen of the open-source ixgbe driver, they sometimes fix or add features, and when those need additional fixes in the firmware itself, Intel releases a newer firmware as well.

That once took me forever to figure out, because Intel provides no changelogs and I wanted to know exactly what changed in the NVM firmware. I dug around for ages until I found out that the NVM firmware changes simply correspond to Intel's own ixgbe driver changes (which also get adopted into the open-source ixgbe driver).

@DemiNe0
Same for you.


Sidenote for everyone else:
If your adapters are onboard modules (directly on the mainboard), do not update the NVM firmware! You will put your Intel X520/X550/X710 NIC into a debug mode, and the card won't work anymore until you flash it back. If your onboard NIC has 2 ports, only one will get stuck in debug mode while the second keeps working. The NVM utility always offers to make a backup before flashing; do the backup and save it somewhere safe, because the problems may not be visible at first!

For everyone else with an Intel-branded NIC (from Intel itself), it's a safe update and you don't really need a backup, because you can simply download any firmware version from Intel.

However, I can't guarantee that the X520 will work after the update, only that the firmware might be the issue. It may also well be a bug in the 6.2 kernel with regard to X520 NICs.

Cheers :)
 
I also have the same issues with my X520-DA cards running on some Supermicro X9 motherboards.



For anyone else who runs into this, you can follow these steps to roll back to (or install) the older, known-working 6.1.10-1-pve kernel.

First, check which kernels you have installed by running:
proxmox-boot-tool kernel list

You should see output similar to this; note that in my case this was a fresh install, so only the 6.2.16 kernels were on the system.
Code:
root@PVE-04:~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.2.16-3-pve
6.2.16-5-pve


If you don't have the older kernel, you can install it by running:
apt install pve-kernel-6.1

Then switch the kernel over by running proxmox-boot-tool kernel pin 6.1.10-1-pve, press y and Enter to confirm, then reboot.

After the reboot you can check the currently running kernel with `uname -a`; you should see output like this.
Code:
root@PVE-04:~# uname -a
Linux PVE-04 6.1.10-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.1.10-1 (2023-02-07T13:10Z) x86_64 GNU/Linux
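
For copy-paste convenience, the whole rollback above as one sequence (package name and kernel version are the ones from this thread; adjust them to whatever apt and proxmox-boot-tool actually list on your system):

Code:
proxmox-boot-tool kernel list               # see which kernels are installed
apt install pve-kernel-6.1                  # pull in the older 6.1 kernel if it is missing
proxmox-boot-tool kernel pin 6.1.10-1-pve   # pin it; confirm with y
reboot
uname -r                                    # after the reboot this should print 6.1.10-1-pve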
 
Regarding the firmware update suggestion above:

https://www.intel.com/content/www/u.../500-series-network-adapters-up-to-10gbe.html

nvm-update-utility doesn't work with the x520 cards unfortunately.
 
Oh, that's stupid, I'm sorry!
I thought all 500 series cards had updatable firmware :-(

Quote:
Code:
Alright I solved it for me without "pci=realloc=off" or any patches with a Poweredge R920

In BIOS under "Integrated Devices"

"SR-IOV Global Enable" = ENABLE

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1245938

There are a lot of results I found on Google with that exact same issue, but none of them have a solution...

Maybe this is interesting:
https://gist.github.com/ixs/dbaac42...malink_comment_id=4639253#gistcomment-4639253
But this isn't a fix, just a stupid script to allow the use of any SFP transceivers.

However, sorry, but I can't help further.
 
Honestly, I think it's just a matter of waiting for Intel to make a set of ixgbe drivers for the 6.2 kernel.... Or maybe we'll luck out with 6.3? :D
 
Can you pass the adapter through to a VM to test with a newer kernel? Or simply boot a live ISO or something?

Then you could find out already today whether an upcoming kernel fixes the issue, instead of waiting :)
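
In case it helps, a rough sketch of how such a passthrough test could look on Proxmox (VM ID 100 is just an example, 06:00 corresponds to the 0000:06:00.x devices from the first post, and IOMMU must be enabled in the BIOS and on the kernel command line first):

Code:
# pass the whole X520 (both functions) through to VM 100; pcie=1 needs a q35 machine type
qm set 100 --hostpci0 06:00,pcie=1

# boot the VM from a recent live ISO (or a distro with a 6.3+ kernel) and check inside the guest:
dmesg | grep ixgbe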
 
(Quoting the "BAR 0: can't reserve" / "probe failed with error -16" messages from the first post.)


Can you check your BIOS?

Press F9 to enter the BIOS (RBSU), then press CTRL+A (a hidden "Service Options" menu will appear at the bottom):
- "Service Options" > "PCI Express 64bit BAR Support" > "Enabled"
 
Good news! For me, at least. :)
Here are the steps I used to make my Intel X520 work under kernel 6.2.x:

1. Check the 64bit BAR support in the BIOS as described by @emunt6 (your system might be different)

2. Download Intel's complete driver pack from here

3. Extract BOOTUTIL64E.EFI from <archive>/APPS/BootUtil/EFI2_x64 and copy it to a USB stick. (I did this since I intended to work in the EFI Shell, as read here.)

4. Once in the EFI Shell select the USB stick (fs0 in my case):
> fs0:

5. List all supported devices:
> BOOTUTIL64E.EFI -E

Note:
With X520 you should see 2 NICs. If you have more NICs listed update the commands below to work only on the relevant device(s).

6. Enable flashing on all devices (not sure if it's needed by the next steps but I did it anyway)
> BOOTUTIL64E.EFI -ALL -FLASHENABLE

7. Reboot the system back to EFI shell

8. Enable 64bit BAR addressing on the NICs and reboot afterwards
> BOOTUTIL64E.EFI -ALL -64e

9. The ixgbe driver should load under kernel 6.2.x.
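
For convenience, the EFI Shell part of the above condensed into one session (fs0: and the -ALL scope are taken from my steps; narrow the scope if you have other Intel NICs you don't want to touch):

Code:
fs0:
# list the NICs BootUtil can see
BOOTUTIL64E.EFI -E
# enable flashing on all listed NICs, then reboot back into the EFI Shell
BOOTUTIL64E.EFI -ALL -FLASHENABLE
# after the reboot: enable 64bit BAR addressing, then reboot once more
BOOTUTIL64E.EFI -ALL -64e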

Good luck!
 
@manseb, @DemiNe0, @D4M, could you attach a complete dmesg log where this fails? Ideally, open a new report at https://bugzilla.kernel.org, attach them there, and send the URL to bjorn@helgaas.com and linux-pci@vger.kernel.org (make sure the email is plain text, no HTML, etc, because vger rejects stuff like that).

It seems likely that I broke something with this change, which appeared in v6.2: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), and I'd like to fix that. This should not require kernel parameters, BIOS setting changes, firmware updates, driver updates, etc.
 
This thread is marked [SOLVED]. I think that's wrong, but I don't know how to change it. Comment #10 has a workaround, but it doesn't fix the underlying kernel problem.
 
I find it sad that (seemingly) none of the affected users are interested in or willing to help a maintainer troubleshoot this. :(
 
Quick note, in case it helps someone else: I just rented a new box from OVH yesterday that I'm setting up for a client, an AMD EPYC-based 16-core with 256 GB RAM on a Supermicro board. The stock install of Proxmox 7 using the OVH template was great, but once I did an in-place upgrade to Proxmox 8 and got the newer kernel, an error was logged in dmesg (approximately "ixgbe - probe error -5"), the interface is absent/not functional, and I have no network access; I can only get onto the host via remote KVM. From the sound of it, I could fix this by pinning an older kernel as a workaround. Kind of a hassle, but that is life. I sometimes thought Intel did better than this in terms of letting busted driver updates out the door. Sigh.

Tim
 
Oh, wow, thank you. That looks a tiny bit fiddly. Do you think that if I just sit on Proxmox 7 for a few months and upgrade to Proxmox 8 later (not taking the new kernel right away), the issue will probably be sorted in the kernel by then? Or otherwise I guess I could do the Proxmox 7 to 8 upgrade but pin the kernel at the less-current release as a stopgap workaround, and only update later, months down the road, once the dust settles on this topic?

That is, I am not sure I am super keen to do a custom kernel patch-and-compile here as a workaround.

thank you for the feedback! It is greatly appreciated!

Tim
 
I totally understand not wanting to deal with a custom kernel. If it's practical for you to post the complete dmesg log (booted with "efi=debug" if possible), I should be able to tell whether it's the same issue. You can attach it to the existing bugzilla.

I suggest asking the Proxmox maintainers to backport 070909e56a7d ("x86/pci: Reserve ECAM if BIOS didn't include it in PNP0C02 _CRS"), which has been applied upstream and will appear in v6.8.
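
In case it saves someone a search, a sketch of how the efi=debug boot and log capture could be done on a GRUB-booted Proxmox host (systemd-boot/ZFS installs edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead; the sed line and file name are just examples):

Code:
# append efi=debug to the default kernel command line
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&efi=debug /' /etc/default/grub
update-grub
reboot

# after the reboot, capture the full log and attach it to the bugzilla report
dmesg > dmesg-efi-debug.txt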
 
Hi, I just reinstalled the OVH box with the Proxmox 7 template,
did the in-place upgrade to 8,
then pinned the kernel to 6.1.10-1-pve,
thinking this would let me boot with network, as per this thread.
Rebooted: no joy, network error. Used KVM to get access and captured dmesg to a text file,
then rebooted into rescue mode to pull the dmesg text file out.
See below for the dmesg.

Not sure if this tells you anything helpful or not.

I did not adjust flags on the boot yet. Right now the machine is not bootable except via rescue mode and then a chroot captive root environment.

Tim

---paste---
Code:
TRUNCATED - WILL PUT FULL ELSEWHERE

[   10.988962] xhci_hcd 0000:07:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x0000000000000410
[   10.989352] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[   10.989357] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[   10.989629] scsi host8: ahci
[   10.989727] xhci_hcd 0000:07:00.3: xHCI Host Controller
[   10.989732] xhci_hcd 0000:07:00.3: new USB bus registered, assigned bus number 2
[   10.989736] xhci_hcd 0000:07:00.3: Host supports USB 3.0 SuperSpeed
[   10.989783] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.01
[   10.989788] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   10.989791] usb usb1: Product: xHCI Host Controller
[   10.989794] usb usb1: Manufacturer: Linux 6.1.10-1-pve xhci-hcd
[   10.989797] usb usb1: SerialNumber: 0000:07:00.3
[   10.989847] ixgbe 0000:05:00.0: enabling device (0000 -> 0002)
[   12.252513] ixgbe 0000:05:00.0: Adapter removed
...

[   12.257210] ixgbe 0000:05:00.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
[   12.257214] ixgbe 0000:05:00.0:   device [8086:1563] error status/mask=00002000/00000000
[   12.257217] ixgbe 0000:05:00.0:    [13] NonFatalErr           
[   12.257227] pcieport 0000:00:03.1: AER: Corrected error received: 0000:05:00.0
[   12.257398] ixgbe: probe of 0000:05:00.0 failed with error -5
[   12.257489] ixgbe 0000:05:00.1: enabling device (0000 -> 0002)
[   12.257579] scsi host9: ahci
[   12.499764] {2}[Hardware Error]: Hard

[   13.496245] scsi host12: ahci
[   13.496341] pcieport 0000:00:03.1: AER: aer_status: 0x00000000, aer_mask: 0x04500000
[   13.496345] pcieport 0000:00:03.1: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[   13.496344] ixgbe: probe of 0000:05:00.1 failed with error -5
[   13.496348] pcieport 0000:00:03.1: AER: aer_uncor_severity: 0x004e2030
[   13.496352] pci 0000:05:00.0: AER: can't recover (no error_detected callback)
[   13.496355] pci 0000:05:00.1: AER: can't recover (no error_detected callback)
[   13.496423] scsi host13: ahci

Full copy of the text is in a Pastebin paste: https://pastebin.com/CAiHwV0d (it will expire one week from today).
 
