Hot-Plug NVME not working

Domino

Active Member
May 17, 2020
Proxmox VE 6.2

I slot a drive into the NVMe backplane and here's the dmesg output:

Code:
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: [1c58:0023] type 00 class 0x010802
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: reg 0x20: [mem 0x00000000-0x0000ffff 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: Max Payload Size set to 256 (was 128, max 256)
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: enabling Extended Tags
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8 GT/s x1 link at 0000:85:10.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: Adding to iommu group 81
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: bridge window [mem 0x00100000-0x001fffff] to [bus 89] add_size 300000 add_align 100000
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: no space for [mem size 0x00400000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: failed to assign [mem size 0x00400000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: no space for [mem size 0x00100000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: failed to assign [mem size 0x00100000]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 6: no space for [mem size 0x00020000 pref]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 4: no space for [mem size 0x00010000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 4: failed to assign [mem size 0x00010000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 0: no space for [mem size 0x00004000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: PCI bridge to [bus 89]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0:   bridge window [io  0xc000-0xcfff]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0:   bridge window [mem 0x3a000400000-0x3a0005fffff 64bit pref]
Jun 04 13:57:01 arcadia kernel: PCI: No. 2 try to assign unassigned res
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: bridge window [mem 0x00100000-0x001fffff] to [bus 89] add_size 300000 add_align 100000
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: no space for [mem size 0x00400000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: failed to assign [mem size 0x00400000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: no space for [mem size 0x00100000]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: BAR 14: failed to assign [mem size 0x00100000]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 4: no space for [mem size 0x00010000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 4: failed to assign [mem size 0x00010000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 0: no space for [mem size 0x00004000 64bit]
Jun 04 13:57:01 arcadia kernel: pci 0000:89:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0: PCI bridge to [bus 89]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0:   bridge window [io  0xc000-0xcfff]
Jun 04 13:57:01 arcadia kernel: pcieport 0000:85:10.0:   bridge window [mem 0x3a000400000-0x3a0005fffff 64bit pref]


The PLX PCIe bridge controller:

Code:
84:00.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:08.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:09.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:0a.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:10.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:11.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:12.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)
85:13.0 PCI bridge: PLX Technology, Inc. Device 8749 (rev ca)


The NVMe drive does appear in the lspci list too once plugged in:

Code:
89:00.0 Non-Volatile memory controller: HGST, Inc. Ultrastar SN200 Series NVMe SSD (rev 02)



Unfortunately the drive doesn't show up in the device list. I was expecting something like '/dev/nvme0n1', but the drive simply does not exist as far as the system is concerned, even though it is in the PCIe list. I don't really know what else to do here.
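One way to confirm the symptom described above (device visible on the PCI bus, but no NVMe controller created) is to look at sysfs. The following is only a minimal sketch: the function name `pci_device_state` is made up for illustration, and it assumes the usual Linux layout where a bound device has a `driver` symlink and an NVMe controller appears as an `nvme/nvmeN` child directory of the PCI device.

```python
from pathlib import Path

def pci_device_state(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Summarise whether a PCI device exists, which driver is bound, and its NVMe controllers.

    bdf is the full address as shown in the kernel log, e.g. "0000:89:00.0".
    sysfs_root is a parameter only so the function can be pointed at a test directory.
    """
    dev = Path(sysfs_root) / bdf
    if not dev.is_dir():
        return {"present": False, "driver": None, "controllers": []}

    # A bound device has a "driver" symlink pointing at the driver's directory.
    driver_link = dev / "driver"
    driver = driver_link.resolve().name if driver_link.exists() else None

    # Assumption: a working NVMe device exposes its controller as <device>/nvme/nvmeN.
    nvme_dir = dev / "nvme"
    controllers = sorted(p.name for p in nvme_dir.iterdir()) if nvme_dir.is_dir() else []

    return {"present": True, "driver": driver, "controllers": controllers}
```

For the drive in this post one would call `pci_device_state("0000:89:00.0")`. A result with `present` true but no driver and no controllers would match what is described here: the device is enumerated, but never becomes usable.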

Looking around via Google, it appears similar errors have occurred in the past with other kernels in numerous distros, and patches were released to fix them. I hope this bug is one of those curable ones too?
 
Hi,

PCIe hotplug is not correctly implemented on all hardware.
Make sure that you have installed the latest versions of all firmware that your hardware vendor has published.
What hardware do you use?
 
I see in the log that the hotplug button event is captured, but the kernel seems to have an issue assigning addresses. I will try this with a different brand of drive too, to see whether the issue is drive-specific.

In the log we can see the button event being captured and triggering the assignment process:
Code:
[28866.254339] pcieport 0000:85:12.0: pciehp: Slot(0-4): Attention button pressed
[28866.254344] pcieport 0000:85:12.0: pciehp: Slot(0-4) Powering on due to button press
[28866.254353] pcieport 0000:85:12.0: pciehp: Slot(0-4): Card present
[28867.079504] pci 0000:8b:00.0: [1c58:0023] type 00 class 0x010802
[28867.080436] pci 0000:8b:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[28867.080647] pci 0000:8b:00.0: reg 0x20: [mem 0x00000000-0x0000ffff 64bit]
[28867.080732] pci 0000:8b:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[28867.080788] pci 0000:8b:00.0: Max Payload Size set to 256 (was 128, max 256)
[28867.080809] pci 0000:8b:00.0: enabling Extended Tags
[28867.082160] pci 0000:8b:00.0: Adding to iommu group 81
[28867.091479] pcieport 0000:85:12.0: bridge window [mem 0x00100000-0x001fffff] to [bus 8b] add_size 300000 add_align 100000
[28867.091488] pcieport 0000:85:12.0: BAR 14: no space for [mem size 0x00400000]
[28867.091491] pcieport 0000:85:12.0: BAR 14: failed to assign [mem size 0x00400000]
[28867.091494] pcieport 0000:85:12.0: BAR 14: no space for [mem size 0x00100000]
[28867.091496] pcieport 0000:85:12.0: BAR 14: failed to assign [mem size 0x00100000]
[28867.091502] pci 0000:8b:00.0: BAR 6: no space for [mem size 0x00020000 pref]
[28867.091504] pci 0000:8b:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
[28867.091507] pci 0000:8b:00.0: BAR 4: no space for [mem size 0x00010000 64bit]
[28867.091509] pci 0000:8b:00.0: BAR 4: failed to assign [mem size 0x00010000 64bit]
[28867.091512] pci 0000:8b:00.0: BAR 0: no space for [mem size 0x00004000 64bit]
[28867.091513] pci 0000:8b:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]

The server is an HPE DL380 Gen9, and the PCIe backplane bridge card is HPE's own NVMe x16 PLX PCI Express card. The kernel recognises the card and it appears to report hotplug events correctly, as the drive does show up in the lspci list. The problem appears to be that the kernel itself cannot configure the drive for use in the system because of addressing issues. Further research shows that patches are sometimes needed for specific hardware that reports its configuration differently from how the kernel expects, or possibly even incorrectly (I see quite a few patches for such enterprise cases in Red Hat's kernel). I'm not entirely sure that is the same issue here, but that is all I can find on the matter of 'BAR' errors.
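To make the 'BAR' errors easier to compare between drives or kernels, the failed assignments can be pulled out of the log text. This is just a rough sketch, not an official tool: the function name `failed_bars` is invented here, and the pattern is based only on the message format visible in the logs in this thread.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   pci 0000:8b:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]
BAR_FAIL = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign "
    r"\[(?P<kind>mem|io) size (?P<size>0x[0-9a-f]+)"
)

def failed_bars(log_text):
    """Return {device: sorted list of (bar_number, size_in_bytes)} for failed assignments.

    Repeated messages (the kernel retries the assignment) are collapsed into one entry.
    """
    found = defaultdict(set)
    for match in BAR_FAIL.finditer(log_text):
        found[match["dev"]].add((int(match["bar"]), int(match["size"], 16)))
    return {dev: sorted(bars) for dev, bars in found.items()}
```

Fed the log above, this groups the failures per device: the bridge port 0000:85:12.0 fails on BAR 14 (its memory window), and the drive 0000:8b:00.0 fails on BARs 0, 4 and 6. So it is the bridge window that cannot be allocated, and the drive's own BARs fail with it.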

The pci-bridge card's vendor and device id: [10b5:8749]
 
Also, I'm unable to pass through the bridge card itself; perhaps that is not permitted because it is a PCI switch.

Update:
I've put in a new boot drive and am in the process of installing RHEL 8 and oVirt, to see if the same issue manifests in that setup.
 
Just to update: the issue in the end was the hot-plug functionality in Debian. It works fine in RHEL, but Debian throws a wobbly. To be frank, I didn't like the RHEL+oVirt setup one bit; it is quite clunky compared to Proxmox and its flexibility. RHEL also doesn't support RDMA CIFS, which is a total downer, because with PVE+RDMA-CIFS talking to my Windows storage host I get crazy speeds and I am not giving that up for anything. The drives work fine in PVE, so I'll just have to forgo hotplug. I doubt the Debian crowd upstream will do anything about it; I see a lot of people requesting compatibility updates and much of it leads to nothing.

If Proxmox had its own kernel driver gurus, I think it could rule the enterprise Linux virtualisation hemisphere, because depending on upstream and manufacturers is a royal pain in the backside and rather limiting in hardware selection, which backfires with lost potential business. Then again, tracking down top kernel coders is a nightmare too, as they get grabbed by the big players. Not a very optimistic view on things, but reality is never rosy. The biggest problem I see with writing drivers is that manufacturers simply don't put specs out. It's like working blind: even if the coders were there, they'd be fumbling around in the dark, not to mention that they would need the hardware to work with in the first place.
 
"I doubt the Debian crowd upstream will do anything about it, see a lot of people requesting compatibility updates and much of it leads to nothing."

Keep in mind that the kernel is from Ubuntu, not Debian. So reporting this to Ubuntu upstream may be the better route.
 

Try adding pci=realloc to your kernel cmdline.
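As a rough illustration of that suggestion, the helper below builds the new command line string from an existing one. It is a sketch only: `with_pci_realloc` is a made-up name, and merging into an existing `pci=` option list is an assumption about how one would want to combine options. On a GRUB-booted Debian-based install the result would typically go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot; other boot setups keep the command line elsewhere.

```python
def with_pci_realloc(cmdline):
    """Return cmdline with the PCI 'realloc' option present.

    If a pci= parameter already exists, 'realloc' is appended to its
    comma-separated option list; otherwise 'pci=realloc' is added at the end.
    An existing 'realloc' or 'realloc=<value>' option is left untouched.
    """
    tokens = cmdline.split()
    for i, token in enumerate(tokens):
        if token.startswith("pci="):
            options = token[len("pci="):].split(",")
            if any(opt == "realloc" or opt.startswith("realloc=") for opt in options):
                return " ".join(tokens)  # already configured, keep as is
            tokens[i] = token + ",realloc"
            return " ".join(tokens)
    return " ".join(tokens + ["pci=realloc"])
```

The currently active command line can be read from /proc/cmdline and passed to this function to check whether the option is already in effect after the reboot.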
 
Hi all,

We also hit this problem on Proxmox 6.4-6 when we tried to add 4 NVMe disks to a running system using hot-plug.
One way to solve this issue without a reboot, patch, etc. on critical running systems:
1) Install "nvme-cli"
2) Run "nvme reset /dev/nvmeX" on each new disk (the disks can be found via "nvme list-subsys")
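The two steps above could be scripted roughly as follows. This is a hedged sketch rather than a tested procedure: the function names are invented, and listing controllers from /sys/class/nvme is an assumption used here in place of reading the "nvme list-subsys" output. It defaults to a dry run, since resetting a controller that is already in use is probably not what you want on a critical system.

```python
import subprocess
from pathlib import Path

def list_controllers(class_dir="/sys/class/nvme"):
    """Return controller device paths such as /dev/nvme0, based on sysfs entries."""
    root = Path(class_dir)
    if not root.is_dir():
        return []
    return sorted(f"/dev/{entry.name}" for entry in root.iterdir())

def reset_commands(controllers):
    """Build one 'nvme reset <device>' command per controller path."""
    return [["nvme", "reset", dev] for dev in controllers]

def reset_controllers(controllers, dry_run=True):
    """Print the reset commands, or run them when dry_run is False (needs root and nvme-cli)."""
    for cmd in reset_commands(controllers):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Pass only the newly plugged controllers, for example `reset_controllers(["/dev/nvme4", "/dev/nvme5"])` (hypothetical device names), and check the printed commands before switching off the dry run.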

Update: we observed this only on a hypervisor with an AMD CPU. The other hypervisor where we hot-plugged NVMe disks was Intel-based, and we did not have to do anything there: the system recognized the disks without any additional steps. So I suppose this problem applies only to AMD-based hardware.
 
