I am writing this up for several reasons:
- To increase my own understanding of the problem (I still haven't wrapped my head around it entirely)
- Not wanting to take the simple answer as the solution
- Documenting the issue for the community so people don't have to spend the time wading through irrelevant information
- So that when I eventually move on to something else and forget about this, I at least have a good source of information to refer back to (I'm stretching the limit on tabs in my browser at this point...).
If you have a GPU with >= 24GB of RAM, you are using OVMF (UEFI), and you use PCI passthrough to a VM, you will be able to see the device with lspci -v and install the driver, but the driver will not load, and you will see the following error message when it tries to load.
Code:
[ 22.345395] kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:48:00.0)
[ 22.348443] kernel: NVRM: The system BIOS may have misconfigured your GPU.
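If you want to confirm that this is the failure you are hitting (rather than some other passthrough problem), the quickest check is the guest's kernel log after the driver tries to load. A minimal check, assuming the Nvidia driver is already installed in the guest:
Code:
# run inside the VM after the driver has attempted to load
dmesg | grep -i NVRM
# or on systemd-based guests such as RHEL 9:
journalctl -b -k | grep -i NVRM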
My environment:
- Proxmox 8.4.1 (latest at the time, 2024/04/12)
- multiple GPUs (Nvidia V100, L40S, A100, H100)
- The VMs I am deploying and testing on are running RHEL 9.4.
- Using various Intel Xeon 4xxx, 6xxxx CPUs, on Dell (R750) and Supermicro (X13) platforms.
I have spent about two days researching this problem and still don't know everything about it, but I'm still learning. From what I can tell, the bad news is that it is still an ongoing issue, as documented here, here, here, and here. The good news is that there are simple workarounds.
Several people have documented the issue in the Proxmox forums here, here, here, here, and here, along with several other problems that sent me on wild goose chases. Those paths led me down several rabbit holes, but through them I ended up at what I believe is the source of the issue.
So here is the shortcut: if you just want the workaround, you have a few options.
- Use SeaBIOS rather than OVMF (UEFI), as mentioned in the links above (but this didn't seem to be the case for this situation).
- Set a larger aperture on your VM (size depends on the GPU type):
qm set <VMID> -args '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536'
- Change your VM Processor type from x86-64-v2-AES to host:
qm set <VMID> -cpu host
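Whichever option you go with, a quick way to sanity-check the result is to look at the BAR assignment and driver state inside the guest after a full stop/start of the VM. A rough sketch (the PCI address 01:00.0 is just an example, substitute your own):
Code:
# inside the VM: the Region/BAR lines should show real addresses and sizes, not be unassigned
lspci -vvs 01:00.0 | grep -i region
# the driver should now load without the NVRM BAR error
dmesg | grep -i NVRM
nvidia-smi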
The best sources I found on the issue were kubevirt issue-11093 and the kubevirt RFE "Harvester/kubevirt and high mem gpus with a 32 bit BAR memory slot". Those led to a blog post about the issue, which in turn led to a commit in the edk2 repo discussing the context, and an in-depth discussion on the edk2 mailing list that explains in greater detail the source of the problem and why it is difficult to fix. There is also additional context for BAR sizing from Nvidia, as well as the fix for VMware. Nvidia even documents the error message for BAR1.
Breaking the issue down to the best of my understanding (correct me if I'm saying something wrong)...
As mentioned on the edk2 mailing list, "OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR allocation that is 32GB in size." In other terms, OVMF exposes a 64-bit MMIO (Memory-Mapped Input/Output) space, and 32GB of that space is given to PCI MMIO BAR allocation; this 32GB region is what is referred to as the "aperture". Multiple devices request space from that PCI MMIO allocation pool, so if the requests add up to more than 32GB, some devices cannot be mapped.
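To put rough numbers on that for the cards in my environment (my own summary based on the BAR sizes shown further down, not a quote from the mailing list):
Code:
# default OVMF 64-bit PCI MMIO aperture: 32GB
# V100: BAR1 = 32GB -> a single card already consumes the entire default aperture
# L40S: BAR1 = 64GB -> a single BAR is larger than the default aperture
# H100: 128GB of MMIO per card, so multiple cards add up very quickly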
You can look at the BAR requests for a device on a host machine by running:
lspci -vvvs <deviceID>
If you have the drivers working on the host, you can also find the BAR1 memory requirement with the nvidia-smi -q command (example further below). The following lspci output is from an L40S:
Code:
01:00.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev
[...]
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 64GB, supported: 64GB
BAR 3: current size: 32MB, supported: 32MB
[...]
As you can see, the L40S's BAR 0, BAR 1, and BAR 3 have sizes of 16MB, 64GB, and 32MB respectively. For other devices like the V100, BAR 0, BAR 1, and BAR 3 are 16MB, 32GB, and 32MB.
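For the nvidia-smi route mentioned above, something along these lines should show the same BAR1 size on a host where the driver is working (the exact section layout may vary between driver versions):
Code:
nvidia-smi -q | grep -A 3 "BAR1 Memory Usage"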
Moral of the story: the required aperture size is hardware dependent. I couldn't find a comprehensive list, but Nvidia does specify the MMIO space required for several different versions of their GPUs.
In my case, the L40S is a 48GB vRAM card and requires 64GB of MMIO space for itself. This means that even if I set the MMIO space to 64GB with
qm set <VMID> -args '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536'
the total space will still be too small, which I confirmed. Since BARs are powers of two and need to be aligned, the value would need to be 131072. I do think the MMIO space (aperture) could be smaller, but Laszlo references the physical hardware limitations of the CPU: "The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore such an aperture would be aligned at 32 GB -- the lowest base address (dependent on guest RAM size) would be 32 GB. Meaning that the aperture would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys address width."
This implies that CPUs with a 36-bit physical address width couldn't support a larger aperture anyway, but in my case my CPUs have an address size of 46 bits physical. You can check your physical address width with grep 'bits physical' /proc/cpuinfo. One thing I would note is that the default processor type in Proxmox, x86-64-v2-AES, reports "40 bits physical", so if you need to pass in 4+ GPUs like the H100, which needs 128GB of MMIO, you will run out of address space, the way I understand it.
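Putting those two constraints together for my single-L40S case: the aperture has to jump to the next power of two, 131072 (the value is in MB, so 131072 = 128GB), and the 46-bit CPUs have more than enough address space for that. A sketch of the adjusted command and the address-width arithmetic:
Code:
# next power-of-two aperture that fits a 64GB BAR1 plus the smaller BARs: 128GB
qm set <VMID> -args '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=131072'

# physical address width -> total addressable space (aperture + guest RAM must fit under this)
echo "40 bits: $(( (1 << 40) / (1 << 30) )) GiB"   # 1024 GiB (1 TiB)
echo "46 bits: $(( (1 << 46) / (1 << 30) )) GiB"   # 65536 GiB (64 TiB)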
I will mention that there is something called Resizable BAR (ReBAR), where you can resize the BARs of GPUs that support it, but I do not know what implications it has for this issue. There is some further reading on it out there.
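If you want to see whether your particular card advertises it, the capability shows up in the same lspci -vv output as above, so a quick check is something like:
Code:
lspci -vvs <deviceID> | grep -A 5 -i "resizable bar"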
So with this, you should be able to solve these BAR issues with your passed-through GPUs. Several people, like me, just stumbled upon an answer, but after reading about it for the past two days I wanted to know more, so here I am...
As for the Proxmox project and how this could be handled more cleanly, I don't have a solution; it seems the OVMF/edk2 project would need to fix this. From the sounds of it, it won't be a trivial fix, and it is unclear whether it will end up as an option people need to set to support larger-MMIO GPUs or ReBAR, or whether there is some "magic" that can make it work automatically, as mentioned by Daniel. Regardless, if it's going to take years, maybe a snippet in the documentation would help, as I think this will become a more common occurrence as GPUs ship with larger amounts of vRAM.
Hopefully this helps.
Find your device's BAR sizes
lspci | grep -i nvi
lspci -vvvs 3a:00.0
source: https://forums.developer.nvidia.com...iroment-does-not-support-resizable-bar/306268
Find supported CPU physical bits
grep 'bits physical' /proc/cpuinfo