Cannot Passthrough H100 PCIE GPU to VMs

JasonFeng

New Member
Jul 26, 2023
I'm new to the Proxmox Forum, but I've already been using PVE for more than three years. I'm sure that I've read the basic docs:
  1. https://pve.proxmox.com/wiki/PCI_Passthrough#Enable_the_IOMMU
  2. https://pve.proxmox.com/wiki/PCI_Passthrough#Required_Modules
  3. https://pve.proxmox.com/wiki/PCI_Passthrough#IOMMU_Interrupt_Remapping
  4. https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_Isolation
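For anyone reading along, the checks those wiki pages describe boil down to roughly the following (a minimal sketch; output will differ per system):

Bash:
# IOMMU enabled? (look for "DMAR: IOMMU enabled" on Intel or "AMD-Vi" lines)
dmesg | grep -e DMAR -e IOMMU

# Interrupt remapping active?
dmesg | grep 'remapping'

# List IOMMU groups to verify the GPU sits in its own group
for d in /sys/kernel/iommu_groups/*/devices/*; do
    grp=$(basename "$(dirname "$(dirname "$d")")")
    printf 'IOMMU group %s: %s\n' "$grp" "$(lspci -nns "${d##*/}")"
done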
I'm working on a beast of a server with:
  • 144 x Intel(R) Xeon(R) Platinum 8452Y (2 sockets)
  • 256 GB RAM
    • H100 PCIe GPU card
My goal is to pass through the H100 GPU to VMs and get the drivers installed properly.
Some outputs are below.
By the way, when installing the NVIDIA drivers in a Windows VM, the host machine sometimes freezes, crashes and then reboots.
[screenshots attached]
 
Hi, we are having similar issues with Linux: the NVIDIA driver is not loading, but nvidia-smi can see the H100 inside the VM.
We are getting the error:

Code:
[ 12.787166] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:2331)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 535.154.05 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.

Despite this, the same driver works on bare metal, so it is the correct driver.

@JasonFeng, which VBIOS version do you have? We have 96.00.30.00.01, which seems to be at least a year old by now.
Another question: why do you load the nvidia and nvidia-uvm modules on the VM host when you install the driver inside Windows?
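In case it helps with comparing cards: on a box where the NVIDIA driver does load (e.g. bare metal), the card's VBIOS version can be read with nvidia-smi, for example:

Bash:
# Full per-GPU report, filtered down to the VBIOS line
nvidia-smi -q | grep -i 'VBIOS'

# or the compact query form
nvidia-smi --query-gpu=name,vbios_version --format=csv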
 
One thing that you could try, @JasonFeng (if you have the time), is to install Ubuntu/RedHat bare metal and use their virtualization stack (libvirt/QEMU/etc.) for the passthrough. With that we could find out whether the virtualization stack is the problem or the card/driver/etc.
 
I've tried installing Ubuntu 22.04 Desktop on bare metal and it works well with no virtualization on it, so I switched to PVE after testing. I have no idea what I need to test next. The good news is that all GPUs work well under LXC, but what I need is a GNOME desktop environment, which is difficult to achieve in LXC.
 
I meant testing it with Ubuntu bare metal and a QEMU/KVM VM (so use libvirt/etc. in Ubuntu). If that works, it's a Proxmox problem; if it does not, it's either a general QEMU/KVM problem or an NVIDIA driver + virtualization problem.
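A rough sketch of such a test on plain Ubuntu, assuming IOMMU is enabled on that host as well; the PCI address, IDs and disk image name are just examples:

Bash:
# Bind the GPU to vfio-pci on the Ubuntu host
sudo modprobe vfio-pci
echo 0000:01:00.0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind   # skip if no driver is bound
echo "10de 2331"  | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

# Boot a throwaway guest with the card attached
sudo qemu-system-x86_64 \
    -enable-kvm -machine q35 -cpu host -m 16G -smp 8 \
    -device vfio-pci,host=01:00.0 \
    -drive file=ubuntu-guest.qcow2,if=virtio \
    -nographic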
 
So I thought I'd post this just in case anyone else has an issue with H100 PCIe GPUs and passthrough to a VM. The real issue here is the VBIOS version on the card: if it is 96.00.30.00.01, it has to be updated, as there is an issue with that particular version and the VFIO modules. The other thing I found that seems to be needed is to add both IDs (so add the subsystem one as well) in the vfio.conf (or whatever you've named it) file in /etc/modprobe.d/, so mine is:

Code:
options vfio-pci ids=10de:2331,10de:1626 disable_vga=1
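If you're unsure of the subsystem ID on your own card, lspci can show it (the address here is my card's, adjust accordingly):

Bash:
# -nn adds the numeric IDs; the "Subsystem:" line carries the second ID pair
lspci -nnv -s a1:00.0 | grep -iE 'nvidia|subsystem'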

My /etc/default/grub cmdline is (my hosts have AMD EPYC CPUs):
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt initcall_blacklist=sysfb_init"

So for completeness: I have an Ubuntu 22.04 VM running the 535 NVIDIA driver version in the VM. I didn't use resource mapping to attach the card but used "qm set 118 --hostpci0 a1:00.0,pcie=on,x-vga=off" (obviously that is my specific VMID and PCI address) after I created the VM and before booting and installing the OS.

Obviously this is what worked for me, but the important part is the VBIOS update; when I attempted the passthrough before updating it, my hosts were unstable, even having a hard reset when trying anything in the VM.
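Before booting the VM it's also worth double-checking that the host really handed the card to vfio-pci and that the hostpci entry landed in the VM config, roughly:

Bash:
# On the host: "Kernel driver in use" should be vfio-pci, not nvidia/nouveau
lspci -nnk -s a1:00.0

# The passthrough entry should show up in the VM config (VMID 118 here)
qm config 118 | grep hostpci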
 
I'm using Proxmox 8.3.0 and I'd like to join in on the discussion. First, it would be great to get some info on the firmware update and especially where to find it.

Next, in my case I have two identical H100 GPUs:

Bash:
lspci -n -s ca:00
ca:00.0 0302: 10de:2321 (rev a1)


lspci -n -s e1:00
e1:00.0 0302: 10de:2321 (rev a1)

My hope is that the identical product ID is not an issue, or is it?

I cannot find any other subsystem. I am used to GTX and RTX cards, where you always have to add the audio device as well, but here there is none (probably due to the absence of any HDMI or similar output that would require it).

I also tried adding one as the primary GPU. nvidia-smi also tells me that it cannot see anything.
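Since nvidia-smi sees nothing, a first thing worth checking inside the guest is whether the devices show up at all and which driver (if any) claims them, e.g.:

Bash:
# Inside the VM: are both H100s visible, and which kernel driver is bound?
lspci -nnk | grep -iA3 nvidia

# Any messages from the NVIDIA kernel module?
sudo dmesg | grep -iE 'nvrm|nvidia'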
 
Where to get the firmware? I can't find anything.
We don't have such a card here, but I'd recommend contacting NVIDIA or the vendor where you sourced the GPU for the firmware update.
 
As for where to get the firmware: it depends on where you purchased your cards. My kit is all Lenovo server hardware, so I got my firmware through them (I have also seen H100 firmware on the HP and Dell websites). I would go to your hardware manufacturer/supplier. You could also register the cards with NVIDIA and see what they have for download. Just so it's clear: the issue was with that version of the firmware, and it didn't stop the passthrough but actually caused hard resets on the host when I booted the VM with the card attached. This was a very specific issue with that version of the firmware. It should also be noted that my boxes are all AMD EPYC based, so Intel-based hardware might require slightly different GRUB_CMDLINE_LINUX_DEFAULT options.

I do have a box with 2 x H100s, both passed through to the same VM. I also tested passing them to two different VMs and had no issues. I'm not sure what would be needed if you wanted to hand one to the host and one to a VM. The identical product ID isn't an issue, because it's the PCI address that matters. There are no audio subsystems on H100s, as they are specifically for GPGPU tasks.

Not sure what you mean by setting one as primary. Where are you trying to run nvidia-smi from, host or VM?
 
Thanks for the feedback. I will get in touch with Dell (in my case they are the supplier). The server is a Dell PowerEdge R760xa. I hope you don't mind helping me out with the rest.

In terms of audio: indeed, I was also unable to find any audio controller related to either of the H100 GPUs.

As for your first question regarding the primary GPU: in the PCI device configuration window in Proxmox you can designate a PCI device as the primary GPU for that system. Is this required, or can all GPUs attached via passthrough be non-primary, which probably means no hardware 3D acceleration will be available inside the VM?

The nvidia-smi command was run inside the VM, since that is where the drivers are installed.

Here are the essentials regarding what I've done so far:

HOST
Updating GRUB was followed by update-grub; updating the kernel module options (modprobe.d) was followed by update-initramfs -u.
  • /etc/default/grub
    Code:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pci=realloc=on pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
  • /etc/modprobe.d/blacklist.conf
    Code:
    blacklist radeon
    blacklist nouveau
    blacklist nvidia
  • /etc/modprobe.d/iommu_unsafe_interrupts.conf
    Code:
    options vfio_iommu_type1 allow_unsafe_interrupts=1
  • /etc/modprobe.d/pve-blacklist.conf
    Code:
    blacklist nvidiafb
  • /etc/modprobe.d/vfio.conf (I also tried with a single ID, since both cards are identical)
    Code:
    options vfio-pci ids=10de:2321,10de:2321 disable_vga=1
    softdep nouveau pre: vfio-pci
    softdep nvidia pre: vfio-pci
    softdep nvidiafb pre: vfio-pci
    softdep nvidia_drm pre: vfio-pci
    softdep drm pre: vfio-pci
I tried with and without All Functions, ROM-Bar and PCI-Express in the Proxmox web UI for each GPU. My current config looks like this (this is only one of the 2 GPUs, but the other one is the same except for the difference in PCI address):

[screenshot of the PCI device configuration in the web UI]
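For completeness, after rebooting the host with the configuration above, these are the quick checks I rely on (addresses as above):

Bash:
# The options from /etc/default/grub should appear on the running kernel
cat /proc/cmdline

# Both H100s should now be claimed by vfio-pci ("Kernel driver in use")
lspci -nnk -s ca:00.0
lspci -nnk -s e1:00.0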

VM (Q35 with UEFI, secure boot disabled)
  • /etc/modprobe.d/blacklist-nouveau.conf
    Code:
    blacklist nouveau
    options nouveau modeset=0
  • cuda-drivers (in my case 570) - here I followed https://docs.nvidia.com/datacenter/...-guide/index.html#ubuntu-installation-network since my VM is an Ubuntu Server 24.04 LTS. During the installation of the drivers I can see that APT retrieves the packages from NVIDIA's repository, and (from my understanding of the documentation linked above) cuda-drivers will install the proprietary drivers. A couple of quick in-guest checks are sketched right after this list.
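Quick in-guest checks, assuming the DKMS flavour of the 570 driver was installed by cuda-drivers:

Bash:
# Secure Boot has to really be off, otherwise the unsigned module won't load
mokutil --sb-state

# Did DKMS build the nvidia module for the running kernel?
dkms status

# Is the nvidia module actually loaded?
lsmod | grep nvidia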

On the VM with the ROM-Bar disabled I get (from dmesg):

Bash:
[   11.024558] nvidia: probe of 0000:01:00.0 failed with error -1
[   11.024772] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
[   11.024783] nvidia: probe of 0000:02:00.0 failed with error -1
[   11.024849] NVRM: The NVIDIA probe routine failed for 2 device(s).
[   11.024854] NVRM: None of the NVIDIA devices were initialized.
[   11.047781] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
[   15.978701] workqueue: page_reporting_process hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[   24.170695] workqueue: page_reporting_process hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[   24.252123] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[   24.252135] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)

which is currently what I am investigating since it is the only extensive error I could get my hands on.
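For what it's worth, "BAR0 is 0M @ 0x0" usually means the guest firmware/kernel could not assign the card's memory BARs, rather than a driver defect. Below is a sketch of how to inspect the assignment from inside the guest, plus a workaround that is often suggested for large-BAR cards (enlarging the guest's 64-bit PCI hole via raw QEMU args). The 2048G value is an example I have not verified on this hardware:

Bash:
# Inside the VM: what did the guest actually assign to the GPU's regions?
sudo lspci -vs 01:00.0 | grep -iE 'region|memory'

# On the host (untested here): enlarge the 64-bit PCI hole for the Q35 guest
qm set <VMID> --args '-global q35-pcihost.pci-hole64-size=2048G'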

NVIDIA Settings (I have an XFCE with xRDP setup) doesn't show anything that indicates a GPU has been recognised, although I don't take this as concrete evidence, since it might just be the remote session that's at fault here (not enough know-how on my side).

nvidia-smi clearly states:
Code:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

If anyone needs more info, please let me know.
 
Interesting. With SeaBIOS and a non-Q35 machine type (i440fx), driver 570 with CUDA runtime 12.8 and the toolkit installed without any issue. I also disabled the ROM-Bar.
[screenshot attached]