Cannot Pass Through H100 PCIe GPU to VMs

JasonFeng

New Member
Jul 26, 2023
I'm new to the Proxmox forum, but I've been using PVE for more than three years. I've read the basic docs, such as:
  1. https://pve.proxmox.com/wiki/PCI_Passthrough#Enable_the_IOMMU
  2. https://pve.proxmox.com/wiki/PCI_Passthrough#Required_Modules
  3. https://pve.proxmox.com/wiki/PCI_Passthrough#IOMMU_Interrupt_Remapping
  4. https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_Isolation
I'm working on a beast of a server with:
    • 144 x Intel(R) Xeon(R) Platinum 8452Y (2 Sockets)
    • 256 GB RAM
    • H100 PCIe GPU card
My goal is to pass the H100 GPU through to VMs and get the drivers installed properly.
Some outputs are below.
By the way, when installing the NVIDIA drivers in a Windows VM, the host machine may freeze, crash, and then reboot.
[Attached screenshots: 1707211858823.png, 1707211873242.png, 1707211959955.png, 1707211987001.png, 1707212008300.png]
 
A USB controller can be passed through to VMs and works well, which may mean that the IOMMU itself works.
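A working USB passthrough does not say which IOMMU group the GPU sits in, so it can be worth listing the groups directly. Below is a minimal sketch that walks the standard sysfs layout; the function name and the optional alternate-root argument are my own additions for illustration, not something from the Proxmox docs.

```shell
#!/bin/sh
# Print every PCI device together with its IOMMU group number.
# The optional first argument (an alternate sysfs root) is a hypothetical
# convenience for dry runs; by default the real sysfs path is used.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Usage on the host:
#   list_iommu_groups | sort -V
```

If the function prints nothing, no IOMMU groups exist. If the GPU shares a group with unrelated devices, that is the isolation problem the wiki's "Verify IOMMU Isolation" section describes.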
 
Hi, we are having similar issues with Linux: the NVIDIA driver is not loading, but nvidia-smi can see the H100 inside the VM.
We are getting this error:

[ 12.787166] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:2331)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 535.154.05 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.

Despite this, the same driver works on bare metal! So it's the correct driver.

@JasonFeng, which BIOS version do you have? We have 96.00.30.00.01, which seems to be at least one year old by now.
Another question: why do you load the nvidia and nvidia-uvm modules on the VM host when you install the driver in Windows?
 
One thing you could try, @JasonFeng (if you have the time), is to install Ubuntu/Red Hat on bare metal and use their virtualization stack (libvirt/QEMU/etc.) for the passthrough. That would tell us whether the virtualization stack is the problem or the card/driver/etc.
 
I've tried installing Ubuntu 22.04 Desktop on bare metal, and it works well with no virtualization options enabled, so I switched to PVE after testing. I have no idea what to test next. The good news is that all GPUs work well under LXC, but what I need is a GNOME desktop environment, which is difficult to achieve in LXC.
 
I meant testing it with Ubuntu on bare metal in a QEMU/KVM VM (so use libvirt/etc. on Ubuntu). If that works, it's a Proxmox problem; if it does not, it's either a general QEMU/KVM problem or an NVIDIA driver + virtualization problem.
 
I thought I'd post this in case anyone else has an issue passing H100 PCIe GPUs through to a VM. The real issue here is the VBIOS version on the card: if it is 96.00.30.00.01, it has to be updated, because that particular version has a problem with the VFIO modules. The other thing that seems to be needed is to add both IDs (so add the subsystem one as well) to the vfio.conf file (or whatever you've named it) in /etc/modprobe.d/, so mine is:

options vfio-pci ids=10de:2331,10de:1626 disable_vga=1

My /etc/default/grub command line is (my hosts have AMD EPYC CPUs):
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt initcall_blacklist=sysfb_init"
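For anyone following along: after editing these two files, the usual Proxmox procedure is to regenerate the initramfs and the GRUB config and then reboot. Once the host is back up, it is worth confirming that vfio-pci actually claimed the card. Here is a rough sketch of that check; the helper name is made up for illustration, and the device ID is the one quoted in this thread.

```shell
#!/bin/sh
# Steps usually needed after changing /etc/modprobe.d/ and /etc/default/grub
# (run manually on the host, then reboot):
#   update-initramfs -u -k all
#   update-grub

# Hypothetical helper: read `lspci -nnk` output on stdin and print the
# kernel driver bound to each listed device.
bound_driver() {
    awk -F': ' '/Kernel driver in use/ { print $2 }'
}

# Usage on the host (10de:2331 is the H100 device ID from this thread):
#   lspci -nnk -d 10de:2331 | bound_driver
```

If this prints vfio-pci, the binding took effect. If it prints nvidia or nouveau, or nothing at all, the host driver grabbed the card first or no driver is bound yet.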

For completeness: I have an Ubuntu 22.04 VM running the 535 NVIDIA driver. I didn't use resource mapping to attach the card; instead I ran "qm set 118 --hostpci0 a1:00.0,pcie=on,x-vga=off" (obviously that is my specific VMID and PCI address) after creating the VM and before booting it and installing the OS.
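To adapt that command to another VM, a small dry-run helper like the one below prints the command instead of running it. This is only a sketch: the function name is hypothetical, and it writes the option as --hostpci0 because qm expects option names with a leading dash. As far as I know, pcie=on also requires the VM to use the q35 machine type.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the qm command that attaches a host
# PCI device to a VM with the same options used in this thread.
#   $1 = VMID, $2 = host PCI address (e.g. a1:00.0)
hostpci_cmd() {
    vmid="$1"
    addr="$2"
    printf 'qm set %s --hostpci0 %s,pcie=on,x-vga=off\n' "$vmid" "$addr"
}

# Usage: review the printed command, then run it on the host yourself.
#   hostpci_cmd 118 a1:00.0
# Afterwards the entry can be checked with:
#   qm config 118 | grep hostpci0
```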

Obviously this is what worked for me, but the important part is the VBIOS update. When I attempted the passthrough before updating it, my hosts were unstable, even hard-resetting when I tried anything in the VM.
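To check which VBIOS a card is running, nvidia-smi can report it wherever the driver loads (bare metal or an LXC container, for example). The sketch below flags the version named in this thread; the function name and output wording are my own, and the only "bad" version it knows about is the one reported here.

```shell
#!/bin/sh
# VBIOS version reported as problematic in this thread.
AFFECTED_VBIOS="96.00.30.00.01"

# Hypothetical helper: read "name, vbios_version" CSV lines on stdin and
# mark any GPU that still runs the affected VBIOS.
flag_vbios() {
    while IFS=, read -r name vbios; do
        vbios="${vbios# }"   # nvidia-smi separates CSV fields with ", "
        if [ "$vbios" = "$AFFECTED_VBIOS" ]; then
            printf '%s: %s (update needed)\n' "$name" "$vbios"
        else
            printf '%s: %s\n' "$name" "$vbios"
        fi
    done
}

# Usage where the NVIDIA driver loads:
#   nvidia-smi --query-gpu=name,vbios_version --format=csv,noheader | flag_vbios
```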
 
Thanks for this solution!
Where can I get the firmware? I can't find anything.
 
