GPU pass through on HPE Cray XD670

Ignauus · Dec 19, 2024

Hi,

Has anyone tried and successfully enabled GPU passthrough on this beast of a server?
It has 8 x NVIDIA H200 with Intel Xeon 8570 CPUs.

I've tried various settings and tutorials from the web and this forum, but none of them end well.

The closest I've got is a failed driver install, with errors like the one below in the CUDA installer log:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid

With the CPU type set to host, the VM hangs with "NO VNC" in the console, and on the host I see it get stuck on:
vfio-pci-pci 0000:0a:00.0:Enabling HDA controller

I use a UEFI VM with q35 and have tried all the settings I can think of when adding a PCI device. The IOMMU groups look fine.
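
For anyone who wants to compare IOMMU grouping on a similar box, here is a minimal sketch of how the groups can be listed. It assumes the standard Linux sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups is made up for illustration and is not part of any Proxmox tool.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from a sysfs-style tree.

    Assumption: each group is a numbered directory containing a "devices"
    subdirectory with one entry per PCI address.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No such directory: IOMMU is probably disabled, or this is not Linux.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    # Print one line per group, lowest group number first.
    for number, devices in sorted(list_iommu_groups().items()):
        print(f"group {number}: {' '.join(devices)}")
```

Ideally each GPU sits in its own group, or shares it only with its own functions, so it can be handed to the VM without dragging unrelated devices along.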

We also had problems like this on an older HPE server with 8 x A100 and ended up using libvirt KVM instead. I'm not quite ready to give up just yet.
Any suggestions on where to begin?

--Peter
