[SOLVED] AMD GPU 7900XTX passthrough to Ubuntu 22.04 guest VM on Proxmox 8.0.2 for PyTorch ROCm

UltraRabbit

New Member
Nov 6, 2023
A few days ago I got a new machine with an AMD 7950X CPU and a 7900XTX GPU, set up with Proxmox 8.0.2. After successfully passing the dedicated GPU through to a Windows 11 guest VM and an Ubuntu 22.04 guest VM, I tried to install the latest AMD ROCm support with PyTorch in the Ubuntu VM.
Although the installation went smoothly, I got stuck on an error from the HIP kernel complaining that it couldn't continue in the present state. After some searching on Google, I learned that the PCIe DevCaps entry AtomicOpsCap: Routing+ 32bit+ 64bit+ must be enabled all the way through the PCIe tree down to the GPU device. However, the default pcie-root-port implementation in QEMU/KVM doesn't enable it, which causes the problem.
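For anyone who wants to check their own guest, here is a rough Python sketch that reads `lspci -vvv` output and lists the AtomicOpsCap flags per device. It assumes the usual lspci layout (an unindented header line per device, with the capability lines indented below it); the function names `parse_atomic_ops` and `read_lspci` are just made up for this post. lspci normally needs root to show the capability blocks.

```python
import re
import subprocess

# Matches the capability line only, not the separate "AtomicOpsCtl:" control line.
ATOMIC_RE = re.compile(r"AtomicOpsCap:\s*(.*)")


def parse_atomic_ops(lspci_text):
    """Map each PCI address to its AtomicOpsCap flags, e.g. {'Routing': True}."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line = device header, e.g. "00:1c.0 PCI bridge: ..."
            current = line.split()[0]
            continue
        match = ATOMIC_RE.search(line)
        if match and current is not None:
            flags = {}
            for token in match.group(1).split():
                # Tokens look like "Routing+" or "64bit-"
                if token[-1] in "+-":
                    flags[token[:-1]] = token.endswith("+")
            result[current] = flags
    return result


def read_lspci():
    """Run lspci inside the guest (assumes pciutils is installed; run as root)."""
    return subprocess.run(
        ["lspci", "-vvv"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    for address, flags in parse_atomic_ops(read_lspci()).items():
        print(address, flags)
```

If the root port above the GPU shows Routing, 32bit and 64bit as False, you are probably hitting the same issue I did.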
I found that a patch enabling these capabilities already exists in the QEMU repository. I applied it and recompiled pve-qemu-kvm 8.0.2-7 to force-enable the required DevCaps. PyTorch with ROCm now runs successfully.
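A quick way to confirm the result from inside the Ubuntu VM is a small smoke test like the one below. This is only a sketch, not the exact script I ran: the function name `rocm_smoke_test` and the 256x256 matrix size are arbitrary choices. On ROCm builds PyTorch exposes the GPU through the regular `torch.cuda` API, and `torch.version.hip` is set to the HIP version string.

```python
def rocm_smoke_test():
    """Return a small status dict; returns early if PyTorch is not installed."""
    status = {
        "torch_installed": False,
        "hip_version": None,
        "gpu_visible": False,
        "matmul_ok": False,
    }
    try:
        import torch
    except ImportError:
        return status

    status["torch_installed"] = True
    # None on CUDA/CPU builds, a version string on ROCm builds.
    status["hip_version"] = getattr(torch.version, "hip", None)
    status["gpu_visible"] = torch.cuda.is_available()

    if status["gpu_visible"]:
        # Launching a real kernel is the step that exercises the GPU;
        # on a broken setup it may fail or hang here instead of returning.
        a = torch.randn(256, 256, device="cuda")
        b = torch.randn(256, 256, device="cuda")
        c = (a @ b).cpu()
        status["matmul_ok"] = bool(torch.isfinite(c).all())
    return status


if __name__ == "__main__":
    print(rocm_smoke_test())
```

Note that the GPU being visible is not enough on its own to show the setup works, so the matmul result is the part worth checking.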
I'm wondering whether this patch will be included in a future pve-qemu-kvm release.

Best wishes to Proxmox community.