Ubuntu 24.04 VM Crashes with RX 7900 XTX Passthrough on Proxmox 9.2.0 (Kernel 6.8.12-9-pve+)

tahnerd

New Member
Jul 12, 2025
Hardware:
MOBO: MZ32-AR0
CPU: AMD EPYC 7C13 Milan
RAM: 512GB
GPUs: 2x RX 7900 XTX
OS: Proxmox VE 9.2.0
Kernel(s): Initially 6.8.12-9-pve (also tried 6.8.4-2-pve and 6.14.0-2-pve)

/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=1002:744c,1002:ab30 disable_vga=1 disable_reset=1 disable_idle_d3=1
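
For anyone reproducing this, a minimal sketch for checking whether the `ids=` line above actually took effect, i.e. whether every function in the two GPU slots is bound to vfio-pci on the host. It only reads the standard sysfs `driver` symlinks. The function name `drivers_for_slot` is made up for this example, and the slot addresses are the ones from the hostpci lines in this post; adjust them for other hardware.

```python
import glob
import os


def drivers_for_slot(slot, sysfs_root="/sys"):
    """Map each PCI function in a slot (e.g. '0000:03:00') to its bound driver.

    The value is None when no driver is bound to that function.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "bus/pci/devices", slot + ".*")
    for dev_path in sorted(glob.glob(pattern)):
        link = os.path.join(dev_path, "driver")
        # 'driver' is a symlink into /sys/bus/pci/drivers/<name> when bound.
        driver = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        result[os.path.basename(dev_path)] = driver
    return result


if __name__ == "__main__":
    # Slots taken from the hostpci0/hostpci1 lines in this post.
    for slot in ("0000:03:00", "0000:c3:00"):
        for function, driver in drivers_for_slot(slot).items():
            print(function, driver)
```

If any function in a slot reports amdgpu, snd_hda_intel, or another host driver instead of vfio-pci, the host still owns that function when the VM starts.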

/etc/default/grub additions
Code:
amd_iommu=on iommu=pt
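
A rough sketch for confirming that the IOMMU is active after a reboot and for seeing which devices share an IOMMU group with each GPU. It walks `/sys/kernel/iommu_groups`, which the kernel only populates when the IOMMU is enabled. The helper names `iommu_groups` and `group_of` are invented for this example, and the two addresses assume the GPUs sit at function `.0` of the slots used in the VM config.

```python
import glob
import os


def iommu_groups(sysfs_root="/sys"):
    """Return {group_number: [pci_address, ...]} from /sys/kernel/iommu_groups."""
    groups = {}
    pattern = os.path.join(sysfs_root, "kernel/iommu_groups", "*", "devices", "*")
    for dev_path in glob.glob(pattern):
        # Path layout: .../iommu_groups/<group>/devices/<pci_address>
        group = int(os.path.basename(os.path.dirname(os.path.dirname(dev_path))))
        groups.setdefault(group, []).append(os.path.basename(dev_path))
    return {group: sorted(devs) for group, devs in sorted(groups.items())}


def group_of(pci_addr, groups):
    """Return the IOMMU group number containing pci_addr, or None."""
    for group, devs in groups.items():
        if pci_addr in devs:
            return group
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    if not groups:
        print("No IOMMU groups found; the IOMMU may be disabled.")
    # Assumed addresses: function .0 of the slots passed through in this post.
    for addr in ("0000:03:00.0", "0000:c3:00.0"):
        group = group_of(addr, groups)
        print(addr, "group", group, groups.get(group, []))
```

Ideally each GPU's group contains only that card's own functions; anything else in the same group has to be passed through along with it.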

/etc/pve/qemu-server/100.conf (Ubuntu 24.04.2 VM config file)
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 60
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=0,size=4M
hostpci0: 0000:03:00,pcie=1,rombar=0
hostpci1: 0000:c3:00,pcie=1,rombar=0
ide2: local:iso/ubuntu-24.04.2-live-server-amd64.iso,media=cdrom
machine: q35
memory: 65536
net0: virtio=BC:24:11:XX:XX:XX,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,iothread=1,size=500G
scsihw: virtio-scsi-single
sockets: 1

ROCm Details:
Version 6.3.2
Non-DKMS
MOK Enrolled
+firmware-amd-graphics

Issues/Description:
I set up Proxmox on my server and started building an Ubuntu 24.04.2 VM, with the end goal of hosting ollama with GPU passthrough. During the initial setup the VM worked great: it was quite stable and had few issues. After I added the GPUs for passthrough, the VM kept working until I rebooted it. When I tried to start it again from the console with 'qm start 100', I was presented with this error:
Code:
error writing '1' to '/sys/bus/pci/devices/0000:03:00.0/reset': Inappropriate ioctl for device, kvm: ../hw/pci/pci.c:1654: pci_irq_handler: Assertion '0 <= irq_num && irq_num < PCI_NUM_PINS' failed, QEMU exited with code 1
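
The first part of that message comes from the write to the device's sysfs `reset` file failing with ENOTTY ("Inappropriate ioctl for device"), which as far as I understand is what the kernel returns when it has no usable reset method for the function. On recent kernels (I believe 5.15 and later) the methods the kernel considers available are listed in a sibling `reset_method` file. A minimal sketch for reading it, where the helper name `reset_methods` is made up and the addresses assume function `.0` of the two passed-through slots:

```python
import os


def reset_methods(pci_addr, sysfs_root="/sys"):
    """Return the reset methods the kernel lists for a PCI function.

    Reads <sysfs_root>/bus/pci/devices/<pci_addr>/reset_method, which holds a
    space-separated list such as 'flr bus'. Returns [] when the file is
    missing or empty.
    """
    path = os.path.join(sysfs_root, "bus/pci/devices", pci_addr, "reset_method")
    try:
        with open(path) as handle:
            return handle.read().split()
    except OSError:
        return []


if __name__ == "__main__":
    # Assumed addresses: function .0 of the slots in the VM config above.
    for addr in ("0000:03:00.0", "0000:c3:00.0"):
        print(addr, reset_methods(addr) or "no reset method listed")
```

An empty result for a GPU would be consistent with the reset error above, though it does not by itself explain the QEMU assertion that follows it.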

Looking more into this error, I found the following thread:
Link (For a Win11 VM, but same error)

As an initial fix, that thread suggested the 'vendor-reset' tool. However, after several tests it turned out to be incompatible with my GPUs, and users later in the thread confirm the same.

User s2d4 did mention that they got their setup working with the '6.14.0-2' kernel, so I tried that. At first it seemed to work great: I was able to boot the VM, install ROCm, and start installing Docker. However, after I switched from the in-browser console to an SSH session and ran 'rocminfo', the VM reported an 'internal-error' in Proxmox and froze. After that, the VM was incredibly unstable on every attempt to start it again, crashing at random moments with either 'QEMU exited with code 1' (no PCI_NUM_PINS line in this case) or the VM 'internal-error' state. I then found references to people downgrading to the 6.8.4-2-pve kernel, but that hadn't helped either.

Current state:
The Ubuntu VM and possibly Proxmox itself are incredibly unstable. I tried reverting to several snapshots I had made during setup, but the VM either fails to load or gives the 'internal-error' almost immediately. I've reinstalled Proxmox three times now and the Ubuntu VM probably more than eight, but I still can't get AMD GPU passthrough to run stably. If this is a futile effort because AMD GPU passthrough is considered unstable, please tell me; if anyone can point me in the right direction for fixing the problem, I'd be absolutely grateful. Maybe this setup isn't the best idea for LLM hosting either.

I had also read that LXCs are even more unstable for GPU passthrough, but if anyone has critiques of the current setup or better methods for hosting, please let me know. This is my first time working with hypervisors and VMs as opposed to bare-metal setups, so any guidance is appreciated.

Edit:
After reinstalling, I'm still running into the same issues, although the VM and Proxmox are fairly stable as long as the GPUs aren't passed through. I tried the tips mentioned here:
Link
But following their guidelines didn't work with the Ubuntu VM. I'm looking into other methods now and will try other OSes to see whether the issue is specific to Ubuntu.
 