GPU passthrough not working

csutter

Jan 2, 2025
I recently got a second GPU (AMD MI50) and added it to my system. The device shows up in its own IOMMU group, but when I add it to a VM, the whole host locks up and I have to hard power off to recover. My other GPU (NVIDIA GTX 1650) seems to pass through just fine.

Here is the output of lspci -nnk:
Code:
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] [1002:66af] (rev c1)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] [1002:081e]
        Kernel modules: amdgpu
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650] [10de:2188] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU116 [GeForce GTX 1650] [1462:8d97]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
09:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU116 High Definition Audio Controller [1462:8d97]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
09:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU116 USB 3.1 Host Controller [1462:8d97]
        Kernel driver in use: vfio-pci
        Kernel modules: xhci_pci
09:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU116 USB Type-C UCSI Controller [1462:8d97]
        Kernel driver in use: vfio-pci
        Kernel modules: i2c_nvidia_gpu

Here is my VM config file:
Code:
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: x86-64-v2-AES
efidisk0: local:103/vm-103-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:08:00,x-vga=1
memory: 8192
meta: creation-qemu=9.0.2,ctime=1739588171
name: AI
net0: virtio=BC:24:11:7D:F7:CB,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local:103/vm-103-disk-1.qcow2,iothread=1,size=200G
scsihw: virtio-scsi-single
smbios1: uuid=fbe7c13f-a6c9-420f-b322-7111e71f474d
sockets: 1
vmgenid: e3bcbfd8-6c5b-44f3-991f-86b4fa12da70
 
Hi,

Please post the output of pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist "" and pveversion -v.
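The pvesh output includes each device's IOMMU group. The same grouping can also be read directly from sysfs. Below is a minimal POSIX sh sketch; the function name list_iommu_groups and its optional root-directory argument are hypothetical, and the argument exists only so the function can be tried against a fake directory tree instead of /sys.

```shell
#!/bin/sh
# Sketch: print "<iommu group> <pci slot>" for every PCI function, read from sysfs.
# Optional $1 (hypothetical, for testing) overrides the default sysfs root.
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue      # glob matched nothing (e.g. IOMMU disabled)
    group=${dev#"$root"/}          # -> "<group>/devices/<slot>"
    group=${group%%/*}             # -> "<group>"
    printf '%s %s\n' "$group" "${dev##*/}"
  done
}
```

Usage: list_iommu_groups | sort -n, then lspci -nns <slot> to name each device. As a general rule, a function that shares its group with other devices cannot be passed through on its own.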

csutter said:
Code:
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] [1002:66af] (rev c1)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] [1002:081e]
        Kernel modules: amdgpu
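Also, the last line here suggests you did not blacklist the amdgpu kernel module correctly: unlike the NVIDIA functions, 08:00.0 has no "Kernel driver in use: vfio-pci" line, so the card is not bound to vfio-pci. See Blacklisting drivers in our wiki.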
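To compare the driver binding of every function at a glance, the lspci -nnk output can be summarized per device. A minimal sketch, assuming the default lspci slot format (with or without the -D domain prefix); driver_summary is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `lspci -nnk` output on stdin and print "<slot> <driver>"
# per device, or "<slot> (none)" when no driver is bound.
driver_summary() {
  awk '
    # Device header lines start in column 1 with a slot such as 08:00.0
    /^[0-9a-f:]+\.[0-9a-f]/ {
      if (slot != "") print slot, (drv == "" ? "(none)" : drv)
      slot = $1; drv = ""
      next
    }
    # Indented line naming the bound driver; keep its last word
    /Kernel driver in use:/ { drv = $NF }
    END { if (slot != "") print slot, (drv == "" ? "(none)" : drv) }
  '
}
```

Usage: lspci -nnk | driver_summary. For the output posted above, this reports 08:00.0 (none) and vfio-pci for each 09:00.x function.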
 
Thank you for the reply.
I've attached the output of 'pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""',
and here is the output of 'pveversion -v':
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.7
smartmontools: 7.4-2
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1

And here is the output of 'cat /etc/modprobe.d/blacklist.conf':
Code:
cat /etc/modprobe.d/blacklist.conf
blacklist amdgpu
blacklist radeon
blacklist nouveau
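The blacklist only keeps amdgpu from loading; it does not hand the card to vfio-pci. A minimal sketch of the extra modprobe.d configuration, along the lines of the Proxmox PCI(e) passthrough documentation, assuming 1002:66af (taken from the lspci -nnk output above) is the only ID to claim. The file name vfio.conf is an arbitrary choice; any *.conf file in /etc/modprobe.d is read.

```
# /etc/modprobe.d/vfio.conf (hypothetical file name)
# Let vfio-pci claim the Vega 20 GPU by vendor:device ID.
# If the card exposes further functions, append their IDs comma-separated.
options vfio-pci ids=1002:66af
# Load vfio-pci before amdgpu so it can bind the card first.
softdep amdgpu pre: vfio-pci
```

Afterwards, run update-initramfs -u -k all and reboot; lspci -nnk should then list "Kernel driver in use: vfio-pci" under 08:00.0.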
blacklist nvidia
 
