GPU Passthrough : code 12

Henon

Active Member
Apr 4, 2019
Hello everyone,

I would like to set up GPU passthrough on a Proxmox server with Windows VMs, following the Proxmox wiki (https://pve.proxmox.com/wiki/Pci_passthrough#GPU_Passthrough). Two graphics cards (NVIDIA RTX A40) are available and will be used in two different VMs.

Everything seems to work properly and I can see the graphics card inside the VM, but with the error "This device cannot find enough free resources that it can use. (Code 12)".

I configured the Proxmox server as follows:

/etc/default/grub : (I tried a lot of different configurations)
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt video=efifb:off video=vesa:off"
I have updated grub and rebooted the server.
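
For anyone repeating this step: a quick way to confirm that the parameters actually reached the running kernel is to look at /proc/cmdline after the reboot. The helper below is only a sketch (the name has_kernel_param is made up for this example). Note also that, as far as I know, Proxmox hosts that boot through systemd-boot read their parameters from /etc/kernel/cmdline rather than /etc/default/grub, so it is worth checking which one applies.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline; passing it explicitly
# is only meant for testing. (Hypothetical helper, not a Proxmox tool.)
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Usage:
#   has_kernel_param amd_iommu=on && echo "amd_iommu=on is active"
#   has_kernel_param iommu=pt     && echo "iommu=pt is active"
```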

Code:
dmesg | grep -e DMAR -e IOMMU
[    6.064564] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064617] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064647] pci 0000:20:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064679] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064719] pci 0000:e0:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064760] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064804] pci 0000:a0:00.2: AMD-Vi: IOMMU performance counters supported
[    6.064851] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[    6.082871] pci 0000:60:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082883] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082890] pci 0000:20:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082897] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082903] pci 0000:e0:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082909] pci 0000:c0:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082916] pci 0000:a0:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.082922] pci 0000:80:00.2: AMD-Vi: Found IOMMU cap 0x40
[    6.120179] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    6.120269] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    6.120361] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    6.120455] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
[    6.120550] perf/amd_iommu: Detected AMD IOMMU #4 (2 banks, 4 counters/bank).
[    6.120649] perf/amd_iommu: Detected AMD IOMMU #5 (2 banks, 4 counters/bank).
[    6.120750] perf/amd_iommu: Detected AMD IOMMU #6 (2 banks, 4 counters/bank).
[    6.120834] perf/amd_iommu: Detected AMD IOMMU #7 (2 banks, 4 counters/bank).

IOMMU is enabled (not set to Auto) in the BIOS. Strangely, with amd_iommu=on I don't get "DMAR: IOMMU enabled" in dmesg, as you can see above, but with intel_iommu=on the message does appear. The CPU is an AMD EPYC, so I don't understand why the amd_iommu parameter doesn't report the IOMMU as enabled while intel_iommu does.

The VFIO modules are loaded:
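
Some context on the DMAR question: DMAR is the name of the Intel VT-d ACPI table, so "DMAR:" lines come from the Intel IOMMU driver, while the AMD driver logs with the "AMD-Vi:" prefix, as in the output above. As far as I can tell, "DMAR: IOMMU enabled" is printed as soon as the kernel parses intel_iommu=on, whatever the hardware, so its absence on an EPYC is expected. A small sketch (the function name is made up) that reports which driver shows up in the log:

```shell
#!/bin/sh
# iommu_vendor_from_log: read kernel log lines on stdin and report which
# IOMMU driver produced messages: "amd", "intel", "both" or "none".
# (Hypothetical helper; it only looks at the message prefixes.)
iommu_vendor_from_log() {
    awk '
        /AMD-Vi:/ { amd = 1 }
        /DMAR:/   { intel = 1 }
        END {
            if (amd && intel) print "both"
            else if (amd)     print "amd"
            else if (intel)   print "intel"
            else              print "none"
        }'
}

# Usage (as root on the host):
#   dmesg | iommu_vendor_from_log
```

On an AMD host the lines that matter are the AMD-Vi ones, in particular "AMD-Vi: Interrupt remapping enabled".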
Code:
lsmod | grep ^vfio
vfio_pci               57344  1
vfio_virqfd            16384  1 vfio_pci
vfio_iommu_type1       40960  1
vfio                   36864  5 vfio_iommu_type1,vfio_pci

Interrupt remapping is enabled :
Code:
dmesg | grep 'remapping'
[    6.082928] AMD-Vi: Interrupt remapping enabled

The two graphics cards are isolated in their own IOMMU groups:
Code:
IOMMU Group 24:
    41:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [RTX A40] [10de:2235] (rev a1)
IOMMU Group 108:
    a1:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [RTX A40] [10de:2235] (rev a1)
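
The post does not show how this listing was produced. One common approach is to walk /sys/kernel/iommu_groups; the sketch below does that and prints one "group, PCI address" pair per device. The function name and the optional ROOT argument are assumptions made for this example (ROOT exists only so the function can be tried against a test directory).

```shell
#!/bin/sh
# list_iommu_groups [ROOT]
# Print one "group<TAB>pci-address" line per device found under ROOT.
# ROOT defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exposed
        group=${dev#"$root"/}          # strip the root prefix
        group=${group%%/*}             # keep only the group number
        printf '%s\t%s\n' "$group" "${dev##*/}"
    done
}

# Usage on the host, adding the lspci description for each device:
#   list_iommu_groups | while read -r group addr; do
#       printf 'IOMMU Group %s: ' "$group"; lspci -nns "$addr"
#   done
```

A GPU that shares its group with unrelated devices cannot be passed through on its own, which is why seeing each A40 alone in its group is a good sign.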


/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=10de:2235 disable_vga=1

Drivers are blacklisted :
/etc/modprobe.d/blacklist.conf
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia-gpu
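
To check that these files had the intended effect, it helps to look at which kernel driver is actually bound to each card after the reboot; for passthrough it should be vfio-pci. The sketch below reads the driver symlink in sysfs. The function name and the optional ROOT argument are made up for this example. If I remember the wiki correctly, changes under /etc/modprobe.d also need an initramfs refresh (update-initramfs -u -k all) before they take effect.

```shell
#!/bin/sh
# pci_driver ADDR [ROOT]
# Print the name of the kernel driver bound to the PCI device ADDR
# (full address such as 0000:41:00.0), or "none" if no driver is bound.
# ROOT defaults to /sys/bus/pci/devices and is overridable for testing.
pci_driver() {
    addr=$1
    root=${2:-/sys/bus/pci/devices}
    link=$root/$addr/driver
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Usage on the host, with the two addresses from the IOMMU listing:
#   pci_driver 0000:41:00.0
#   pci_driver 0000:a1:00.0
```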


The Windows VM is configured as follows:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 8
efidisk0: tank:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:41:00,pcie=1,x-vga=1
machine: pc-q35-6.1
memory: 16384
meta: creation-qemu=6.1.0,ctime=1653398297
name: gpu
net0: virtio=C6:69:19:96:9B:BF,bridge=vmbr0
numa: 0
ostype: win10
scsi0: local-lvm:vm-101-disk-0,cache=writeback,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=e47c6ba2-9135-409b-8ead-2e659b31ee47
sockets: 1
vga: none
vmgenid: c0314fed-30cc-4d87-a3ad-973feac05d4d

When I try to check whether the graphics card is OVMF compatible, echo 1 > rom returns "Permission denied" (as in this post: https://forum.proxmox.com/threads/unable-to-dump-my-video-card-vbios.100041/).

A monitor is connected to each graphics card and I can connect to the VM via RDP.


I also tried with a Linux VM and obtained this error message with the nvidia-smi command : "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

I don't understand why I can't use the GPU even though it is visible inside the VM. If someone can help me, I would be grateful.
 
Did you enable Above 4G Decoding in the BIOS? AFAICS the card has 48GB of memory, which may require some tweaking in that area.
 
Unfortunately, there is no option to enable Above 4G Decoding, even after updating the BIOS. I changed "PCI 64-bit Resource Allocation" from Auto to Enabled, but that doesn't seem to be enough.
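
To see why this setting matters, it can help to look at the size of the card's memory regions (BARs) on the host: a BAR of several gigabytes has to be mapped above the 4 GiB boundary, which is what Above 4G Decoding or 64-bit resource allocation controls. The exact BAR layout of the A40 is not given in this thread, so the sketch below only filters whatever lspci reports; the function name is made up.

```shell
#!/bin/sh
# large_bars: read `lspci -vv` output on stdin and print only the memory
# regions (BARs) whose size is reported in gigabytes, e.g. "[size=64G]".
# Returns non-zero if no such region is found (grep's exit status).
large_bars() {
    grep -E 'Region [0-9]+: Memory at .*\[size=[0-9]+G\]'
}

# Usage on the host (as root, so that lspci can show all details):
#   lspci -vv -s 41:00.0 | large_bars
```

If a large region shows up as unassigned or disabled on the host itself, the problem is in firmware or kernel resource allocation rather than in the VM configuration.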
 
@Henon: are you sure IOMMU is enabled in the BIOS? On my Gigabyte G292-Z40 with a Ryzen CPU I had the same problem (intel_iommu=on) until I enabled IOMMU in the BIOS/UEFI.
 
Yes, IOMMU is enabled in the BIOS.
The motherboard is a Lenovo ThinkSystem SR665, if that helps.

 
This happened on my Dell R7525 with dual A30s too. It also doesn't have an "Above 4G Decoding" option in the BIOS.

This is what works for me:

In the GRUB config, add the "pci=realloc" option to GRUB_CMDLINE_LINUX,
and add this args parameter to the VM:

qm set VMID -args '-global q35-pci host.pci-hole64-size=2048G'
 
Thank you so much for the help, those parameters work for me too!

The exact syntax I have to use is : qm set VMID -args '-global q35-pcihost.pci-hole64-size=2048G' (no space between pci and host).
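
Since a single stray space was enough to break the command, here is a small sketch that builds the argument string from a size in GiB, so it cannot be mistyped. The function name hole64_args is made up, and VMID 101 in the usage lines is only a guess based on the disk names in the VM config above.

```shell
#!/bin/sh
# hole64_args SIZE_GIB
# Print the QEMU argument that sets the 64-bit PCI hole of a q35 machine
# to SIZE_GIB gibibytes. Anything that is not a positive integer without
# leading zeros is rejected.
hole64_args() {
    case $1 in
        ''|*[!0-9]*|0*)
            echo "usage: hole64_args SIZE_GIB" >&2
            return 1
            ;;
    esac
    printf -- '-global q35-pcihost.pci-hole64-size=%sG\n' "$1"
}

# Usage on the Proxmox host (replace 101 with your VMID):
#   qm set 101 -args "$(hole64_args 2048)"
#   qm config 101 | grep '^args:'     # confirm what was stored
```

The 2048G value is simply the one reported to work in this thread; whether a smaller hole would be enough for a single A40 is not something the thread establishes.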
 
Hi all,

I can't get around this Code 12. I run a similar setup: EPYC and 2x A40s (Gigabyte motherboard).
On Linux I don't see any problems, but on Windows I get Code 12. IOMMU is on, etc. (I have followed the Reddit passthrough guide).

My grub line is: "quiet amd_iommu=on pci=realloc pcie_acs_override=downstream,multifunction video=efifb:off video=vesa:off vfio-pci.ids=10de:2235 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=radeon,nouveau,nvidia,nvidiafb,nvidia-gpu"

And I added the "qm set VMID -args '-global q35-pcihost.pci-hole64-size=2048G'" on the Windows 11 VM...

Any other ideas?

Linux (running a PyTorch demo): [screenshot]

Windows 11: [two screenshots]
 
Just for future reference: even though my IOMMU, 4G decoding, SR-IOV, etc. settings were ON (Gigabyte EPYC Milan chassis), I had to turn ON some other settings (honestly not sure which ones) to get it working...

(I know, not the best troubleshooting, but at some point I lost track.)
 
I have the exact same problem with an EPYC Milan CPU. Did you really just turn on random settings? It would be a great help if you could recall which settings you turned on :) Thanks
 
Hmm... I honestly don't remember. I was pissed and I edited settings without thinking much... I don't think Gigabyte servers have a way to export the BIOS config as text, or do they? I have two Gigabyte servers with passthrough working. The one above was the problem case with the A40s. My other system has 8x A30 and it works great without problems.

What are the specs of your system?
 
Hey @daemonix, thanks for the reply. I literally ended up giving up on GPU passthrough. The systems are ASUS servers with EPYC 7763 CPUs and NVIDIA A100s. I was really having a hard time because it was hard to find cases like mine, where I want to pass through enterprise hardware to Windows.
 
I'm puzzled by your problem. I have moved to some ASUS servers too, also with dual 7763s. I haven't added any GPUs yet, but I might in a couple of weeks. All my boxes at the moment work with BIOS defaults + IOMMU (4G decoding is on by default, I think). I DO NOT have vGPU or other slicing/licensing, so I always pass the full hardware to VMs.

PS: Make sure you have the right ASUS SKU that can handle 280W cooling! I was given multiple wrong variants, from both ASUS and Gigabyte. Most multi-GPU users don't need 7763s either.
 
