[SOLVED] Issue with OVMF GPU Passthrough (specs and details in post)

TheFunk

Member
Oct 25, 2016
Hi all!

I'm just now making my first real foray into Proxmox and I'm loving it so far, but I could use some help. Some of the technologies at play here are still a little foreign to me, so I apologize if I'm a little wordy here.

First some details:

Version: Proxmox VE 4.4
Hardware: Supermicro X10DAi with E5-2683's and a Sapphire AMD Radeon RX 480 Nitro+

I've spun up my first node and I have it running a few VMs. I'm solely on the pve-no-subscription repository, though I might be convinced to move to a subscription repository if that would make my life easier. I'd read that some of the issues I'm having could be resolved by adding the pve-test repository (although that information was rather old; I imagine that software has been merged into the stable repository by now).



----- The issue -----
I'm in the process of passing through my GPU to a couple different VMs for testing purposes (I only power on one of these test VMs at a time) and I'm finding a few issues that I figured someone here would know more about than I would.

I have made sure IOMMU and VT-d are both enabled and working. I checked that my GPU has a UEFI-compatible ROM and my VMs are using the OVMF UEFI BIOS. I made sure all the proper modules are in /etc/modules as well.
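
For reference, this is roughly what that setup looks like on my host (the intel_iommu option because this is an Intel board; run update-grub and update-initramfs -u afterwards and reboot):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

# /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd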


----- Windows 7 x64 Pro Guest w/ Passthrough -----
I am able to start a VM with the AMD card as the video adapter using the following options:

machine: q35
hostpci0: 02:00,pcie=1

Using these options a Windows 7 guest will boot and recognize the card in device manager. I am able to use the noVNC connection from the "onboard" default adapter and install my Crimson driver for the card...sorta...we'll get to that.
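
For completeness, the relevant part of the VM conf looks roughly like this (the efidisk storage/volume is just an example from my setup):

bios: ovmf
machine: q35
efidisk0: local-lvm:vm-100-disk-1,size=128K
hostpci0: 02:00,pcie=1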

If I add x-vga=on to that config as recommended in the docs, the machine appears to start, but there is no noVNC connection. I was unable to ping the guest machine to verify that it was running, even though all indicators in the web panel said the machine was powered on. Does this option just disable the "onboard" graphics? If so, is there any way I can remotely connect to the machine?

For the record, I am able to disable the "onboard" graphics on the guest machine from device manager and the machine will still boot appropriately if the Crimson driver has been installed.

Lastly, in my Windows 7 guest the GPU is showing an odd error in Device Manager, with a yellow exclamation mark, along the lines of "not enough system resources, disable some other devices." Trying to do anything graphics intensive results in poor performance, or the applications simply crash.

Am I missing something obvious?

------ Windows 10 Guest w/ Passthrough -------

Same as before: I can use the options above, but not x-vga=on.

In a Windows 10 guest using those options, I get the dreaded atikmdag.sys error and a bluescreen while the AMD Crimson display driver is being installed. I'll include a crash dump when I can. My guess is that this issue isn't so much related to the passthrough as it is to the AMD drivers and Windows 10; I've had this exact same model of card fail on me when installing the AMD drivers on a regular desktop before.

----- One interesting bit -----

If you've made it this far, first of all, thank you. Second, here is the one area where I'm not following best practice: this card is the sole GPU in the host system. I don't boot into an X environment or anything like that on the host server, though, so I don't believe the system should have a hold on the card. All the possible host drivers are blacklisted.
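
(In case it matters, the blacklist is just a modprobe.d file along these lines, followed by update-initramfs -u and a reboot; the filename is simply what I happened to use:)

# /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist amdgpu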

Screenshots, logs, config files and further info to come as I test out ideas. Thank you all for any help you can provide. I look forward to spending a decent amount of time here.
 
Are you sure the card supports UEFI properly? I have a slightly older card here (R7 250) which does not have a UEFI-compatible BIOS.
I could pass it through without EFI, but for EFI I had to find a firmware ROM file with UEFI support and use it
(copy it to /usr/share/kvm/ and add "romfile=<romfile.rom>" to the hostpci line, where <romfile.rom> is the name of the file).
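
For example, the hostpci line would end up looking something like this (the slot and filename are just placeholders):

hostpci0: 01:00,pcie=1,romfile=<romfile.rom>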

Alternatively, you can try without EFI and see if it works that way.

Also, do you have the radeon/amdgpu drivers blacklisted on the host, and have you verified that the card is in its own IOMMU group?

The x-vga part disables the virtual GPU, but also tells QEMU to use this device as the guest's graphics device.

edit: typo
 
Hi dcsapak! Thanks for the reply.

The ROM I'm using is here. It says it has UEFI support. This ROM is flashed to the card, and I've also tried using it as a ROM file.

I'll test the GPU with x-vga=on using SeaBIOS and report back later on. I believe when I originally tested it using SeaBIOS I had the same problem.

I have blacklisted the radeon driver on the host and the card is in its own IOMMU group.
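
For anyone curious, I verified the grouping with something along these lines (02:00.0 is my card's address):

lspci -nn | grep -i vga
find /sys/kernel/iommu_groups/ -type l | grep 02:00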

I'm aiming to wrestle with it again later tonight.
 
Also, if you do not have anything important on the machine, you could test with the PVE 5.0 beta. It has a newer kernel and QEMU 2.9, but be warned: it is a beta and not free of bugs ;)
 
I actually thought to try that last night. Everything appears to be working well enough, but no luck on it fixing the passthrough; it's working the same as it was, unfortunately. I have an extra GPU lying around, so I'm going to try dropping a different one into the first PCIe slot of my host and then passing through the RX 480 to see if that has any effect. Ideally I'd like to be able to pass through the first slot in addition to the others, but if I can't, I'll just chalk that up as a loss.
 
Further troubleshooting before I try the second card: I used the qm monitor to look up information about the guest PCI devices. Lo and behold, the card is being passed through and recognized as a VGA controller when I enable the x-vga=on option.
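
(For anyone who wants to reproduce this, I ran roughly the following, with 100 standing in for my VM ID:)

qm monitor 100
qm> info pci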

Bus 1, device 0, function 0:
Audio controller: PCI device 1002:aaf0
IRQ 10.
BAR0: 64 bit memory at 0x90640000 [0x90643fff].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id "hostpci0.0"
Bus 1, device 0, function 1:
VGA controller: PCI device 1002:67df
IRQ 10.
BAR0: 64 bit prefetchable memory at 0x800000000 [0x80fffffff].
BAR2: 64 bit prefetchable memory at 0x810000000 [0x8101fffff].
BAR4: I/O at 0xa000 [0xa0ff].
BAR5: 32 bit memory at 0x90600000 [0x9063ffff].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
id "hostpci0.1"​
 
Curiouser and curiouser. Here's what I've got now:

I disabled Windows Firewall on the guest and checked to make sure remote desktop would allow connections and now I can remotely access the machines after power on when the x-vga=on option is set. So I know for a fact the machines are powering on.

Both guests show an Acer DynaVivid graphics dock and a Standard VGA Graphics Adapter under removable devices... where in the world did it get that information?

So then I added the second video card to the host machine and I noticed something else a little odd.

SR-IOV was disabled for my PCI devices in the BIOS, and all of the devices were listed as Legacy as opposed to UEFI. I changed both of these settings when adding the card...

Still no dice.

I'm going to keep going. I will play Crysis on this VM ;)
 
Sorry to spam the thread, but I figure if I do resolve this issue, I'll have some decent info to add to the wiki. Here's something interesting: someone over on the Red Hat mailing list has had this exact same issue. The recommendation there was to take multifunction graphics cards and pass their individual functions through to the guest separately. In my case, doing so restored the GPU to its previous state of saying (in the guest OS's Device Manager) that it needs more resources in order to function.
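
In conf terms, that means splitting the single hostpci entry into the two functions, roughly like this (02:00 is my card's slot):

hostpci0: 02:00.0,pcie=1
hostpci1: 02:00.1,pcie=1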
 
SR-IOV was disabled for my PCI devices in the BIOS, and all of the devices were listed as Legacy as opposed to UEFI. I changed both of these settings when adding the card...

Are you sure you boot via UEFI?
 
I am not certain, but I will check. I'm pretty sure I do. When I mentioned all of the devices saying Legacy instead of UEFI, I was talking about my PCIe devices... still, that is a little odd. I should check that.

I've been actively working on finding a solution. I am down to a few potential issues.

When I booted my server with the two GPUs installed, in addition to a third card (my IB NIC), I received a "Not enough PCI resources" message from my host machine's BIOS. I googled the error, and the Supermicro site said to enable Above 4G Decoding in the PCI section of my BIOS. So I did, and I haven't gotten that error on the host machine since.

I have noticed differing issues based on OS and passthrough type. Each one makes me believe something may be wrong with my graphics card.

I tried passing through the card using SeaBIOS; same "not enough resources" message in the Windows 7 guest.

I spun up a Windows 8.1 guest, went through the whole driver install, and got a black screen at the end of it.
 
Another update...

I feel very stupid. My Proxmox installation was using legacy BIOS boot. I'm not sure how important this is, but now my host is running the 5.0 beta and booting via UEFI instead of legacy BIOS.
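
(If anyone else needs to check, the host is booted via UEFI only if /sys/firmware/efi exists; something like this will tell you:)

[ -d /sys/firmware/efi ] && echo "UEFI boot" || echo "legacy BIOS boot"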

Another interesting thing that I was hoping someone here would be more of an expert on: there is a setting in the Supermicro BIOS called Device OPROM, and I'm wondering what it does. It can be set for each device connected to the system via a PCI slot, with the options Disabled, Legacy, and EFI. If I choose EFI for the slot that corresponds to my GPU, save the BIOS settings, and reset, the machine restarts and gives a beep error (5 beeps) indicating that no GPU is present. From there I can't tell whether the host is booting, because my KVM doesn't display any output. The card works properly when this option ROM setting is set to Legacy.

One last thing: I had to clear the BIOS by removing the CMOS battery when that error code came up. Now I also get errors if I enable Above 4G Decoding or SR-IOV.

I've started the RMA process for the card in case that's the issue, but I wonder more and more every day whether this Supermicro board is just a little wonky. Have you ever heard of an X10DAi-C? As far as the Internet can tell me, I'm the only one in the world with an X10DAi-C.
 
Hello all! Reporting in once again!

I can confirm that the card was part of the original problem. I RMA'ed it and got a replacement in the mail today. I installed the new card and checked to make sure my host machine (now running Proxmox 5.0 Beta 2) was booting via UEFI. All good in the neighborhood.

I booted up one of my EFI Win 10 guests and passed through the card, including the x-vga=on option in the conf file.

With x-vga=on set, I am not able to see any output from the VM on my rackmount KVM. However, if I remote desktop into the guest, it is up and running! For some reason the card once again passed through as an Acer DynaVivid graphics dock and a standard VGA adapter; both show up as removable devices. I have no idea why. I will try troubleshooting this again.

I was able to install the graphics driver on the Windows 10 VM without the machine crashing. This was a first.

I'm still unsure about the OPROM options in my host machine's BIOS.

I am not using a romfile at this time.

So tomorrow the tests will be passing through just the graphics function and not the audio function on the GPU and determining if the Dynavivid issue is still there, followed by switching the guest machine to use the GPU romfile.
 
Hey everyone!

Reporting back in with good news!

I was able to get my Sapphire RX 480 Nitro+ OC 8GB to pass on over to the other side! Sounds morbid, I know!

Here's everything I did, start to finish, for anyone who may be trying to pass through a 480 on Proxmox or similar OS and having problems!

First and foremost, make sure your host is booting via UEFI. If I'd taken the time to check this originally, I could have saved myself a few hours of troubleshooting. Thanks to dcsapak for making me check!

Next up, "dmesg | tail" on your host is probably the single most useful debugging command ever.

Third, the EFI OPROM setting for my card in the Supermicro BIOS was necessary! This would not work under any circumstance until I RMA'd the card. If you're using a Supermicro board and under the PCI slot settings you can't select EFI OPROM for your RX 480, and you know the card has an EFI VBIOS: RMA IT, IT'S BROKEN. On the first try with my replacement card I was able to switch to EFI and it worked.

Fourth, do NOT rely on remote connections like RDP. You've got to hook up a monitor and keyboard and mouse and pass all of that through. Remote Desktop was flaky. VNC doesn't work if you set x-vga=on, and you need to set this option in your guest config in order to pass through the card appropriately.

Next, I highly recommend using the romfile option. I passed through the card a million different ways and the only way to get the card to appear correctly was to point the guest to a saved ROM. For whatever reason, using the card's own ROM causes troubles.
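
If you'd rather dump the ROM from the card than download one, something along these lines should work (the slot address and filename are just examples from my setup, and this may only work while nothing on the host is holding the card):

cd /sys/bus/pci/devices/0000:02:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/rx480.rom
echo 0 > rom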

Use the recommended PCI passthrough method; passing the card as PCIe (pcie=1) sometimes caused issues for me.

Make sure efifb is not running against your card! In most cases, dmesg | tail will show a BAR 0 or BAR 3 "can't reserve" message if this is happening.

I found that since my motherboard does not have an onboard GPU, I had to install a different GPU to use as the system's primary before efifb would leave my card alone, even with GRUB kernel options set.
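
The GRUB options I mean are the usual ones people use to keep efifb off the card, roughly like this (the intel_iommu part is from my Intel board; run update-grub and reboot afterwards):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on video=efifb:off"

dmesg | tail is how I caught it grabbing the card: look for efifb or "BAR ... can't reserve" lines.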

Lastly, pass your card's functions through separately. For whatever reason my GPU hated being passed through all together automatically; the graphics controller and the audio controller had to be passed through as separate PCI devices. Every time I tried passing them through together, the system saw my GPU as an Acer DynaVivid graphics dock. I have no idea why, but as I mentioned above, the issue has been around for a while and was discussed over at the Red Hat forums.
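
Putting the conf side of all of the above together, my hostpci lines ended up looking roughly like this (slot address and ROM filename are from my setup, so adjust for yours; x-vga goes on the graphics function):

hostpci0: 02:00.0,x-vga=on,romfile=rx480.rom
hostpci1: 02:00.1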

That's all for now folks, now I just need to improve a few more things, but that's for a different post!
 
Great that you worked it out :)
 
Hi,

Do you mean we need to enable UEFI BIOS on the host hardware? We are also trying to pass through a GPU (an Nvidia GTX 1050 Ti), but when we try to install the drivers or run a Windows 10 update we get a BSOD error. Could you please help us enable the UEFI/OVMF BIOS for our Win10 VM?
Thanks.
 
