weird GPU passthrough issue

mcdull

Member
Aug 23, 2020
I have no problem getting my NVIDIA GPU passed through to a Windows VM.
My host boots to the Proxmox menu and command line, but as soon as I start the Windows VM, it takes over the GPU and the monitor then switches to Windows.
However, the NVIDIA card seems to behave weirdly performance-wise. For example, I feel some lag when moving windows around, which never happens on native Windows.

I have the vfio-pci binding in /etc/modprobe.d/, but I am not sure why Proxmox is still shown on my monitor.
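For reference, this is roughly how I check which driver is actually bound to the card (01:00.0 is just my GPU's slot, yours may differ):

lspci -nnk -s 01:00.0
# the "Kernel driver in use:" line should say vfio-pci if the binding took effect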

Is that the root cause of the performance issue? I have a similar performance issue with gaming, e.g. Microsoft Flight Simulator shows a huge difference between running natively and running in the VM.
 
If the NVIDIA driver loads in Windows and you get a display output, then the passthrough can be considered successful.

Stuttering and lag can have other causes as well though, e.g. CPU performance, core assignments, host system load, interrupt virtualization, storage devices, RAM usage and assignment, caching, etc...

We'd need more details on your setup to have any chance of getting somewhere (e.g. NVIDIA driver version, VM config, 'pveversion -v', hardware, system load, etc... everything you can find).
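A minimal set of host commands that covers most of that (replace <vmid> with your VM's ID):

pveversion -v
qm config <vmid>    # the VM's configuration
lspci -nnk          # shows the GPU, its PCI IDs and which driver is bound to it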
 
Thank you for your reply.
May I specifically ask whether blacklisting the driver has any effect on passthrough?

My observation is that if I blacklist nvidiafb, it seems to lower the GPU performance a bit. If I blacklist nouveau and nvidia, passthrough simply won't work.
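The blacklist entries I am experimenting with look roughly like this (the file name is just my choice, anything under /etc/modprobe.d/ works):

# /etc/modprobe.d/gpu-blacklist.conf
blacklist nvidiafb
blacklist nouveau
blacklist nvidia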

I would also like to know when exactly the display blacks out during the boot process with single-GPU passthrough.
I was expecting the monitor to go dark once the kernel initializes with the modprobe settings, but in my case it keeps its output until I start the VM that uses the GPU. Because of this, I suspect the host has actually occupied some of the GPU's resources, lowering the performance after the VM takes over.

Thanks.
 
Driver behaviour is always weird, I've seen multiple different behaviours by now, depending on both config and hardware. You are correct that by blacklisting both the open and closed source NVIDIA drivers you should not see any output after BIOS/GRUB - that is, unless you're booting in legacy BIOS mode and the kernel manages to use the VGA buffer directly, or the UEFI framebuffer in case you're using modern UEFI boot. Sometimes it helps to add video=efifb:off to the kernel command line and add your GPU's PCI IDs to the vfio-pci modprobe config, so that driver is already preloaded at the initramfs stage (see here).
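Roughly, that combination would look like this (the IDs below are placeholders, take the vendor:device pairs for your GPU and its audio function from lspci -nn):

# append to the kernel command line (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
# or /etc/kernel/cmdline if you boot via systemd-boot):
video=efifb:off

# /etc/modprobe.d/vfio.conf - bind the card to vfio-pci before any GPU driver can grab it:
options vfio-pci ids=10de:1b80,10de:10f0
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci

# make sure vfio, vfio_iommu_type1 and vfio_pci are listed in /etc/modules as well,
# then rebuild the initramfs:
update-initramfs -u -k all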

Because of this, I suspect the host has actually occupied some of the GPU's resources, lowering the performance after the VM takes over.
No, that's not how it works. Assigning a GPU to a VM causes a full device reset (FLR or driver based). This means that the card behaves like the hardware has just booted, i.e. all resources are immediately freed and given exclusively to the guest (in fact, that assignment is why you need an IOMMU in the first place). A guest and a host accessing the same device (without some other hardware feature like SR-IOV) would be a significant attack vector for VM escapes and the like.
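If you want to see that isolation on the host, you can list the IOMMU groups - the GPU and its audio function should end up in a group of their own (or at least one that is handed to the VM in full):

for d in /sys/kernel/iommu_groups/*/devices/*; do
    g="${d#/sys/kernel/iommu_groups/}"; g="${g%%/*}"
    echo "IOMMU group $g: $(lspci -nns "${d##*/}")"
done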

That's why I suspect your bottleneck is somewhere else - most likely CPU, storage or RAM.
 
Thanks again. Glad to know that the GPU is fully reset for passthrough. I will try a clean install of the guest Windows to see if it behaves differently. Anyway, the GTX 1080 can now handle most games even with the degraded performance - except MS Flight Simulator, of course.
 
