Proxmox crashes when a Windows 10 VM with GPU passthrough tries to start up.

Myles124

New Member
Sep 2, 2021
I followed this guide on Reddit: https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

Everything was successful, no errors. But when I went to start the VM, Proxmox stopped responding: the web interface went down, the VMs stopped, I couldn't SSH in; it was just a hard crash. The VM starts up normally with no GPU attached. Pictures of the VM configuration:

1630550663141.png
1630550711154.png
This is a fresh install of Proxmox, I've reinstalled 4 times now. This is the farthest I've ever gotten. Ask me for any information, I just want to solve this!
 
Did you use the ACS override? If yes, try to disable it and check your IOMMU groups. While the ACS override can *sometimes* help, it can also cause problems like this.
 
This command will give a nice overview of the IOMMU groups: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done. Please also show the output of cat /proc/cmdline. With NVidia cards, enabling Primary GPU often helps.
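If it is easier to read, the same one-liner can also be written across multiple lines (functionally identical):

Code:
# Same command as above, only split across lines for readability
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}
    n=${n%%/*}
    printf 'IOMMU group %s ' "$n"
    lspci -nnks "${d##*/}"
done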
 
Did you use the ACS override? If yes, try to disable it and check your IOMMU groups. While the ACS override can *sometimes* help, it can also cause problems like this.
I am using the ACS override. I deleted it, updated GRUB and rebooted. Sadly, the issue persists.

Config before: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

Config after: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt,multifunction nofb nomodeset video=vesafb:off,efifb:off"

Is that the correct way to remove it?
 
Last edited:
This command will give a nice overview of the IOMMU groups: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done. Please also show cat /proc/cmdline. Enable Primary GPU with NVidia cards, it often helps.
I could see my GPU in the output
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107GL [Quadro K600] [10de:0ffa] (rev a1)
        Subsystem: Hewlett-Packard Company GK107GL [Quadro K600] [103c:094b]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GK107 HDMI Audio Controller [10de:0e1b] (rev a1)

And this was the output of cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt,multifunction nofb nomodeset video=vesafb:off,efifb:off

I'll try running through the Windows installer without the PCI device passed through, then I'll enable Primary GPU once I have Plex on it.
 
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107GL [Quadro K600] [10de:0ffa] (rev a1)
        Subsystem: Hewlett-Packard Company GK107GL [Quadro K600] [103c:094b]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GK107 HDMI Audio Controller [10de:0e1b] (rev a1)

And this was the output of cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt,multifunction nofb nomodeset video=vesafb:off,efifb:off

I'll try running through the Windows installer without the PCI device passed through, then I'll enable Primary GPU once I have Plex on it.
Even with Primary GPU enabled, it crashes.
 
I am using the ACS override. I deleted it, updated GRUB and rebooted. Sadly, the issue persists.

Config before: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

Config after: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt,multifunction nofb nomodeset video=vesafb:off,efifb:off"

Is that the correct way to remove it?
You forgot to remove ,multifunction. Also, iommu=pt is not needed, and probably does not do what you think it does. video=vesafb:off,efifb:off is incorrect; it should be video=vesafb:off video=efifb:off. Are you sure you use GRUB and not systemd-boot?

Please show us all IOMMU groups because it is the other devices in the same group that are important (and other groups will give us general information about your system).
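If you are not sure which bootloader is in use, something like this should tell you on a reasonably recent Proxmox VE installation:

Code:
# Reports whether proxmox-boot-tool manages the ESPs (typical for systemd-boot/ZFS setups);
# if it says nothing is configured, you are almost certainly booting with plain GRUB.
proxmox-boot-tool status

# The kernel command line that was actually used for the current boot
cat /proc/cmdline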
 
You forgot to remove ,multifunction. Also, iommu=pt is not needed, and probably does not do what you think it does. video=vesafb:off,efifb:off is incorrect; it should be video=vesafb:off video=efifb:off. Are you sure you use GRUB and not systemd-boot?

Please show us all IOMMU groups because it is the other devices in the same group that are important (and other groups will give us general information about your system).
I'm so new and confused about this. I'm pretty sure I'm using systemd-boot, which means the configuration part is different? Also, how would I check my IOMMU groups?
 
Sorry, I did not realize. You are indeed using GRUB and I only wanted to make sure (which was unnecessary because you don't boot from ZFS). I wanted to point out that there were several mistakes in the GRUB_CMDLINE_LINUX_DEFAULT you showed. Some are harmless and have no effect, such as iommu=pt,multifunction. Some contain a very common mistake (often repeated on the internet) that prevents them from doing what they should, like video=vesafb:off,efifb:off.

You did indeed disable pcie_acs_override, but left some text in there that should also have been removed.
You appear to have run the (long) command I showed earlier, but did not show all output and important information is therefore missing.

Are you sure you don't want to disable the Proxmox console (which nofb nomodeset video=vesafb:off,efifb:off seem to imply)? Do you have a reliable SSH or Proxmox web GUI connection from another computer? Otherwise fixing little mistakes can become much harder.

Please change your GRUB configuration line to GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on nofb nomodeset video=vesafb:off video=efifb:off" (or just GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on" if you do want a Proxmox console on the physical machine) and run update-grub before rebooting.
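In practice (assuming GRUB, as above) that means something like:

Code:
# after editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
update-grub
reboot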

Please run this command to show us the IOMMU groups of your system: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done. If the output is too large, put it in a file and attach it or use a spoiler tag.

Please also show the VM configuration file from /etc/pve/qemu-server/ directory.
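For example, assuming the VM ID is 100 (adjust to yours), either of these will print it:

Code:
# both show the same VM configuration
cat /etc/pve/qemu-server/100.conf
qm config 100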
 
Sorry, I did not realize. You are indeed using GRUB and I only wanted to make sure (which was unnecessary because you don't boot from ZFS). I wanted to point out that there were several mistakes in the GRUB_CMDLINE_LINUX_DEFAULT you showed. Some are harmless and have no effect, such as iommu=pt,multifunction. Some contain a very common mistake (often repeated on the internet) that prevents them from doing what they should, like video=vesafb:off,efifb:off.

You did indeed disable pcie_acs_override, but left some text in there that should also have been removed.
You appear to have run the (long) command I showed earlier, but did not show all output and important information is therefore missing.

Are you sure you don't want to disable the Proxmox console (which nofb nomodeset video=vesafb:off,efifb:off seem to imply)? Do you have a reliable SSH or Proxmox web GUI connection from another computer? Otherwise fixing little mistakes can become much harder.

Please change your GRUB configuration line to GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on nofb nomodeset video=vesafb:off video=efifb:off" (or just GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on" if you do want a Proxmox console on the physical machine) and run update-grub before rebooting.

Please run this command to show us the IOMMU groups of your system: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done. If the output is too large, put it in a file and attach it or use a spoiler tag.

Please also show the VM configuration file from /etc/pve/qemu-server/ directory.
VM Config:
bios: ovmf
boot: order=scsi0;ide2;net0;ide0
cores: 4
efidisk0: Drive1:vm-100-disk-1,size=4M
hostpci0: 0000:01:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.196.iso,media=cdrom,size=486642K
ide2: local:iso/Windows.iso,media=cdrom
machine: pc-q35-6.0
memory: 8192
name: Media-Server
net0: virtio=52:CE:10:87:35:1E,bridge=vmbr0
numa: 0
ostype: win10
scsi0: Drive1:vm-100-disk-0,cache=writeback,size=1000G
scsihw: virtio-scsi-pci
smbios1: uuid=ef0d03ff-5ad9-4f24-8739-502f81b9ce40
sockets: 1
vmgenid: 45b00345-7931-419a-8a17-bfa63ec41ff1

The IOMMU groups are in the attached file. I really appreciate the help; sorry it took so long to get back to you.
 

Attachments

  • IOMMU.txt
    4.3 KB
No problem, I'm not in a hurry. IOMMU group 1 looks fine, but it is the only GPU in your system and that makes things more complicated. I don't know if the K600 resets properly (and I'm guessing it does not, because of the crash of your host). If you cannot add another graphics card to boot from, you'll need to make sure the K600 is not touched by drivers before starting the VM (like the video= parameters you showed before).
Can you try GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_pci.ids=10de:0ffa,10de:0e1b vfio_pci.disable_vga=1 nofb nomodeset video=vesafb:off video=efifb:off" (on one line) to see if it works better, or at least does not crash the host?
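Just as a sketch of an alternative: the same early binding to vfio-pci can also be configured via modprobe instead of kernel parameters, using the same IDs, for example:

Code:
# /etc/modprobe.d/vfio.conf -- intended to have the same effect as
# vfio_pci.ids=10de:0ffa,10de:0e1b vfio_pci.disable_vga=1 on the kernel command line
options vfio-pci ids=10de:0ffa,10de:0e1b disable_vga=1
# make sure the host GPU drivers only load after vfio-pci has claimed the card
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci

Run update-initramfs -u -k all and reboot afterwards so the change is picked up early enough.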
 
Okay, good and bad news. I got GPU passthrough working! But... I'm getting Code 43 for the GPU. Windows also knows it's a VM; is that why I'm getting this error? I've tried installing drivers from the Nvidia website and Nvidia Experience.

1630836294357.png
 
Last edited:
What version of Windows? I have successful passthrough with my W10 VM, but I got a Code 43 with W11.

We are using the exact same driver version, though my card is a GTX 1070.

Have you disabled the Windows Basic Adapter?
 
1630836276413.png
What version of Windows? I have successful passthrough with my W10 VM, but I got a Code 43 with W11.

We are using the exact same driver version, though my card is a GTX 1070.

Have you disabled the Windows Basic Adapter?
I'm using Windows 10 19043.928. I disabled the Windows Basic Adapter and restarted, and it seems to have turned itself back on? Not that I think the adapter has anything to do with error 43, but I could be wrong.
 
Okay, good and bad news. I got GPU passthrough working! But... I'm getting Code 43 for the GPU. Windows also knows it's a VM; is that why I'm getting this error? I've tried installing drivers from the Nvidia website and Nvidia Experience.
I know the internet is full of posts about how to hide the fact that you're running in a VM when you get Code 43, but I don't think they apply to your case.
Code 43 is a very generic Windows driver error. For consumer GTX cards, NVidia's drivers used to give it when running in a VM, but yours is a Quadro (which is allowed to be used with virtualization) and the Primary GPU option of Proxmox already takes the necessary steps to hide the VM from the drivers.
Do you have actual output on a physical display connected to the Quadro K600 (or are the screenshots from RDP)? Did you make any more changes than suggested earlier, or do those configurations still apply? Sorry, I know very little about Windows and NVidia drivers.
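For reference, the hiding tweaks those posts usually mean boil down to a VM option like the one below (hypothetical VM ID 100, i.e. in /etc/pve/qemu-server/100.conf); as said, it should not be necessary for a Quadro:

Code:
# hides the KVM/hypervisor signature from the guest; mainly relevant for consumer GeForce cards
cpu: host,hidden=1,flags=+pcid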
 
I know the internet is full of posts about how to hide the fact that you're running in a VM when you get Code 43, but I don't think they apply to your case.
Code 43 is a very generic Windows driver error. For consumer GTX cards, NVidia's drivers used to give it when running in a VM, but yours is a Quadro (which is allowed to be used with virtualization) and the Primary GPU option of Proxmox already takes the necessary steps to hide the VM from the drivers.
Do you have actual output on a physical display connected to the Quadro K600 (or are the screenshots from RDP)? Did you make any more changes than suggested earlier, or do those configurations still apply? Sorry, I know very little about Windows and NVidia drivers.
All screenshots are from Remote Desktop Connection. I have no physical display connected to the GPU.
 
Try changing the q35 machine version to 3.1 (instead of 6.0), which I think is from before the changes to the PCIe layout. Maybe it better matches the expectations of the NVidia Windows drivers. And maybe it requires you to reinstall Windows, but I'm not sure. It did help someone before.
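A sketch of how to make that change, assuming the VM ID is 100:

Code:
# pin the VM to an older q35 machine version
qm set 100 --machine pc-q35-3.1
# equivalent to changing the "machine:" line in /etc/pve/qemu-server/100.conf to pc-q35-3.1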

EDIT: I found some NVidia suggestions from Proxmox.
 
Last edited:
I have the same problem, but my GPUs are two Nvidia RTX 3090s and one AMD RX 580.

I have 3 GPUs and I want to pass them through to my VMs. One GPU is connected directly to a PCI Express x16 slot, but the other two are connected to PCI Express x1 slots via riser cables (like this).

When I pass the x16 RTX 3090 to a VM, it's fine. But when I pass the RTX 3090 or the RX 580 that is connected via PCI Express x1, I lose the connection to my Proxmox host until I reset it.
I use Proxmox 7.
 
Last edited:
I have the same problem, but my GPUs are two Nvidia RTX 3090s and one AMD RX 580.

I have 3 GPUs and I want to pass them through to my VMs. One GPU is connected directly to a PCI Express x16 slot, but the other two are connected to PCI Express x1 slots via riser cables (like this).

When I pass the x16 RTX 3090 to a VM, it's fine. But when I pass the RTX 3090 or the RX 580 that is connected via PCI Express x1, I lose the connection to my Proxmox host until I reset it.
I use Proxmox 7.
Maybe you can try the same troubleshooting steps? Make sure not to use pcie_acs_override and show us the output of this command: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done; cat /proc/cmdline.
Most likely the x1 PCIe slots are part of an IOMMU group that also contains the SATA and network controllers that you need for the host. Devices in the same IOMMU group cannot be shared between a VM and the host, or between VMs. But we will know more once we see your groups.
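If the long one-liner is cumbersome, a shorter check that just lists which device sits in which IOMMU group is, for example:

Code:
# one symlink per PCI device, sorted by IOMMU group number
find /sys/kernel/iommu_groups/ -type l | sort -V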
 
Maybe you can try the same troubleshooting steps? Make sure not to use pcie_acs_override and show us the output of this command: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done; cat /proc/cmdline.
Most likely the x1 PCIe slots are part of an IOMMU group that also contains the SATA and network controllers that you need for the host. Devices in the same IOMMU group cannot be shared between a VM and the host, or between VMs. But we will know more once we see your groups.
Hi Avw, thanks for answering.
I followed all the steps in this post and I still couldn't get it working.
I don't use pcie_acs_override in my configuration.
The IOMMU group data is attached, and the output of cat /proc/cmdline is below:

Code:
BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on
 

Attachments

  • iommu_groups.txt
    8.8 KB
