[SOLVED] Proxmox hangs when starting windows gpu passthrough.

Jeff

New Member
Mar 28, 2022
I'm totally new to setting up Proxmox, and I have a problem passing a GPU through to a Windows 10 VM: Proxmox hangs when the VM starts. The VM works when I run it without GPU passthrough. I would like Proxmox to use my iGPU and pass the GTX 760 through to the VM. Here are my specs:
i7-4790 CPU
ASUS Z97-K
GTX 760
500 GB HDD for Proxmox and the VM (for testing only, for now)
Proxmox 7.0

I enabled VT-x and VT-d in the BIOS.

/etc/default/grub

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstre>
GRUB_CMDLINE_LINUX=""

/etc/modprobe.d/vfio.conf

options vfio-pci ids=10de:1187,10de:0e0a disable_vga=1

VM:

Code:
agent: 0
balloon: 1024
bios: ovmf
boot: order=scsi0;net0;sata2
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-100-disk-2,size=4M
hostpci0: 0000:01:00,pcie=1,x-vga=1
machine: pc-q35-6.0
memory: 10240
name: Windows10
net0: virtio=E2:79:37:AE:5C:99,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata2: local:iso/virtio-win-0.1.215.iso,media=cdrom,size=528322K
scsi0: local-lvm:vm-100-disk-0,cache=writeback,size=120G
scsihw: virtio-scsi-pci
smbios1: uuid=7efde89e-3aad-4b6e-9b28-57e8dc7aeeda
sockets: 1
vga: none

IOMMU group

Code:
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
IOMMU group 10 00:1c.3 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev d0)
IOMMU group 11 00:1d.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 [8086:8ca6]
IOMMU group 12 00:1f.0 ISA bridge [0601]: Intel Corporation Z97 Chipset LPC Controller [8086:8cc4]
IOMMU group 12 00:1f.2 SATA controller [0106]: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] [8086:8c82]
IOMMU group 12 00:1f.3 SMBus [0c05]: Intel Corporation 9 Series Chipset Family SMBus Controller [8086:8ca2]
IOMMU group 13 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 760] [10de:1187] (rev a1)
IOMMU group 14 01:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
IOMMU group 15 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
IOMMU group 16 04:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
IOMMU group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)
IOMMU group 3 00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB xHCI Controller [8086:8cb1]
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation 9 Series Chipset Family ME Interface #1 [8086:8cba]
IOMMU group 6 00:1a.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 [8086:8cad]
IOMMU group 7 00:1b.0 Audio device [0403]: Intel Corporation 9 Series Chipset Family HD Audio Controller [8086:8ca0]
IOMMU group 8 00:1c.0 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 [8086:8c90] (rev d0)
IOMMU group 9 00:1c.2 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 [8086:8c94] (rev d0)
 
This can't be right: pcie_acs_override=downstre>. If you are using pcie_acs_override, then your IOMMU groups are lying, because that is what pcie_acs_override does: it breaks up the real groups. Please don't use pcie_acs_override and check the groups again, because Proxmox freezing can be explained if the Ethernet and SATA controllers are in the same IOMMU group as your GPU.
Also make sure to boot with the integrated Intel graphics. And note that ballooning won't work with passthrough because the PCI(e) devices can access all of the VM's memory.
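
A minimal sketch of how the groups could be re-checked on the host, assuming the standard Linux layout under /sys/kernel/iommu_groups and /proc/cmdline. The function names has_acs_override and list_iommu_groups are made up for this example, and the optional directory argument exists only so the listing can be tried against a test tree.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; paths assume standard Linux sysfs/procfs.

# Succeeds if the given kernel command line contains a pcie_acs_override= option.
has_acs_override() {
    case " $1 " in
        *" pcie_acs_override="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one "IOMMU group <n> <pci-address>" line per device.
# $1 (optional) overrides the sysfs root, which is only useful for testing.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU is off or unsupported
        group="${dev%/devices/*}"      # drop the trailing "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        printf 'IOMMU group %s %s\n' "$group" "${dev##*/}"
    done
}

# Warn first, because an active override makes the listing untrustworthy.
if has_acs_override "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "warning: pcie_acs_override is active, the groups below are not the real ones" >&2
fi
list_iommu_groups | sort -k3,3n
```

The device description can be added per address with `lspci -nns <address>`. Removing the option from /etc/default/grub only takes effect after the boot configuration is regenerated (typically `update-grub` on a GRUB-booted install) and the host is rebooted.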
 
Thanks for your reply. I had already set the Primary Display to CPU Graphics in the BIOS. I removed ballooning and pcie_acs_override=downstream, and here are the IOMMU group results:

IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
IOMMU group 10 00:1c.3 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev d0)
IOMMU group 10 04:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
IOMMU group 11 00:1d.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 [8086:8ca6]
IOMMU group 12 00:1f.0 ISA bridge [0601]: Intel Corporation Z97 Chipset LPC Controller [8086:8cc4]
IOMMU group 12 00:1f.2 SATA controller [0106]: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] [8086:8c82]
IOMMU group 12 00:1f.3 SMBus [0c05]: Intel Corporation 9 Series Chipset Family SMBus Controller [8086:8ca2]
IOMMU group 13 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 760] [10de:1187] (rev a1)
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
IOMMU group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)
IOMMU group 3 00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB xHCI Controller [8086:8cb1]
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation 9 Series Chipset Family ME Interface #1 [8086:8cba]
IOMMU group 6 00:1a.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 [8086:8cad]
IOMMU group 7 00:1b.0 Audio device [0403]: Intel Corporation 9 Series Chipset Family HD Audio Controller [8086:8ca0]
IOMMU group 8 00:1c.0 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 [8086:8c90] (rev d0)
IOMMU group 9 00:1c.2 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 [8086:8c94] (rev d0)

The GTX 760 and its audio device are now isolated in group 1 (along with the PCIe x16 root port 00:01.0). But it still freezes when I start the VM. The monitor connected to the host goes black and the LED light blinks.
 
Is the monitor connected to the GTX 760 or to the integrated CPU graphics? I don't see a reason why the Proxmox host freezes. I can understand the VM freezing, because maybe the GTX 760 does not work with passthrough (maybe it needs a patched BIOS or something?).
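
One way this could be checked from the host shell is sketched below, assuming the standard sysfs attributes boot_vga and driver under /sys/bus/pci/devices. The helper name gpu_status is made up for this example, and the two addresses are taken from the IOMMU listing earlier in the thread.

```shell
#!/bin/sh
# Sketch only: gpu_status is a hypothetical helper; paths assume standard Linux sysfs.

# Prints "<address> boot_vga=<0|1|?> driver=<name|none>" for one PCI device.
# $2 (optional) overrides the sysfs root, which is only useful for testing.
gpu_status() {
    dev="${2:-/sys/bus/pci/devices}/$1"
    boot="?"
    [ -r "$dev/boot_vga" ] && boot=$(cat "$dev/boot_vga")
    driver="none"
    [ -L "$dev/driver" ] && driver=$(basename "$(readlink "$dev/driver")")
    printf '%s boot_vga=%s driver=%s\n' "$1" "$boot" "$driver"
}

gpu_status 0000:00:02.0   # integrated Intel graphics in the listing above
gpu_status 0000:01:00.0   # GTX 760 in the listing above
```

If the GTX 760 reports boot_vga=1, the firmware picked it as the primary display and the host console is running on the card that is about to be handed to the VM. With the vfio.conf from the first post in effect, the expected driver for the GTX 760 would be vfio-pci.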
 
Apparently that was the cause of the freezing: I had connected my monitor to the GTX 760. I've read that the host will use the external GPU when the monitor is connected to it, even if you set the primary display to CPU. So I first plugged the monitor into the onboard GPU and booted Proxmox, then unplugged it, connected it to the GTX 760, and started the VM to see if it boots. I'm able to pass the GPU through, but I can't install GeForce Experience. Please see the attached image.
 

Attachment: unable to install GF experience.png

Good, passthrough is working: the GPU device shows up in the VM. Maybe you can connect the same monitor with two cables to both the GTX 760 and the integrated graphics, so the host will always boot from the integrated graphics? I cannot help you with NVIDIA or Windows software, sorry. Maybe someone else knows about that.
 
It's OK, you helped me a lot. I think it's because NVIDIA is dropping support for some old models. I downloaded it from their archives and it's now installed. Maybe my last question before I mark this as resolved: is it normal that the node reports almost all of my memory as used, while the VM itself is only using around 21%?
 

Attachment: memory used.png

When using passthrough, all the VM memory must be pinned into actual RAM because PCI(e) devices can access all the VM memory at any time using Direct Memory Access. Therefore, ballooning and KSM also won't do anything for such VMs.
Besides that, operating systems inside a VM usually don't count filesystem cache as used memory, because it can be freed quickly when needed. However, the Proxmox host does not know how the memory is used and cannot reuse it (without cooperation from within the VM, like ballooning), and therefore counts the memory as used (unavailable to the host or other VMs).
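
As a rough illustration of the pinning rule, the sketch below reads a VM config and reports the configured memory if a hostpciN entry is present. The helper name pinned_memory_mib is made up for this example (it is not a Proxmox tool), and VM ID 100 is only inferred from the disk names in the config posted above.

```shell
#!/bin/sh
# Sketch only: pinned_memory_mib is a hypothetical helper, not a Proxmox tool.

# Prints the configured memory (MiB) of a VM config if it has a hostpciN entry,
# i.e. the amount the host has to keep resident for that VM.
# Returns non-zero and prints nothing if the file has no passthrough device.
pinned_memory_mib() {
    grep -q '^hostpci[0-9]*:' "$1" 2>/dev/null || return 1
    awk -F': *' '$1 == "memory" { print $2 }' "$1"
}

# Assumption: VM ID 100, inferred from the disk names (vm-100-disk-0) above.
conf=/etc/pve/qemu-server/100.conf
if mib=$(pinned_memory_mib "$conf"); then
    echo "VM pins ${mib} MiB on the host, regardless of what the guest reports as used"
fi
```

For the config in this thread that would be 10240 MiB on the host side, while the roughly 21% the guest reports corresponds to only about 2150 MiB. Ballooning can be switched off explicitly for such a VM with `qm set <vmid> --balloon 0`.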