Single GPU issue

toyotahead

New Member
Nov 28, 2022
I have a system where there is no integrated graphics in the CPU, only the PCIe GPU add-in card. I am having problems successfully using the GPU with a VM. I am hoping I am missing something silly and someone here can point me in the right direction. (If there is any required but missing information below, please let me know.)
The system boots to the kernel selection menu immediately after POSTing. Once it starts loading the kernel the display goes black and I do not see the rest of the boot process nor the console. I only get to see the kernel selection menu, that is it. I understand this is expected behaviour, and I also understand I lose the ability to access the local console. That said, the monitor does not turn off (it stays awake), just all black. Now, with the Proxmox GUI accessible, I go and start the VM. When I do so there is a very quick flicker on the monitor, then the monitor turns off. From this point forward the monitor will not stay on, with the VM running or not.

- I am not sure if I have fully disabled Proxmox from using this GPU, which may be causing this issue.
- Other forums I have read through suggest I need to specify a modified GPU ROM file. (I have the unmodified GPU ROM file, thanks to GPU-Z, if required.)
- I have tried specifying the GPU with and without the primary GPU option, with the same results.



Proxmox version: 7.3-3
Kernel: 5.15.74-1-pve

System Specs:
- Intel Core i5-9400f
- Gigabyte B360m Aorus
- Zotac GTX-970 (ids=10de:13c2,10de:0fbb)
- Booting proxmox with UEFI


System Configs below....

cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off vfio_pci.ids=10de:13c2,10de:0fbb nofb nomodeset video=vesafb:off initcall_blacklist=sysfb_init

cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb

cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb

cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1

cat /etc/pve/nodes/omega/qemu-server/50000.conf
#WinDOZE 11 64Bit
#
#- QEMU agent installed
#- All VirtIO drivers installed
#- NVidia GTX970 Drivers installed w/NVidia_Experience
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=ide2;virtio0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: local-zfs:vm-50000-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
ide2: none,media=cdrom
machine: pc-q35-7.0
memory: 4096
meta: creation-qemu=7.0.0,ctime=1665791723
name: Windoze11
net0: virtio=DA:96:34:F6:A1:09,bridge=vmbr2,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=84566f99-dc35-4531-a053-da2f36ab045d
sockets: 1
tpmstate0: local-zfs:vm-50000-disk-1,size=1M,version=v2.0
usb0: host=413c:2501,usb3=1
vga: none
virtio0: local-zfs:vm-50000-disk-2,size=120G
vmgenid: a1dd0f77-5e94-47a4-a3da-fd091136282e


for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done
#
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec2] (rev 07)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd 8th Gen Core Processor Host Bridge/DRAM Registers [1458:5000]
Kernel driver in use: skl_uncore
Kernel modules: ie31200_edac
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
Kernel driver in use: pcieport
IOMMU group 2 00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH Thermal Controller [1458:8888]
Kernel driver in use: intel_pch_thermal
Kernel modules: intel_pch_thermal
IOMMU group 3 00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH USB 3.1 xHCI Host Controller [1458:5007]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
IOMMU group 3 00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
DeviceName: Onboard - Other
Subsystem: Intel Corporation Cannon Lake PCH Shared SRAM [8086:7270]
IOMMU group 4 00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH HECI Controller [1458:1c3a]
Kernel driver in use: mei_me
Kernel modules: mei_me
IOMMU group 5 00:17.0 SATA controller [0106]: Intel Corporation Cannon Lake PCH SATA AHCI Controller [8086:a352] (rev 10)
DeviceName: Onboard - SATA
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH SATA AHCI Controller [1458:b005]
Kernel driver in use: ahci
Kernel modules: ahci
IOMMU group 6 00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 [8086:a330] (rev f0)
Kernel driver in use: pcieport
IOMMU group 7 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:a308] (rev 10)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Device [1458:5001]
IOMMU group 7 00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
DeviceName: Onboard - Sound
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH cAVS [1458:a182]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_sof_pci_intel_cnl
IOMMU group 7 00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH SMBus Controller [1458:5001]
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
IOMMU group 7 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
DeviceName: Onboard - Other
Subsystem: Intel Corporation Cannon Lake PCH SPI Controller [8086:7270]
IOMMU group 7 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-V [8086:15bc] (rev 10)
DeviceName: Onboard - Ethernet
Subsystem: Gigabyte Technology Co., Ltd Ethernet Connection (7) I219-V [1458:e000]
Kernel driver in use: e1000e
Kernel modules: e1000e
IOMMU group 8 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970] [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
IOMMU group 9 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 High Definition Audio Controller [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

lspci -n -s 01:00
01:00.0 0300: 10de:13c2 (rev a1)
01:00.1 0403: 10de:0fbb (rev a1)
 
Hope you manage to get that set up. About 4 months ago I tried Proxmox on an AMD PC with passthrough to my GPU card so I could install a Windows 10 VM for Blue Iris.
I wasted too much time on it and ended up messing a load of things up, so I wiped it and went back to Windows.
All your configs and arguments looked similar to mine, too.
 
cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off vfio_pci.ids=10de:13c2,10de:0fbb nofb nomodeset video=vesafb:off initcall_blacklist=sysfb_init
I think you only need root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on initcall_blacklist=sysfb_init.
Do you really need pcie_acs_override=...? Check your IOMMU groups without it and remove it if you can.
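For example (just a quick sketch; the long one-liner from your original post works too), after removing the override and rebooting you can list the groups with:

find /sys/kernel/iommu_groups/ -type l | sort -V

Each path contains the group number and the PCI address, so you can quickly see whether 01:00.0 and 01:00.1 still end up in a group of their own.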
cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
Do you need unsafe interrupts?
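You can check whether interrupt remapping is enabled (in which case allow_unsafe_interrupts should not be needed) with, for example:

dmesg | grep -i remapping

If a line like "DMAR-IR: Enabled IRQ remapping" shows up, you can drop this file.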
cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
Do you need ignore MSRs?
cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
I don't think you really need this. Maybe softdep nouveau pre: vfio-pci and softdep snd_hda_intel pre: vfio-pci are enough (check with lspci -ks 01:00 after a reboot and before starting the VM). It doesn't really hurt anything, so you might as well leave it like this.
cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
If these numbers match lspci -nns 01:00 then it's correct, which they appear to do.
cat /etc/pve/nodes/omega/qemu-server/50000.conf
#WinDOZE 11 64Bit
#
#- QEMU agent installed
#- All VirtIO drivers installed
#- NVidia GTX970 Drivers installed w/NVidia_Experience
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
I don't think you need args.
bios: ovmf
You could try switching to SeaBIOS if your GPU does not support UEFI, but you'll need to reinstall Windows. And I don't have experience with NVidia to tell you if it will work or help at all.
boot: order=ide2;virtio0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: local-zfs:vm-50000-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
For consumer NVidia cards use hostpci0: 0000:01:00,pcie=1,x-vga=1 to enable Primary GPU.
Maybe you need ,romfile=... as well, but I don't have experience with NVidia (because they blocked passthrough) to know how to get the romfile if you only have a single GPU. I also don't know how to patch it, or whether that is even necessary for UEFI/OVMF, for your GPU, or for your Windows drivers. This you need to find out from someone/somewhere else.
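Just as a sketch of the syntax (the filename is only an example, not something I can vouch for): you would copy the dumped ROM to /usr/share/kvm/ and reference it like

hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=GTX970.rom

since Proxmox looks up romfile relative to /usr/share/kvm/. Whether an unmodified dump is enough or a patched one is needed for your card, I can't tell you.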
ide2: none,media=cdrom
machine: pc-q35-7.0
memory: 4096
meta: creation-qemu=7.0.0,ctime=1665791723
Maybe older machine versions work better for your older GPU? I really don't have any experience with NVidia Windows drivers.
name: Windoze11
net0: virtio=DA:96:34:F6:A1:09,bridge=vmbr2,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=84566f99-dc35-4531-a053-da2f36ab045d
sockets: 1
tpmstate0: local-zfs:vm-50000-disk-1,size=1M,version=v2.0
usb0: host=413c:2501,usb3=1
vga: none
virtio0: local-zfs:vm-50000-disk-2,size=120G
vmgenid: a1dd0f77-5e94-47a4-a3da-fd091136282e
You installed VirtIO drivers, which is good (but unrelated to passthrough).
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done
...
This information is useless because you used pcie_acs_override=....
 
I think you only need root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on initcall_blacklist=sysfb_init.
Do you really need pcie_acs_override=...? Check your IOMMU groups without it and remove it if you can.
I have updated the boot parameters to your suggestion: No change. :/

Do you need unsafe interrupts?


Do you need ignore MSRs?
No change with either of these disabled. :/

I don't think you really need this. Maybe softdep nouveau pre: vfio-pci and softdep snd_hda_intel pre: vfio-pci are enough (check with lspci -ks 01:00 after a reboot and before starting the VM). It doesn't really hurt anything, so you might as well leave it like this.
I am not totally sure what you are asking me to do here. Do you want me to add softdep...... and snd_hda..... into the /etc/modprobe.d/blacklist.conf file? If so, there was no change. :/

output of lspci -ks 01:00 after a reboot but before trying to start the VM:
lspci -ks 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 High Definition Audio Controller
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

I don't think you need args.
No change. :/

You could try switching to SeaBIOS if your GPU does not support UEFI, but you'll need to reinstall Windows. And I don't have experience with NVidia to tell you if it will work or help at all.
I'm hoping to avoid this if possible. Also, this video card worked in a different Proxmox setup where the CPU has an integrated GPU, also booting UEFI. My fingers are crossed that it should work here too.

For consumer NVidia cards use hostpci0: 0000:01:00,pcie=1,x-vga=1 to enable Primary GPU.
Maybe you need ,romfile=... as well, but I don't have experience with NVidia (because the blocked passthrough) to know how to get the romfile if you only have a single GPU. I also don't know how to patch it, if necessary for UEFI/OVMF or whether that is necessary for your GPU or your Windows drivers. This you need to find out from someone/somewhere else.
I have tried this both with and without x-vga=1, with no improvement. As for the ROM file, I pulled the ROM out of the GPU with GPU-Z in a different computer, but of course the ROM file is factory unaltered. I have tried specifying the factory ROM file within the VM .conf file, also with no improvement.

Maybe older machine versions work better for your older GPU? I really don't have any experience with NVidia Windows drivers.
I am not sure what you are asking me to try here.





I see listed here that there is a driver and modules loaded for the GPU and GPU audio. Is this correct if I want to be able to pass through this device? Wouldn't this indicate that the host OS (Proxmox) still has exclusive access over them?
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970] [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 High Definition Audio Controller [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
 
I have updated the boot parameters to your suggestion: No change. :/

No change with either of these disabled. :/

I am not totally sure what you are asking me to do here. Do you want me to add softdep...... and snd_hda..... into the /etc/modprobe.d/blacklist.conf file? If so, there was no change. :/
You don't have to change it, I was just suggesting some clean-up. The softdeps are more specific than just blacklisting but the important thing is that vfio-pci gets the GPU before those drivers.
output of lspci -ks 01:00 after a reboot but before trying to start the VM:
lspci -ks 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 High Definition Audio Controller
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
Looks like passthrough is setup correctly on the Proxmox side of things.
No change. :/
It's just clean-up as this is already included in Primary VGA (x-vga=1).
I'm hoping to avoid this if possible. Also, this video card worked in a different Proxmox setup where the CPU has an integrated GPU, also booting UEFI. My fingers are crossed that it should work here too.
Unfortunately, single GPU passthrough is more complicated than passthrough of a GPU that is not touched by anything during boot. Make sure to match that other Proxmox as close as possible (see also the machine type below) and use a copy (restore from backup) of the working VM.
Looks like it's a GTX 970 specific issue and you'll need to find out how other people got their single GTX 970 working with which version of the NVidia drivers, on that version of Windows, using which VM virtual BIOS and machine version settings.
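If you still have a backup of that working VM, restoring it to a new VMID is, for example (the archive path is just a placeholder):

qmrestore /var/lib/vz/dump/vzdump-qemu-50000-backup.vma.zst 50001

and then you only adjust the passthrough-related settings on the copy.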
I have tried this both with and without x-vga=1, with no improvement. As for the ROM file, I pulled the ROM out of the GPU with GPU-Z in a different computer, but of course the ROM file is factory unaltered. I have tried specifying the factory ROM file within the VM .conf file, also with no improvement.
Patching of ROM files seems to be necessary sometimes but I don't have the knowledge (related to my remark above).
I am not sure what you are asking me to try here.
Different QEMU machine versions have different virtual PCIe layouts, which might break NVidia's drivers. See VM settings > Hardware > Machine. Try matching the version that worked for you before on another Proxmox. This might require reinstalling Windows, I don't know.
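You can also change it from the shell; the version below is only an example, pick whatever worked on the other Proxmox:

qm set 50000 --machine pc-q35-6.2

qm config 50000 shows what the VM is currently using.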
I see listed here that there is a driver and modules loaded for the GPU and GPU audio. Is this correct if I want to be able to pass through this device? Wouldn't this indicate that the host OS (Proxmox) still has exclusive access over them?
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970] [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GM204 High Definition Audio Controller [19da:1366]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
This looks good: the driver in use is vfio-pci. Proxmox, the VM hypervisor, needs "exclusive access" to the devices for passthrough (with vfio-pci).

I can't help you further with NVidia and Windows, sorry. Hopefully someone with NVidia experience will find this thread or you find the information you need on this forum or another.
 
Thanks leestekem for your guidance and your time explaining things to me.



You don't have to change it, I was just suggesting some clean-up. The softdeps are more specific than just blacklisting but the important thing is that vfio-pci gets the GPU before those drivers
I have not used softdeps before, and I am not sure how to implement them. Do they get put into the /etc/modprobe.d/blacklist.conf file like this?

blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci



Unfortunately, single GPU passthrough is more complicated than passthrough of a GPU that is not touched by anything during boot. Make sure to match that other Proxmox as close as possible (see also the machine type below) and use a copy (restore from backup) of the working VM.
Looks like it's a GTX 970 specific issue and you'll need to find out how other people got their single GTX 970 working with which version of the NVidia drivers, on that version of Windows, using which VM virtual BIOS and machine version settings.
This Windows 11 VM is a restore that had been running on a previous (now dead) Gigabyte H77N-WIFI motherboard with an Intel Xeon 4-core processor (socket 1155). Sorry, off the top of my head I don't recall the exact model number of the processor. But that older processor had a built-in GPU which I previously used for the local console. Everything previously had been working beautifully. This is also why the Windows 11 VM has the VirtIO drivers installed.

The way I built the new system was nearly identical to that of the old system, with the exception of:
- vfio_pci.ids=10de:13c2,10de:0fbb nofb nomodeset video=vesafb:off initcall_blacklist=sysfb_init (in the /etc/kernel/cmdline file)

Sadly the new processor (Intel Core i5-9400F, socket 1151) does not have a built-in GPU, which is causing all the difficulties. I'm thinking the quickest way to resolve the issue is just to get a CPU with a built-in GPU, as the newer motherboard does support a processor with integrated graphics.
 
I have not used softdeps before, and I am not sure how to implement them. Do they get put into the /etc/modprobe.d/blacklist.conf file like this?

blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
No, you can put them in any file; maybe it's better to put them in with your /etc/modprobe.d/vfio.conf. They are meant to replace your blacklisting. If this is too complicated, don't worry about it. It won't change anything for your actual problem.
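As a sketch, assuming you keep your current IDs and drop the blacklist file, /etc/modprobe.d/vfio.conf could then look like this:

options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci

Run update-initramfs -u -k all afterwards and reboot, then check with lspci -ks 01:00 that vfio-pci is still the driver in use.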
This Windows 11 VM is a restore that had been running on a previous (now dead) Gigabyte H77N-WIFI motherboard with an Intel Xeon 4-core processor (socket 1155). Sorry, off the top of my head I don't recall the exact model number of the processor. But that older processor had a built-in GPU which I previously used for the local console. Everything previously had been working beautifully. This is also why the Windows 11 VM has the VirtIO drivers installed.

The way I built the new system was nearly identical to that of the old system, with the exception of:
- vfio_pci.ids=10de:13c2,10de:0fbb nofb nomodeset video=vesafb:off initcall_blacklist=sysfb_init (in the /etc/kernel/cmdline file)

Sadly the new processor (Intel Core i5-9400F, socket 1151) does not have a built-in GPU, which is causing all the difficulties. I'm thinking the quickest way to resolve the issue is just to get a CPU with a built-in GPU, as the newer motherboard does support a processor with integrated graphics.
Maybe, I don't know.
 
