Issue with adding 2nd pcie gpu

salman8506

Sep 28, 2021
Hi,

I am new to Proxmox and am dealing with a very strange issue. I have the following configuration:

R7 1700
64gb ram
ASROCK B450 PRO4
GT610 - for display if required
GTX 1050 - For passthrough
PCIE x1 intel and realtek cards for extra NIC.

Proxmox loads fine, all VMs power up, and everything is normal. As soon as I install the second PCIe card (the GPU), the system still boots fine and both GPUs output properly to the connected displays (tested one at a time). The Proxmox CLI command watch sensors shows both GPUs detected and working. The only problem is that there is absolutely no activity on the LAN cards, and the system is unable to ping or connect anywhere.

During boot there is link activity until Proxmox finishes booting and shows the root CLI; after that the activity stops. As soon as I remove the second PCIe card and reboot, everything works again. I previously had a quad-NIC card in this second PCIe slot, and that worked fine as well. The issue only appears with a full-size GPU.

If someone can assist or point me in the right direction for troubleshooting, please do. I am pretty sure this is related to software and not hardware.
 
Adding a PCI(e) device can shift the PCI(e) bus addresses of other devices. Modern Linux systems (such as Debian-based Proxmox) name network devices after their PCI(e) address. Most likely, your network device got a different name and is therefore not activated, because the new name does not appear in /etc/network/interfaces. Log in to the Proxmox console and change /etc/network/interfaces accordingly. You can probably find the new name with journalctl -b 0 | grep renamed; it probably differs in only one digit (off by one) from the name in /etc/network/interfaces.
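To make the check above concrete, here is a minimal sketch that lists interface names mentioned in the config but not present on the running system. The function name missing_ifaces and its two optional arguments are made up for illustration; it only looks at iface and bridge-ports (or bridge_ports) lines, so treat it as a rough aid and not a full parser of the interfaces format.

```shell
# Hypothetical helper: print names referenced in an interfaces file that have
# no matching entry in a /sys/class/net style directory.
missing_ifaces() {
    conf=${1:-/etc/network/interfaces}
    sysdir=${2:-/sys/class/net}
    awk '
        $1 == "iface" { print $2 }
        $1 == "bridge-ports" || $1 == "bridge_ports" {
            for (i = 2; i <= NF; i++) print $i
        }
    ' "$conf" | sort -u | while read -r name; do
        [ "$name" = "none" ] && continue       # "bridge-ports none" is not a device
        [ -e "$sysdir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage on the Proxmox console (defaults to the real paths):
#   missing_ifaces
```

Any name it prints is a candidate for the rename; compare it with the output of journalctl -b 0 | grep renamed or ip -br link. A bridge such as vmbr0 may also show up if it failed to come up.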
 
Great reply, avw.93939. I ran into the same problem with an Arch install a few days ago and with my PVE box yesterday.
 
Thanks for the advice, I was able to fix the issue. However, my GPU passthrough is still not working. Windows 10 fails to boot when Q35 is selected as the machine type, as a lot of tutorials suggest. It won't even boot from the install disc; it simply gives a BSOD and reboots.
 
I think B450 motherboards can only pass through devices in the first PCIe x16 slot and the first M.2 slot. Please show us the IOMMU groups using for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nnks "${d##*/}"; done. The groups are determined by the BIOS. Some BIOS versions for AMD Ryzen motherboards break PCI passthrough completely, some have bad groups, and some work fine.

What does cat /proc/cmdline show? Did you add files or options to the /etc/modprobe.d/ folder? Is there anything in journalctl -b 0 when starting the VM or when it reboots? Can you install Windows without passthrough and add it later? Have you tried machine type Q35 version 3.1 instead of the latest version? Did you enable the Primary GPU setting (very useful for NVidia, does not work for AMD)? Can you show us the VM configuration file from /etc/pve/qemu-server/?
Sorry for asking a lot, but any of those things could give clues about what might be needed.
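The one-liner above walks the groups in glob order, so group 10 is printed before group 2. As a sketch only, the same loop can be split into a small function that sorts numerically; the name list_iommu_groups and its optional base-directory argument are assumptions made for this example.

```shell
# Hypothetical helper: print "group<TAB>pci-address" for every device,
# sorted numerically by IOMMU group number.
list_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    for d in "$base"/*/devices/*; do
        [ -e "$d" ] || continue      # glob did not match: no groups exposed
        n=${d#"$base"/}              # strip the base directory
        n=${n%%/*}                   # keep only the group number
        printf '%s\t%s\n' "$n" "${d##*/}"
    done | sort -n
}

# Usage, with lspci adding the device details as in the one-liner:
#   list_iommu_groups | while read -r g dev; do
#       printf 'IOMMU group %s ' "$g"; lspci -nnks "$dev"
#   done
```

The grouping itself is unchanged; only the order of the listing differs, which makes it easier to see which devices share a group.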
 
IOMMU Groups -

IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 10 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 11 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
Kernel driver in use: pcieport
IOMMU group 12 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b]
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4, sp5100_tco
IOMMU group 12 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
Subsystem: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e]
IOMMU group 13 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU group 13 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU group 13 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU group 13 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
Kernel driver in use: k10temp
Kernel modules: k10temp
IOMMU group 13 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU group 13 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU group 13 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU group 13 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU group 14 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [GeForce GT 610] [10de:104a] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GF119 [GeForce GT 610] [19da:6222]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
IOMMU group 15 01:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GF119 HDMI Audio Controller [19da:6222]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
IOMMU group 16 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
IOMMU group 17 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
Kernel driver in use: ahci
Kernel modules: ahci
IOMMU group 18 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
Kernel driver in use: pcieport
IOMMU group 19 03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
Kernel driver in use: pcieport
IOMMU group 20 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 21 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 22 03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 23 03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 24 03:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Kernel driver in use: pcieport
IOMMU group 25 05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Subsystem: Hewlett-Packard Company Ethernet I210-T1 GbE NIC [103c:0003]
Kernel driver in use: igb
Kernel modules: igb
IOMMU group 26 06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter [7470:3468]
Kernel driver in use: r8169
Kernel modules: r8169
IOMMU group 27 08:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
Subsystem: ASRock Incorporation Motherboard [1849:0612]
Kernel driver in use: ahci
Kernel modules: ahci
IOMMU group 28 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
Subsystem: ASRock Incorporation Motherboard (one of many) [1849:8168]
Kernel driver in use: r8169
Kernel modules: r8169
IOMMU group 29 0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81] (rev ff)
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
IOMMU group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
Kernel driver in use: pcieport
IOMMU group 30 0a:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev ff)
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
IOMMU group 31 0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU group 32 0c:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
Kernel driver in use: ccp
Kernel modules: ccp
IOMMU group 33 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:7914]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
IOMMU group 34 0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU group 35 0d:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
Kernel driver in use: ahci
Kernel modules: ahci
IOMMU group 36 0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
Subsystem: ASRock Incorporation Family 17h (Models 00h-0fh) HD Audio Controller [1849:6893]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
IOMMU group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
Kernel driver in use: pcieport
IOMMU group 6 00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
Kernel driver in use: pcieport
IOMMU group 7 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 8 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 9 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
Kernel driver in use: pcieport
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.22-4-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off
/etc/modprobe.d/pve-blacklist.conf
blacklist nvidiafb
blacklist nvidia
blacklist radeon
blacklist nouveau

I built the VM, then set the CPU type to host and hidden, and set the machine type to Q35. In the VM I enabled Remote Desktop and installed the balloon driver service. I am passing the GTX 1050 to the VM. I have connected the GTX 1050 to my TV, where I am hoping my kids will be able to play a bit of Forza Horizon 4 directly. The how part can be worked out once the config is actually working.

I have cleaned up and deleted the VM now. I will rebuild it and respond with the "journalctl -b 0" output and the VM config.
 
cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-5.11.22-4-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off
Unfortunately, using pcie_acs_override makes the IOMMU grouping information useless, because the override splits devices into separate groups regardless of the real hardware isolation. Note that amd_iommu=on is not needed; it is on by default. If you remove quiet, you can see more of the boot process on the console, which can help with troubleshooting. If you are passing through the same NVidia GPU that is used during the POST of the motherboard and the boot of the Proxmox host (I can't tell from your IOMMU groups), you'll probably need video=vesafb:off as well (separate from video=efifb:off).

blacklist nvidiafb
blacklist nvidia
blacklist radeon
blacklist nouveau
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81]
Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9]
Instead of blacklisting the drivers (which you might need for the other GPU), you can add vfio_pci.ids=10de:1c81,10de:0fb9 to prevent any other drivers on the Proxmox host from touching the NVidia GTX 1050. Then you probably don't need the video= parameters.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [GeForce GT 610] [10de:104a] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. GF119 [GeForce GT 610] [19da:6222]
Kernel driver in use: vfio-pci
It would appear that you are also doing passthrough on the NVidia GT 610? Is that working fine? Or would you like to use that one for the Proxmox host?

EDIT: I just found that the second PCIe x16 (x4 electrically) slot is shared with the first M.2 slot (connected to the CPU). This means it can be passed through to a VM (without pcie_acs_override) as long as you don't need that M.2 slot.
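Putting the suggestions from this post together, the kernel command line could look roughly like the line below. This is only a sketch and assumes the host boots through GRUB (the BOOT_IMAGE= entry in the posted cmdline suggests so) and that the options live in /etc/default/grub. It drops quiet and amd_iommu=on and binds the GTX 1050 by the IDs from the listing; whether pcie_acs_override is still needed depends on the real groups, so it is left out here as an assumption.

```shell
# /etc/default/grub (fragment, sketch only)
# vfio_pci.ids takes the vendor:device IDs of the GPU and its HDMI audio
# function, as shown by lspci -nn for the GTX 1050 in this thread.
GRUB_CMDLINE_LINUX_DEFAULT="vfio_pci.ids=10de:1c81,10de:0fb9"
```

After a reboot, cat /proc/cmdline should show the new options.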
 
Thanks for getting back to me. Let me check all those settings and try again.

I tried the GT 610, which was in the second (x4) slot, thinking that the primary GPU might not be working because Proxmox was locking it. It did not work; I got the same result :(
 
The problem is probably that the GPU in the first x16 slot is used during the POST, boot, and startup of the Proxmox host. Sometimes you can change that in the BIOS, often not. The easiest approach is to pass through a GPU that is not touched by anything before the VM starts.
In your case that means: use the x4 slot, and use vfio-pci.ids= with the IDs of the GPU (and its audio function) that are in that slot. Don't forget to run update-initramfs -u and update-grub and restart the Proxmox host before starting the VM (don't just restart the VM). This way the drivers inside the VM get the card in the state it is in when the system starts, which gives you the best chance of getting it working.
If that works, you can start troubleshooting other things, like restarting the VM or using the x16 slot (by disabling all output from the host).
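The steps above can be sketched as follows. The commands in the comments are the ones named in this post; the small helper driver_in_use is made up for this example and simply pulls the driver name out of lspci -nnk output, so you can confirm that vfio-pci owns the card before the VM is started. The ID 10de:1c81 is the GTX 1050 from the listing earlier in the thread; substitute the IDs of whichever card sits in the x4 slot.

```shell
# Hypothetical helper: print the "Kernel driver in use" value(s)
# from lspci -nnk output read on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# On the Proxmox host, as root, after editing the kernel command line:
#   update-initramfs -u
#   update-grub
#   reboot
# After the reboot, before starting the VM:
#   lspci -nnk -d 10de:1c81 | driver_in_use
```

If the last command prints anything other than vfio-pci (or prints nothing), a host driver has probably claimed the card, or nothing is bound to it yet.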
 
So just an update: my standby GPU, the GTX 1050, blew up, so I cannot proceed further until I find a replacement :( I will post an update here once I source another one.
 
