[SOLVED] Registering PCIe Device makes Network Unresponsive

teelai

New Member
Oct 24, 2024
I am on Proxmox VE 8.2.7, with an R5 430 GPU.

I followed https://forum.proxmox.com/threads/p...x-ve-8-installation-and-configuration.130218/ to set up the GPU and drivers, and everything looks good.

However, when I assign the GPU as a PCIe device to my Alpine Linux VM, network access from the host stops working. A snippet of the logs:
Code:
Jun 19 11:30:48 proxmox kernel: vfio-pci 0000:06:00.0: enabling device (0002 -> 0003)
Jun 19 11:30:48 proxmox kernel: vfio-pci 0000:06:00.1: enabling device (0000 -> 0002)
Jun 19 11:30:49 proxmox pvedaemon[1502]: <root@pam> end task UPID:proxmox:00002735:00015463:6853E6D3:qmstart:102:root@pam: OK
Jun 19 11:30:57 proxmox pvestatd[1454]: backups: error fetching datastores - 500 Can't connect to 192.168.0.72:8007 (Connection timed out)

I was able to restart the machine and unassign the PCIe card from the VM to regain access.

I'm not sure what could be causing this. Does anyone have any pointers?
 
I'm guessing something else is in the same IOMMU group. Please share your groups, using this script:
Bash:
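#!/bin/bash
# Print every IOMMU group and the PCI devices it contains.
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        # printf instead of echo -e, which prints a literal "-e" when run under sh
        printf '\t%s\n' "$(lspci -nns "${d##*/}")"
    done
done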
The VM config will also be helpful. Use qm config VMIDHERE --current to get it.
 
Code:
IOMMU Group 0:
-e      00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 1:
-e      00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 2:
-e      00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 3:
-e      00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4:
-e      00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 5:
-e      00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 6:
-e      00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7:
-e      00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8:
-e      00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 9:
-e      00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10:
-e      00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 11:
-e      00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
-e      00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12:
-e      00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
-e      00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
-e      00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
-e      00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
-e      00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
-e      00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
-e      00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
-e      00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 13:
-e      01:00.0 Non-Volatile memory controller [0108]: Seagate Technology PLC FireCuda 530 SSD [1bb1:5018] (rev 01)
IOMMU Group 14:
-e      02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1022:43d5] (rev 01)
-e      02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
-e      02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
-e      03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
-e      03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
-e      03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
-e      05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
-e      06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R5 430 OEM / R7 240/340 / Radeon 520 OEM] [1002:6611] (rev 87)
-e      06:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1002:aab0]
IOMMU Group 15:
-e      07:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
IOMMU Group 16:
-e      08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU Group 17:
-e      08:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device [1022:1456]
IOMMU Group 18:
-e      08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU Group 19:
-e      09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU Group 20:
-e      09:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 21:
-e      09:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]

Code:
agent: 1
bios: seabios
boot: order=scsi0;ide2;net0
cores: 6
cpu: x86-64-v2-AES
ide2: local:iso/alpine-virt-3.21.2-x86_64.iso,media=cdrom,size=63M
machine: q35
memory: 20480
meta: creation-qemu=9.0.2,ctime=1736966594
name: docker
net0: virtio=BC:24:11:64:17:8B,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-102-disk-0,aio=threads,iothread=1,size=256G
scsihw: virtio-scsi-single
smbios1: uuid=cf8b68ab-9cf9-4d1a-b380-7d8dc7c44c4f
sockets: 1
startup: order=2
vga: virtio
vmgenid: 49488429-9ad2-4689-8d26-8cd7d13acaa6

This is without the PCIe card, but it was device 0000:06:00.0, with all functions, ROM Bar, and PCI Express ticked.
 
You can see that the GPU shares an IOMMU group (group 14) with the NIC and many other devices. This will not work, because every device in a group is handed to the VM together, so the host loses its NIC. You can try using another PCI(e) slot and/or see whether there are IOMMU/ACS-related options in the UEFI that you can enable. Check again afterwards to see how the groups changed.
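To check a single device without reading the full listing, something like the following sketch could help. It assumes the usual sysfs layout, where each PCI device has an `iommu_group` link; the function name `iommu_peers` and the optional second argument (an alternative sysfs root, only useful for testing) are made up for illustration.

```bash
#!/bin/bash
# Sketch: list every PCI device that shares an IOMMU group with a given device.
# iommu_peers is a hypothetical helper name, not a Proxmox tool.
# Usage: iommu_peers <pci-address> [sysfs-root]
iommu_peers() {
    local dev="$1" root="${2:-/sys}"
    local group_dir="$root/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $dev" >&2
        return 1
    fi
    local d
    for d in "$group_dir"/*; do
        [ -e "$d" ] || continue                 # glob matched nothing
        [ "${d##*/}" = "$dev" ] && continue     # skip the device itself
        echo "${d##*/}"
    done
}
```

Calling `iommu_peers 0000:06:00.0` on the host prints the addresses of the other devices in the GPU's group. If anything beyond the GPU's own audio function (06:00.1) shows up, passing the GPU through will also take those devices away from the host.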
 
Unfortunately there aren't any other PCI slots I can use, and I don't think there are any other options in my BIOS. Do I have any other options for HW transcode in a VM?
 
I did find an ACS option in my BIOS, but my groups seem unchanged. Is there another step? I was reluctant to look, as checking the BIOS is a bit of a pain with my set-up.

I am not setting up a new VM with GPU passthrough; I'm trying to add it to an existing VM, and I'd rather not recreate it from scratch as a CT.
 
After doing some digging, although I have enabled ACS, I think my Ryzen 1700 doesn't support it. I think I need at least a 3000 series CPU, so that would explain the lack of IOMMU isolation.

Thanks so much Impact for all the help.
 
It's not the CPU but the chipset. Only the X570(S) chipset puts every PCI(e) slot and on-board device in a separate IOMMU group. All other AM4 chipsets put everything into one big "chipset group", except for the first and sometimes second x16 slot (X370 & X470 only) and the first M.2 slot (all). This has been discussed many times on this forum (while AM4 was the latest AMD socket).
You can also search for pci_acs_override=downstream,multifunction, which breaks up the IOMMU groups, but that means that VMs can read the memory of other VMs and/or even the Proxmox host.
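If you do experiment with the override, it is worth confirming whether it is actually on the running kernel's command line before you trust the new groups. A minimal sketch, assuming the parameter appears in /proc/cmdline when active; the function name `acs_override_active` and its optional file argument are hypothetical and only there to make it testable:

```bash
#!/bin/bash
# Sketch: report whether pci_acs_override is set on a kernel command line.
# acs_override_active is a hypothetical helper name.
# Usage: acs_override_active [cmdline-file]   (defaults to /proc/cmdline)
acs_override_active() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Succeeds (exit 0) only if the parameter is present with a value.
    grep -q 'pci_acs_override=' "$cmdline_file"
}
```

For example, `acs_override_active && echo "override is on"` prints the message only when the parameter is set. Keep in mind that the separate groups it produces are not real isolation, as described above.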
 
Ah okay, thank you! I was about to look around on eBay for something slightly more recent but it sounds like I would need to do a mobo upgrade if I wanted to get around this without the override.

Thank you leesteken! I'll weigh up whether the override is worth it in my case.
 
I would expect the first x16 PCIe slot to be in its own IOMMU group (and also the first M.2 slot), counting from closest to the CPU. Can't you put the GPU in that slot? Or, if that does not help, maybe update the motherboard BIOS?
I currently have an HBA in that slot, and the HBA doesn't fit in the second slot because of my case, if I remember correctly.