Networking fails after starting VM with GPU-passthrough enabled

mimesot

Aug 17, 2017
Hi everyone!
First off, I am new to this forum, so I want to say a big thank you for the development of this great piece of software and for this community, which has already helped me a lot.

My first few experiences with PVE were consistently positive, but now I am stuck with a problem that I neither comprehend myself nor can find any literature on. For my private use I wanted my computer to run Debian and Win 10 in parallel, with the former using the Intel Xeon E3-1245 IGP and Windows getting the Nvidia GTX 950 GPU.

I installed PVE 5 on Debian 9 and prepared everything for the Windows VM, sticking closely to this guide by sshaikh. Everything worked flawlessly so far: enabling IOMMU, blacklisting the nvidia/nouveau drivers, loading the VirtIO drivers, installing Windows with the parameters "bios: ovmf" and "machine: q35" set in <vm>.conf, starting the VM, giving it a static IP, and testing Windows Remote Desktop, since I expected noVNC not to be able to deal with a physical GPU in a VM. Perfect.

Then I passed through the GPU using "hostpci0: 01:00,x-vga=on,pcie=1" and the passthrough actually worked. Plugging a second display into the 950's DVI port gave me graphical output from the Win 10 VM. BUT... my network crashed.

Tested: I shut down the Win 10 VM and rebooted PVE. I can surf the internet on my Debian/PVE. Started the Win 10 VM again (with GPU passthrough), network gone. Removed "hostpci0: 01:00,x-vga=on,pcie=1" from the VM and rebooted. Both Debian/PVE and Win 10 VM have Internet. Reenabling GPU passthrough, network gone again.

So please correct me if I am wrong: in the lspci output 01:00 refers to the first PCIe Slot, with 01:00.0 being the Graphics Chip and 01:00.1 being the sound chip on the GTX 950 card.
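
On the addressing question: lspci prints addresses as bus:device.function, so 01:00.0 and 01:00.1 are functions 0 and 1 of device 00 on bus 01. The bus number is assigned during enumeration and need not match the slot numbering printed on the board. A small sketch of the split in shell; `split_bdf` is a name made up for this illustration.

```shell
# split_bdf ADDR: print bus, device and function of a PCI address such as
# 01:00.1. A leading domain such as 0000: is accepted and ignored.
split_bdf() {
    func=${1##*.}      # text after the last dot      -> function
    rest=${1%.*}       # everything before that dot
    dev=${rest##*:}    # text after the last colon    -> device
    rest=${rest%:*}    # drop the device part
    bus=${rest##*:}    # what remains after any domain -> bus
    echo "bus=$bus device=$dev function=$func"
}
```

Writing `01:00` without a function in `hostpci0`, as in the line quoted above, hands all functions of that device to the VM, here the graphics and audio functions together.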

So what does passing the PCIe Slot mounted devices have to do with the network at all?
Can anybody make sense of this?
Thanks a lot in advance!
mimesot

BTW:
I have no clue what output or other additional information you would need in order to draw conclusions so please tell me.

BTW2:
I currently don't use the onboard Ethernet Chip but one on a PCIe expansion card, because I probably will exchange the mainboard for the one I actually own and didn't want to run into trouble again, when the MAC changes.

System Hardware:
Intel Xeon E3-1245v6 (Kaby Lake with IGP p600)
MSI C236a
Crucial 16GB RAM (ECC with 2133MHz)
Crucial MX300 525GB SSD
Nvidia GTX 950
Ethernet 1000Base-T DeLock 89357 Realtek
USB Expansion card (Which I want to pass through to the Win VM as well)
SATA expansion card (I want to experiment with HBA passthrough)
 
please post the output of

Code:
lspci
find /sys/kernel/iommu_groups -type l | sort -t '/' -n -k 5
qm config <your-VMID>
 
Thanks for your quick response!

lspci
Code:
00:00.0 Host bridge: Intel Corporation Device 5918 (rev 05)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
00:01.1 PCI bridge: Intel Corporation Skylake PCIe Controller (x8) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 591d (rev 04)
00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #17 (rev f1)
00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 950] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
04:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
05:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01)

find /sys/kernel/iommu_groups -type l | sort -t '/' -n -k 5
Code:
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/6/devices/0000:00:17.0
/sys/kernel/iommu_groups/7/devices/0000:00:1b.0
/sys/kernel/iommu_groups/8/devices/0000:00:1b.3
/sys/kernel/iommu_groups/9/devices/0000:00:1c.0
/sys/kernel/iommu_groups/10/devices/0000:00:1c.4
/sys/kernel/iommu_groups/11/devices/0000:00:1d.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.2
/sys/kernel/iommu_groups/12/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:00:1f.4
/sys/kernel/iommu_groups/13/devices/0000:00:1f.6
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/15/devices/0000:05:00.0
/sys/kernel/iommu_groups/16/devices/0000:06:00.0
/sys/kernel/iommu_groups/17/devices/0000:07:00.0

121.conf
Code:
balloon: 4096
bios: ovmf
boot: c
bootdisk: virtio0
cores: 6
hostpci0: 01:00,x-vga=on,pcie=1
machine: q35
memory: 12288
name: zirkon-vm-win
net0: virtio=9E:76:4B:34:E3:CE,bridge=vmbr0
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=2879b8af-d4a5-48d1-ba07-61130e1fddd4
sockets: 1
virtio0: local:121/vm-121-disk-1.qcow2,cache=writeback,size=256G

I find it interesting that the network adapter I currently use, "02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)" in the lspci list, which I assume to be "/sys/kernel/iommu_groups/1/devices/0000:02:00.0" in the IOMMU list, is in the same group as the graphics card. This could be a coincidence, but I can perhaps rule that out by moving the Ethernet PCIe card to a different PCIe slot. I would love to get some more insight into IOMMU.
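
One way to check this without reading the list by eye is sketched below. It assumes the sysfs layout shown above (/sys/kernel/iommu_groups/<group>/devices/<address>); `iommu_peers` is a made-up helper name, not a Proxmox or kernel tool.

```shell
# iommu_peers ADDR: read IOMMU group paths on stdin (as printed by
# "find /sys/kernel/iommu_groups -type l") and print every other device
# that shares a group with ADDR, e.g. 0000:01:00.0.
iommu_peers() {
    awk -F/ -v dev="$1" '
        # Strip trailing blanks, then map device address to group number.
        { sub(/[ \t]+$/, ""); group[$7] = $5 }
        END {
            if (!(dev in group)) exit
            for (d in group)
                if (d != dev && group[d] == group[dev]) print d
        }' | LC_ALL=C sort
}
```

Fed with the output of `find /sys/kernel/iommu_groups -type l`, the call `iommu_peers 0000:01:00.0` would list 0000:00:01.0, 0000:00:01.1, 0000:01:00.1 and 0000:02:00.0 for the listing above. As far as I understand VFIO, a group is handed over as a whole, so endpoint devices that share it with the GPU stop being usable by the host, which would match the Realtek card at 02:00.0 dropping out.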


Thanks and best regards!
mimesot
 
Dear dcsapac!
By asking me to post the IOMMU info you actually pushed me towards a very convenient workaround. PCIe slot 2 on the mainboard appears to be intended as a secondary graphics card slot and is therefore somehow tied to the first PCIe slot (if someone is able to explain it to me, please do so, as I really would like to comprehend this. Is there a way to move slot 2 into a separate group?). I moved the Ethernet card to slot 5 and it worked straight away.
Thanks a lot!
mimesot
 
if someone is able to explain it to me, please do so, as I really would like to comprehend this. Is there a way to move slot 2 into a separate group?
How the slots are divided is defined by the hardware manufacturer; in this case I guess it is for SLI/CrossFire.

We have included the 'ACS override' patch in our kernel, which lets you override those IOMMU restrictions, but there are (security) implications.
More information on how to use this patch can be found in the Arch Linux wiki:
https://wiki.archlinux.org/index.ph...ing_the_IOMMU_groups_.28ACS_override_patch.29
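
For reference, the override is switched on with a kernel command line option. Below is a sketch for a GRUB-booted Intel host, assuming the IOMMU was already enabled with `intel_iommu=on`; the exact option set is an assumption to adapt, and the security caveat above still applies, because devices that end up in separate groups may still be able to reach each other.

```shell
# Excerpt of /etc/default/grub (assumed: GRUB boot loader, Intel CPU).
# pcie_acs_override comes from the ACS override patch: "downstream" treats
# downstream ports as ACS-capable, "multifunction" does the same for
# multifunction devices.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream,multifunction"
# Afterwards run update-grub, reboot, and list the IOMMU groups again.
```

Whether this actually separates slot 2 on a given board is something to verify by listing the groups again after the reboot.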
 
Thanks!
I think I will leave my configuration as is, but nevertheless an interesting read.
Greetings
mimesot
 
please post the output of
hello there
I have the same problem.
Code:
 lspci
00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 5912 (rev 04)
00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Device a2af
00:14.2 Signal processing controller: Intel Corporation Device a2b1
00:15.0 Signal processing controller: Intel Corporation Device a2e0
00:15.1 Signal processing controller: Intel Corporation Device a2e1
00:16.0 Communication controller: Intel Corporation Device a2ba
00:17.0 SATA controller: Intel Corporation Device a282
00:1c.0 PCI bridge: Intel Corporation Device a294 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device a296 (rev f0)
00:1e.0 Signal processing controller: Intel Corporation Device a2a7
00:1f.0 ISA bridge: Intel Corporation Device a2c4
00:1f.2 Memory controller: Intel Corporation Device a2a1
00:1f.3 Audio device: Intel Corporation Device a2f0
00:1f.4 SMBus: Intel Corporation Device a2a3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
02:00.0 USB controller: ASMedia Technology Inc. Device 2142
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)

Code:
find /sys/kernel/iommu_groups -type l | sort -t '/' -n -k 5
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:15.0
/sys/kernel/iommu_groups/5/devices/0000:00:15.1
/sys/kernel/iommu_groups/6/devices/0000:00:16.0
/sys/kernel/iommu_groups/7/devices/0000:00:17.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.6
/sys/kernel/iommu_groups/10/devices/0000:00:1e.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/11/devices/0000:00:1f.4
/sys/kernel/iommu_groups/12/devices/0000:00:1f.6
/sys/kernel/iommu_groups/13/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:03:00.0

Code:
vmid.conf
bootdisk: scsi0
cores: 2
cpu: host
ide2: none,media=cdrom
keyboard: es
machine: q35
hostpci0: 01:00.0,pcie=1,x-vga=on
memory: 1024
name: pmining6
net0: virtio=BE:FD:85:A1:48:5D,bridge=vmbr0
numa: 0
ostype: l26
parent: natural
scsi0: local:101/vm-101-disk-1.qcow2,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=e92da254-1ac2-4a30-970c-9eb8d66a8b0b
sockets: 1

[natural]
#amd driver
bootdisk: scsi0
cores: 2
cpu: host
ide2: none,media=cdrom
keyboard: es
machine: q35
memory: 1024
name: pmining6
net0: virtio=BE:FD:85:A1:48:5D,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local:101/vm-101-disk-1.qcow2,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=e92da254-1ac2-4a30-970c-9eb8d66a8b0b
snaptime: 1508249593
sockets: 1
vmstate: local:101/vm-101-state-natural.raw
As you can see, my Ethernet controller is not in group 1.
So why does the network break?
 
Hey there,
could the problem be that the host is trying to use the GPU?
Code:
lsmod |grep amd
amdgpu               1556480  0
ttm                    98304  1 amdgpu
drm_kms_helper        151552  2 amdgpu,i915
drm                   352256  5 amdgpu,i915,ttm,drm_kms_helper
i2c_algo_bit           16384  2 amdgpu,i915
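
lsmod only shows that the module is loaded; whether the host driver has actually claimed the card is visible in sysfs, where each PCI device has a `driver` symlink while a driver is bound. A minimal sketch of that check; `pci_driver` and its optional sysfs-root argument are made up for this illustration.

```shell
# pci_driver ADDR [SYSFS_ROOT]: print the kernel driver currently bound to
# a PCI device, or "none" if no driver has claimed it.
# SYSFS_ROOT defaults to /sys and exists only to make the function testable.
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../drivers/<name>; keep the last component.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

If `pci_driver 0000:01:00.0` prints `amdgpu`, the host still holds the card; for passthrough one would typically expect `vfio-pci` there once the VM is running.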
 
Hey, could someone please point me in the right direction?
I have new information: the guest VM starts, but it fails within seconds.
I added the following to /etc/modprobe.d/blacklist.conf:
amdgpu
 

Attachments: falla amdgpu 2017-10-19 02-49-47.png, fails amd xfx rx480 8gb 2017-10-19 12-08-01.png
Hello, and many thanks for the help.
Update:
It now seems to work, but I have not installed drivers or anything else yet. See the screenshot.
vmid.conf
Code:
bios: ovmf
bootdisk: scsi0
cores: 4
cpu: host
efidisk0: local:103/vm-103-disk-2.qcow2,size=128K
hostpci0: 04:00,pcie=1,x-vga=off
ide2: local:iso/ubuntu-16.04.3-server-amd64.iso,media=cdrom
keyboard: es
machine: q35
memory: 1024
name: mining8
net0: e1000=26:17:9B:46:71:DB,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local:103/vm-103-disk-1.qcow2,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=c54f0a80-a5a7-4e8c-b699-028147de4a2c
sockets: 1
vga: std
However, as you can see, x-vga is off.
Is this correct, given that this is a server I want to configure for cryptocurrency mining?

I do not want to use it for graphics or games.
Another thing I want to clarify: the card is connected to the motherboard through a PCIe x1 link. Could this cause problems?
 

Attachments: funciona 2017-10-19 15-01-20.png
Hey people,
I still have problems with PCIe passthrough in Proxmox 5 :D
I set up a VM with Xubuntu, and when I add the PCIe passthrough and try to start it, I get the following error:
Code:
vfio error: 0000:04:00.0: failed getting region info for VGA region index 8: Invalid argument
device does not support requested feature x-vga
Any help is welcome. Meanwhile I will keep searching.
 
Good news: I solved the last error by adding "hostpci0: 04:00.1;04:00.0" to /etc/pve/qemu-server/vmid.conf.
But when I install the amdgpu-pro drivers and restart, the VM starts, yet I cannot connect with either SPICE or noVNC, and the connection to the host breaks.
Where is the problem?
 
