[SOLVED] PCIe NIC down when starting a VM with GPU passthrough

maonimeia

New Member
Apr 2, 2021
I replaced the motherboard of my PVE host, and everything seemed fine.
But when I start the VM that has a GPU passed through, the PCIe NIC goes down.
Here are the relevant syslog entries:
Code:
Apr 03 02:52:27 pve pvedaemon[1380]: <root@pam> starting task UPID:pve:00000C72:0000725A:606767EB:qmstart:103:root@pam:
Apr 03 02:52:27 pve kernel: igb 0000:02:00.0: removed PHC on enp2s0f0
Apr 03 02:52:27 pve kernel: vmbr1: port 1(enp2s0f0) entered disabled state
Apr 03 02:52:27 pve kernel: device enp2s0f0 left promiscuous mode
Apr 03 02:52:27 pve kernel: vmbr1: port 1(enp2s0f0) entered disabled state
Apr 03 02:52:27 pve kernel: igb 0000:02:00.1: removed PHC on enp2s0f1
Apr 03 02:52:27 pve kernel: vmbr2: port 1(enp2s0f1) entered disabled state
Apr 03 02:52:27 pve kernel: device enp2s0f1 left promiscuous mode
Apr 03 02:52:27 pve kernel: vmbr2: port 1(enp2s0f1) entered disabled state
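
To confirm whether the NIC functions are being pulled away from the igb driver when the VM starts, the currently bound driver can be checked from the host while the VM is running. This is only a sketch; adjust the PCI address (02:00.0 here) to match your NIC:
Code:
# show which kernel driver currently claims the first NIC function
lspci -nnk -s 02:00.0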

Here is the VM config:
Code:
agent: 1,type=virtio
balloon: 0
bios: ovmf
boot: order=sata0
cores: 8
cpu: host,hidden=1
efidisk0: local-lvm:vm-103-disk-0,size=4M
localtime: 0
machine: q35
memory: 16384
name: windows
net0: virtio=1A:89:61:5A:F2:48,bridge=vmbr2
numa: 0
ostype: l26
sata0: local-lvm:vm-103-disk-1,size=50G,ssd=1
sata1: local-lvm:vm-103-disk-2,backup=0,size=80G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=3b8052c0-7ffe-45a9-ab2c-1719468d054c
sockets: 1
startup: order=1
usb0: host=264a:3030,usb3=1
usb1: host=045e:02e6,usb3=1
usb2: host=0a12:0001,usb3=1
vmgenid: 64cc1cd8-4248-4ca6-a151-ed3ddacc4a96
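
(Note that the passthrough entry itself is not shown in the config above. For reference, a GPU passthrough line in a Proxmox VM config typically looks like the sketch below, assuming the GTX 1080 Ti at 01:00 listed further down; the exact PCI address and options depend on the setup.)
Code:
hostpci0: 0000:01:00,pcie=1,x-vga=1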

Here is the PCIe information:
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Device 1b4c:120f
        Flags: fast devsel, IRQ 11
        Memory at de000000 (32-bit, non-prefetchable) [disabled] [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [disabled] [size=32M]
        I/O ports at e000 [disabled] [size=128]
        Expansion ROM at df000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau

01:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
        Subsystem: Device 1b4c:120f
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at df080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter I350-T2
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at df200000 (32-bit, non-prefetchable) [size=1M]
        Memory at df304000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number a0-36-9f-ff-ff-8b-1e-b0
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1a0] Transaction Processing Hints
        Capabilities: [1c0] Latency Tolerance Reporting
        Capabilities: [1d0] Access Control Services
        Kernel driver in use: igb
        Kernel modules: igb

02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter I350-T2
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at df100000 (32-bit, non-prefetchable) [size=1M]
        Memory at df300000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number a0-36-9f-ff-ff-8b-1e-b0
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1a0] Transaction Processing Hints
        Capabilities: [1d0] Access Control Services
        Kernel driver in use: igb
        Kernel modules: igb
 
I moved the NIC to a different PCIe slot, and now it works.
They were probably in the same IOMMU group (a group cannot be split between the host and/or multiple VMs). You can check with:
Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
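
If the GPU and the NIC show up in the same group, that would explain the behaviour: starting the VM claims the whole group for vfio, and the NIC gets torn down with it. Illustrative output only (actual group numbers, devices and IDs depend on the board):
Code:
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
IOMMU group 1 02:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
IOMMU group 1 02:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)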