Hello,
I have PVE 7.0 running on a Gigabyte H310N motherboard with an Intel i5-8400. I set up IOMMU following the wiki.
The motherboard has an M.2 slot intended for a Wi-Fi card, and I have a Coral TPU installed in it.
The IOMMU groups look good:
Code:
IOMMU Group 0:
00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec2] (rev 07)
IOMMU Group 1:
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
01:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 2:
00:02.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:3e92]
IOMMU Group 3:
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU Group 4:
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
IOMMU Group 5:
00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
IOMMU Group 6:
00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
IOMMU Group 7:
00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #5 [8086:a294] (rev f0)
IOMMU Group 8:
00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #11 [8086:a29a] (rev f0)
IOMMU Group 9:
00:1d.3 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #12 [8086:a29b] (rev f0)
IOMMU Group 10:
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:a2ca]
00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
IOMMU Group 11:
02:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. SM2263EN/SM2263XT SSD Controller [126f:2263] (rev 03)
IOMMU Group 12:
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 16)
IOMMU Group 13:
04:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
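For anyone who wants to reproduce a listing like the one above, here is a minimal sketch that reads the groups straight from sysfs. It assumes the usual Linux layout under `/sys/kernel/iommu_groups`; the function name `list_iommu_groups` is only illustrative, and it prints bare PCI addresses rather than the full `lspci` descriptions shown above.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs.

    The default path is the usual Linux location (an assumption about the host).
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or sysfs not available at this path.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    # Sort numerically so group 9 comes before group 13.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for number, devices in list_iommu_groups().items():
        print(f"IOMMU Group {number}:")
        for address in devices:
            print(f"  {address}")
```

A device is normally a clean passthrough candidate when it is the only entry in its group, as 04:00.0 is in Group 13 here.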
The Coral TPU is alone in Group 13, so I passed through PCI device 04:00.0 in the Proxmox GUI and then started the guest, which runs Ubuntu 20.04 LTS.
But the guest fails with an "internal error" and won't boot.
Syslog shows:
Code:
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.3
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: device [8086:a29b] error status/mask=00100000/00010000
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: [20] UnsupReq (First)
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: AER: TLP Header: 34000000 04000010 00000000 00000000
Nov 06 22:12:22 Proxmox kernel: pcieport 0000:00:1d.3: AER: device recovery successful
Nov 06 22:12:22 Proxmox kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x110
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.3
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: device [8086:a29b] error status/mask=00100000/00010000
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: [20] UnsupReq (First)
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: AER: TLP Header: 34000000 04000010 00000000 00000000
Nov 06 22:12:23 Proxmox kernel: pcieport 0000:00:1d.3: AER: device recovery successful
Nov 06 22:12:23 Proxmox QEMU[7423]: kvm: vfio_err_notifier_handler(0000:04:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
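The hex fields in the AER lines above can be decoded by hand. The sketch below is my own reading aid, not kernel code: it assumes the standard PCIe AER Uncorrectable Error Status bit layout and the standard TLP header layout (Fmt/Type in the first byte, Requester ID in bytes 4 and 5, message code in byte 7 for message TLPs). The function names are made up for illustration.

```python
# Bit positions in the AER Uncorrectable Error Status register
# (standard PCIe layout, listed here from memory of the spec).
UNCORRECTABLE_STATUS_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_uncorrectable(value):
    """Return the names of all known bits set in a status or mask value."""
    return [name for bit, name in sorted(UNCORRECTABLE_STATUS_BITS.items())
            if value & (1 << bit)]


def decode_tlp_header(dw0, dw1):
    """Decode the first two dwords of a logged TLP header."""
    fmt = (dw0 >> 29) & 0x7          # Fmt field, top 3 bits of byte 0
    tlp_type = (dw0 >> 24) & 0x1F    # Type field, low 5 bits of byte 0
    is_message = (tlp_type >> 3) == 0b10  # Type 10rrr marks a message TLP
    requester = (dw1 >> 16) & 0xFFFF
    bus, dev, fn = requester >> 8, (requester >> 3) & 0x1F, requester & 0x7
    return {
        "fmt": fmt,
        "type": tlp_type,
        "is_message": is_message,
        "requester": f"{bus:02x}:{dev:02x}.{fn}",
        "tag": (dw1 >> 8) & 0xFF,
        # Byte 7 is only a message code when the TLP is a message.
        "message_code": (dw1 & 0xFF) if is_message else None,
    }


if __name__ == "__main__":
    print("status:", decode_uncorrectable(0x00100000))
    print("mask:  ", decode_uncorrectable(0x00010000))
    print("header:", decode_tlp_header(0x34000000, 0x04000010))
```

With the values from the log, the status decodes to "Unsupported Request" (matching the `[20] UnsupReq` line), the requester comes out as 04:00.0 (the Coral TPU), and the TLP is a message with code 0x10. As far as I can tell from the PCIe message code table, 0x10 is a Latency Tolerance Reporting (LTR) message, which would mean the root port rejected a message sent by the TPU, but I have not confirmed that this is the cause.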
I suppose that 00:1d.3 is the PCIe root port behind this M.2 slot, but I don't understand why it failed.
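The guess about the root port can be checked rather than assumed. A minimal sketch, assuming the usual sysfs layout where `/sys/bus/pci/devices/<address>` is a symlink into a directory tree that nests each device under its upstream bridge; `upstream_port` is a hypothetical helper name.

```python
import os
import re

# Full PCI address as used in sysfs: domain:bus:device.function
PCI_ADDRESS = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def upstream_port(address, root="/sys/bus/pci/devices"):
    """Return the PCI address of the bridge directly above `address`.

    Returns None when the parent directory is not a PCI device, for example
    when the device sits directly on the root bus or does not exist.
    """
    real = os.path.realpath(os.path.join(root, address))
    parent = os.path.basename(os.path.dirname(real))
    return parent if PCI_ADDRESS.fullmatch(parent) else None


if __name__ == "__main__":
    print(upstream_port("0000:04:00.0"))
```

If this prints `0000:00:1d.3` on the host, the Coral TPU at 04:00.0 does hang off the root port that is reporting the AER errors.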
The guest runs fine without PCI passthrough; it only hits the "internal error" when passthrough is configured.
I also have two VMs running with USB device passthrough. I'm not sure whether that is related, but I guess it's not.