I'm trying to pass through two different GPUs to two different VMs at the same time.
What's really peculiar is that I can pass both GPUs through to *one* VM without problems. But if I pass them through to two different guest VMs, the whole machine crashes instantly.
I've made sure the BIOS is set to PCIe Gen 3, and the IOMMU groups show each GPU in its own group.
If it were a power issue, presumably it wouldn't matter whether the GPUs go to one VM or two. So I'm really scratching my head. There's nothing to go on in the logs; the crash just completely and instantly kills the host OS.
One thing I did try was enabling AER in the BIOS, although I'm not exactly sure what it does. With it turned on, the host kernel log (the messages come from the `pcieport` and `vfio-pci` drivers) is spammed with corrected-error messages:
Code:
[ 2268.015030] pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:07:00.0
[ 2268.015037] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 2268.015041] pcieport 0000:00:03.1: AER: device [1022:1483] error status/mask=00001000/00004000
[ 2268.015045] pcieport 0000:00:03.1: AER: [12] Timeout
[ 2268.015051] vfio-pci 0000:07:00.1: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 2268.015055] vfio-pci 0000:07:00.1: AER: device [10de:1aef] error status/mask=00000001/00000000
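To see which PCI addresses the corrected errors cluster on, a small script can tally the AER "PCIe Bus Error" lines per device. This is only a sketch, assuming the log lines follow the format shown above (for example a log saved with `dmesg`); the names `count_aer_by_device` and `SAMPLE_LOG` are hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
# "[ 2268.015037] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=..., type=..., (...)"
AER_LINE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<driver>\S+)\s+"
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+AER:\s+(?P<msg>.*)$"
)
# Extracts severity and error type from "PCIe Bus Error" messages.
BUS_ERROR = re.compile(r"severity=(?P<sev>[^,]+), type=(?P<type>[^,]+)")


def count_aer_by_device(lines):
    """Hypothetical helper: count AER bus errors per (PCI address, severity, type)."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.match(line.strip())
        if not m:
            continue  # not an AER line
        err = BUS_ERROR.search(m.group("msg"))
        if err:
            counts[(m.group("addr"), err.group("sev"), err.group("type"))] += 1
    return counts


# The lines from this post, used as sample input.
SAMPLE_LOG = """\
[ 2268.015030] pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:07:00.0
[ 2268.015037] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 2268.015041] pcieport 0000:00:03.1: AER: device [1022:1483] error status/mask=00001000/00004000
[ 2268.015045] pcieport 0000:00:03.1: AER: [12] Timeout
[ 2268.015051] vfio-pci 0000:07:00.1: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 2268.015055] vfio-pci 0000:07:00.1: AER: device [10de:1aef] error status/mask=00000001/00000000
"""

if __name__ == "__main__":
    tally = count_aer_by_device(SAMPLE_LOG.splitlines())
    for (addr, sev, etype), n in sorted(tally.items()):
        print(f"{addr}  {sev:<10} {etype:<16} x{n}")
```

Run over a longer log, this shows whether the errors stay on the first card (bus 07) and the port reporting it (00:03.1), or also show up for the second card on bus 08.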
IOMMU groups:
Code:
IOMMU Group 18:
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1)
07:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
IOMMU Group 19:
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [10de:1d01] (rev a1)
08:00.1 Audio device [0403]: NVIDIA Corporation GP108 High Definition Audio Controller [10de:0fb8] (rev a1)
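A listing like the one above can be reproduced from sysfs to confirm that each GPU shares its group only with its own functions (VGA plus HDMI audio). This is a minimal sketch, assuming the standard layout `/sys/kernel/iommu_groups/<group>/devices/<PCI address>`. It prints PCI addresses only, not the `lspci` descriptions, and the helper names are hypothetical.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Hypothetical helper: map IOMMU group number -> sorted PCI addresses in it."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled or not exposed by the kernel
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices.iterdir())
    return dict(sorted(groups.items()))


def shares_group_only_with_own_functions(groups, addr):
    """True if every device in addr's group is a function of the same slot
    (e.g. 0000:07:00.0 and 0000:07:00.1). Raises KeyError if addr is absent."""
    slot = addr.rsplit(".", 1)[0]
    for members in groups.values():
        if addr in members:
            return all(m.rsplit(".", 1)[0] == slot for m in members)
    raise KeyError(addr)


if __name__ == "__main__":
    groups = list_iommu_groups()
    for num, members in groups.items():
        print(f"IOMMU Group {num}: {' '.join(members)}")
    for gpu in ("0000:07:00.0", "0000:08:00.0"):  # addresses from this post
        if any(gpu in m for m in groups.values()):
            print(gpu, "isolated:", shares_group_only_with_own_functions(groups, gpu))
```

On this host the expected result is group 18 holding only 07:00.0/07:00.1 and group 19 holding only 08:00.0/08:00.1, matching the output above.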
As there's literally nothing in syslog/dmesg, I'm really not sure where to start. Since passing both GPUs through to a single VM works but passing them through to different VMs doesn't, my only guess is a Proxmox kernel/KVM issue.