PCIe passthrough

showiproute

Hello everyone,

I have stumbled into an "interesting" issue.
My server currently has two Nvidia 1050 GPUs.

If I pass through one of them, it works fine without any issues.


If I try to use both of them for different VMs, only one VM starts, while the second one runs into a timeout at start.


Can someone guide me where to start troubleshooting?
I have blacklisted the drivers on the Proxmox host, and IOMMU itself seems to work fine.
 
are the two gpus in different iommu groups? does it work if you reverse them? (i.e use the second in the first vm, etc)
anything in dmesg/syslog/journal?
 
on the gui when you select the card for pci passthrough, there is a column for the group
 
ahhh thanks for clarification!

Okay they are in different groups.

Regarding logs: I have found the following entries (not sure whether they are relevant):

VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries

dmesg mostly contains networking messages, plus the following GPU-related entries:
[165724.624674] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[165799.136184] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900


syslog contains mostly the same information as journalctl:
Code:
May 14 13:07:07 proxmox1 pvestatd[1705]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
May 14 13:07:08 proxmox1 pvestatd[1705]: status update time (7.067 seconds)
May 14 13:07:17 proxmox1 pvestatd[1705]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
May 14 13:07:17 proxmox1 pvestatd[1705]: status update time (6.781 seconds)
May 14 13:07:18 proxmox1 pvedaemon[11017]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
May 14 13:07:23 proxmox1 pvedaemon[20693]: start failed: command '/usr/bin/kvm -id 111 -name WindowsGaming -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/111.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.s$
May 14 13:07:23 proxmox1 pvedaemon[19497]: <root@pam> end task UPID:proxmox1:000050D5:00FCEC28:609E59CD:qmstart:111:root@pam: start failed: command '/usr/bin/kvm -id 111 -name WindowsGaming -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/111.qmp,server,nowait' -mo$
May 14 13:07:28 proxmox1 pvestatd[1705]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
May 14 13:07:28 proxmox1 pvestatd[1705]: status update time (6.873 seconds)
May 14 13:07:37 proxmox1 pvedaemon[19497]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
 
VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
while that is not good (it means the host cannot communicate with the qemu process) i am not sure if the problems are related

did you try already with the gpus reversed ?
can you post the full output of 'dmesg' after such a failed attempt?
 
I have tried that just now, with the same result.
One VM can start with either GPU 1 or GPU 2, while the other VM does not start.

`dmesg`
Code:
[156420.344711] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[165692.462158] vmbr0: port 6(tap110i0) entered disabled state
[165721.619199] device tap110i0 entered promiscuous mode
[165721.638232] vmbr0: port 6(tap110i0) entered blocking state
[165721.638233] vmbr0: port 6(tap110i0) entered disabled state
[165721.638496] vmbr0: port 6(tap110i0) entered blocking state
[165721.638497] vmbr0: port 6(tap110i0) entered forwarding state
[165724.624674] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[165751.601047] device tap111i0 entered promiscuous mode
[165751.620568] vmbr0: port 7(tap111i0) entered blocking state
[165751.620570] vmbr0: port 7(tap111i0) entered disabled state
[165751.620871] vmbr0: port 7(tap111i0) entered blocking state
[165751.620872] vmbr0: port 7(tap111i0) entered forwarding state
[165799.136184] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[165800.648709] vmbr0: port 7(tap111i0) entered disabled state
[173232.404709] vmbr0: port 6(tap110i0) entered disabled state
[173293.486048] device tap110i0 entered promiscuous mode
[173293.499215] vmbr0: port 6(tap110i0) entered blocking state
[173293.499216] vmbr0: port 6(tap110i0) entered disabled state
[173293.499501] vmbr0: port 6(tap110i0) entered blocking state
[173293.499502] vmbr0: port 6(tap110i0) entered forwarding state
[173296.492669] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[173431.274138] device tap111i0 entered promiscuous mode
[173431.292473] vmbr0: port 7(tap111i0) entered blocking state
[173431.292475] vmbr0: port 7(tap111i0) entered disabled state
[173431.292776] vmbr0: port 7(tap111i0) entered blocking state
[173431.292777] vmbr0: port 7(tap111i0) entered forwarding state
[173479.268101] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[173480.814474] vmbr0: port 7(tap111i0) entered disabled state

So it shows the same entries as the earlier dmesg output.
 
can you please post both vm configs, as well as some basic info about your machine? (eg. cpu/ram/etc.)
 
sure:

VM1 - Ubuntu 20.04
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 8
cpu: host
hostpci0: 01:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: WWWServer
net0: virtio=9A:99:0E:53:49:BB,bridge=vmbr0,tag=20
numa: 1
onboot: 1
ostype: l26
scsi0: SSD_NVMe:vm-110-disk-1,discard=on,size=150G,ssd=1
scsi1: Cloud:vm-110-disk-1,discard=on,format=raw,size=1500G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=e6562702-0772-4259-9777-35b1c6aacaef
sockets: 1
vga: qxl
vmgenid: f10edd80-5f10-4aec-b2ae-71222ac4a577


VM2 - Windows Server 2019
Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
hostpci0: c1:00,pcie=1,x-vga=1
ide0: NFS_Backup:iso/virtio-win-0.1.190.iso,media=cdrom,size=489986K
ide2: none,media=cdrom
machine: pc-q35-5.2
memory: 16384
name: WindowsGaming
net0: virtio=6E:38:2E:7F:91:60,bridge=vmbr0,tag=10
numa: 1
ostype: win10
scsi0: SSD_500GB:vm-111-disk-0,discard=on,size=100G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=73ca4fa2-7b06-4a3f-8b19-2a93fcfc4206
sockets: 1
tablet: 0
vga: none
vmgenid: 75ccd208-aba7-490f-bf13-1e8b91492a32



The hypervisor itself runs the up-to-date Proxmox community version.

24 x AMD EPYC 7272 12-Core Processor (1 Socket)
128 GB DDR4 ECC RAM (3200 MHz)
ZFS filesystems on mostly consumer SSDs (no regular spinning HDDs are being used)
 
Just a side note: My VMs would consume 1/2 of the available RAM if each VM uses 100 % of the granted RAM
Please be aware that PCI(e) passthrough requires pinning all VM memory into actual RAM. PCIe devices can do DMA at any time, so ballooning or sharing memory that can change without notice is not possible. Therefore, in your case, both VMs will use at least 100% of the memory set in their configuration.
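For the two configs posted in this thread that means 24576 + 16384 = 40960 MiB (40 GiB) of pinned RAM. A rough way to add this up on the host is sketched below. It assumes the VM configs live under /etc/pve/qemu-server/ and that `memory:` is given in MiB; `sum_vm_memory_mib` is a hypothetical helper name, not a Proxmox tool.

```shell
# Hypothetical helper: add up the "memory:" values (MiB) of the given VM configs.
# Lines inside snapshot sections (which start with a [name] header) are skipped,
# so only the current configuration of each VM is counted.
sum_vm_memory_mib() {
    awk '
        FNR == 1 { in_snapshot = 0 }                      # new file: main section again
        /^\[/    { in_snapshot = 1 }                      # a [snapshot] section begins
        !in_snapshot && $1 == "memory:" { total += $2 }   # current config only
        END      { print total + 0 }
    ' "$@"
}
```

Usage would look like `sum_vm_memory_mib /etc/pve/qemu-server/110.conf /etc/pve/qemu-server/111.conf`, passing only the VMs that have a `hostpci` entry.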
 
That's not a big deal.
As mentioned, if all the VMs on my hypervisor consumed 100 % of their granted RAM, it would add up to about 64 GB.

The rest (64 GB) is being used by ZFS as a cache.

So from this point of view there should not be any issue.
 
I found out an interesting fact:

If I pass my two GPUs through to two different OSs (Win 2019 + Ubuntu 20.04), one of those VMs won't start while the other is running.

If I pass my GPUs through to two Ubuntu VMs (20.04 + 21.04), both start and work.
 
Can you please show us your IOMMU groups, so we can check whether they are in the same group (and therefore cannot be passed to different VMs)? You can list them with:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
 
Sure, I have checked that already.
Here is the output:
Code:
IOMMU group 0 c0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 0 c0:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 0 c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81] (rev a1)
IOMMU group 0 c1:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU group 10 c2:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 11 c3:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 12 c3:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 13 80:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 14 80:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 15 80:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 15 80:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 15 81:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
IOMMU group 16 80:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 17 80:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 18 80:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 19 80:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 1 c0:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 20 80:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 21 80:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 22 80:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 23 80:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 24 82:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 25 82:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 26 83:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 27 83:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 28 84:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 29 85:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 2 c0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 30 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 30 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 30 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81] (rev a1)
IOMMU group 30 01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU group 31 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 32 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 32 00:03.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 32 02:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E12 NVMe Controller [1987:5012] (rev 01)
IOMMU group 33 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 34 00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 35 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 36 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 37 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 38 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 39 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
IOMMU group 39 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 3 c0:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 40 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 0 [1022:1490]
IOMMU group 40 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 1 [1022:1491]
IOMMU group 40 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 2 [1022:1492]
IOMMU group 40 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 3 [1022:1493]
IOMMU group 40 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 4 [1022:1494]
IOMMU group 40 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 5 [1022:1495]
IOMMU group 40 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 6 [1022:1496]
IOMMU group 40 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship Device 24; Function 7 [1022:1497]
IOMMU group 41 03:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 42 03:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 43 04:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 44 04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 45 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller [1022:148c]
IOMMU group 46 40:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 46 40:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 46 41:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
IOMMU group 47 40:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 48 40:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 48 40:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 40:03.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 40:03.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 40:03.5 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 40:03.6 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 42:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
IOMMU group 48 43:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
IOMMU group 48 44:00.0 PCI bridge [0604]: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150] (rev 04)
IOMMU group 48 45:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 41)
IOMMU group 48 46:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
IOMMU group 48 47:00.0 Ethernet controller [0200]: Broadcom Limited BCM57416 NetXtreme-E 10GBase-T RDMA Ethernet Controller [14e4:16d8] (rev 01)
IOMMU group 48 47:00.1 Ethernet controller [0200]: Broadcom Limited BCM57416 NetXtreme-E 10GBase-T RDMA Ethernet Controller [14e4:16d8] (rev 01)
IOMMU group 49 40:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 4 c0:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 50 40:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 51 40:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 52 40:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 53 40:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 54 40:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 55 40:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 56 40:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 57 48:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 58 48:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 59 49:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 5 c0:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 60 49:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU group 61 49:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 62 49:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller [1022:148c]
IOMMU group 63 4a:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 64 4b:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 6 c0:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 7 c0:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 8 c0:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 9 c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
 
Specifically, the two GPUs are:

Code:
IOMMU group 0 c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81] (rev a1)
IOMMU group 0 c1:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU group 30 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050] [10de:1c81] (rev a1)
IOMMU group 30 01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
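
To compare the groups of specific devices without scanning the whole list, a small filter can help. This is only a sketch: `iommu_group_of` and `in_different_groups` are hypothetical helper names, and they assume the `IOMMU group N ADDR description` line format shown in the listing above, saved to a file.

```shell
# Hypothetical helper: print the IOMMU group number for one PCI address.
# Reads a listing in the "IOMMU group N ADDR description..." format on stdin.
iommu_group_of() {
    awk -v addr="$1" '$1 == "IOMMU" && $2 == "group" && $4 == addr { print $3 }'
}

# Hypothetical helper: exit 0 only if both addresses are found and their groups differ.
# $1 = file containing the listing, $2 and $3 = PCI addresses such as 01:00.0
in_different_groups() {
    g1=$(iommu_group_of "$2" < "$1")
    g2=$(iommu_group_of "$3" < "$1")
    [ -n "$g1" ] && [ -n "$g2" ] && [ "$g1" != "$g2" ]
}
```

With the listing saved as groups.txt, `in_different_groups groups.txt 01:00.0 c1:00.0 && echo "separate groups"` would confirm what the excerpt shows: the two GPUs sit in groups 30 and 0.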
 
Are you using pcie_acs_override (in which case the grouping does not tell us anything)? Or do you have an X570 that actually splits almost everything into separate groups (and if so, which brand, type and BIOS version)? Are there any errors or other information in syslog or journalctl when you try to start the second VM?
 
How can I verify whether I am using pcie_acs_override?

By X570, do you mean the mainboard?


I am using a Supermicro H12SSL-CT mainboard -> https://www.supermicro.com/en/products/motherboard/H12SSL-CT
 
Check cat /proc/cmdline. Ah right, sorry, I got confused and assumed a consumer Ryzen but you already said EPYC. Any logs that might help?

cmdline -> BOOT_IMAGE=/boot/vmlinuz-5.4.114-1-pve root=/dev/mapper/pve-root ro quiet
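
The cmdline above contains no pcie_acs_override option, so the IOMMU groups shown earlier should reflect the real hardware grouping. The same check could be scripted as sketched below; `has_acs_override` is a hypothetical helper name, and the file argument exists only so the check can be tried against a saved copy.

```shell
# Hypothetical helper: exit 0 if the ACS override option is present.
# $1 = file to inspect; defaults to the running kernel command line.
has_acs_override() {
    grep -q 'pcie_acs_override=' "${1:-/proc/cmdline}"
}
```

For example: `has_acs_override && echo "override active, groups are not reliable" || echo "no override"`.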


Unfortunately the logs do not show anything interesting. I have checked journalctl, dmesg and syslog, and they all contain just the regular "blah blah" stuff.


I am not sure, but maybe Proxmox is somehow trying to save resources via KSM and is running into issues?!
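
Whether KSM is active at all can be read from sysfs. The sketch below only reports its state and is not evidence that KSM is related to the start failure. `ksm_status` is a hypothetical helper name; the directory argument is there so it can be pointed at a copy of the files.

```shell
# Hypothetical helper: print the KSM run state and the number of shared pages.
# $1 = directory to read; defaults to the standard KSM sysfs location.
# run: 0 = stopped, 1 = running, 2 = stopped and all pages unmerged.
ksm_status() {
    dir="${1:-/sys/kernel/mm/ksm}"
    run=$(cat "$dir/run")
    sharing=$(cat "$dir/pages_sharing")
    echo "run=$run pages_sharing=$sharing"
}
```

A result of `run=0` or `pages_sharing=0` would suggest that KSM is not merging anything at the moment.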
 
