Multi-socket PCIe passthrough devices disappearing

lejunx

New Member
Nov 26, 2024
I'm working with a Dell Precision 7920; all virtualization flags are enabled in the BIOS. The Proxmox host is saturn, and the virtual guest is jupiter. Sorry for the long post; I'm trying to include as much pertinent data as possible.


tl;dr: a multi-socket system has u.2 NVMe PCIe drives resource-mapped to a guest, and half of them (one socket's worth) disappear from the host when the VM guest is booted.


root@saturn:~# numactl -s
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
cpubind: 0 1
nodebind: 0 1
membind: 0 1
preferred:
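
Since the four u.2 controllers sit in two different PCI domains (10000 and 10002), one per socket, here is roughly how the NUMA affinity of each controller can be confirmed (a sketch using the addresses from my lspci output; the sysfs numa_node attribute reports the owning node, or -1 if unknown):

for dev in 10000:01:00.0 10000:02:00.0 10002:01:00.0 10002:02:00.0; do
    # numa_node should be 0 or 1 here, one value per socket
    echo "$dev -> NUMA node $(cat /sys/bus/pci/devices/$dev/numa_node)"
done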


I am passing through four u.2 NVMe drives; in this host, each CPU socket supports 2x u.2. I also have 2x m.2 NVMe drives, from which the host boots.

root@saturn:~# lspci | grep -i beta
10000:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10002:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10002:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

Desired NVMe devices:
/devices/pci0000:d1/0000:d1:05.5/pci10002:00/10002:00:01.0/10002:02:00.0/nvme/nvme5
/devices/pci0000:44/0000:44:05.5/pci10000:00/10000:00:02.0/10000:01:00.0/nvme/nvme0
/devices/pci0000:44/0000:44:05.5/pci10000:00/10000:00:03.0/10000:02:00.0/nvme/nvme2
/devices/pci0000:d1/0000:d1:05.5/pci10002:00/10002:00:00.0/10002:01:00.0/nvme/nvme4
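
For reference, a list like the one above can be regenerated with something along these lines (a sketch, not the exact command I used):

# Resolve each NVMe controller back to its parent PCIe device path
for n in /sys/class/nvme/nvme*; do
    echo "$(basename "$n") -> $(readlink -f "$n"/device)"
done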

I am also passing through the built-in SATA controller:

/devices/pci0000:00/0000:00:17.0

I have created resource mappings for the NVMe RAID controller (which I do not intend to use), as well as for the PCI Express root ports and the NVMe devices:
[screenshots of the PCI resource mappings]

I have added these mapped devices to my VM, which is configured with 70 cores, host CPU passthrough, and NUMA enabled:

[screenshot of the VM hardware configuration]
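
For anyone who prefers the CLI, the same settings can be double-checked with something like this (a sketch; 100 is my VMID):

# Show the passthrough, CPU and NUMA related lines of the VM config
qm config 100 | grep -E 'hostpci|numa|cores|cpu'
# Mapped devices should show up as lines like "hostpciN: mapping=<mapping-name>"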

When I boot the VM, half of the NVMe drives disappear from the mapping, and are not visible to the virtual guest or the host:

[screenshots showing the missing mapped devices]
 
Is the problem that I'm including multiple IOMMU groups in one mapping? Should I make two mappings?

Edit: no.

[screenshots]
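
For reference, this is roughly how I eyeballed the IOMMU grouping on the host (a sketch):

# Print every PCI device together with the IOMMU group it belongs to
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=$(basename "$(dirname "$(dirname "$d")")")
    echo "group $g: $(basename "$d")"
done | sort -V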

This is new:
[screenshot]


These drives have been working perfectly in a TrueNAS setup for nearly a year.

No issues, ever.
 
I decided to do a raw PCI mapping and scale down to 24 cores. I also used a suggestion from here to use qm set to bind memory and CPUs to the VM:

root@saturn:~# qm set 100 --numa0 cpus=0-11,hostnodes=0,memory=65535,policy=bind
update VM 100: -numa0 cpus=0-11,hostnodes=0,memory=65535,policy=bind
root@saturn:~# qm set 100 --numa1 cpus=12-23,hostnodes=1,memory=65535,policy=bind
update VM 100: -numa1 cpus=12-23,hostnodes=1,memory=65535,policy=bind
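
To verify where the guest's memory actually lands once the VM is running, something like this should work (a sketch; 100 is my VMID, and the PID file path is the one qemu-server uses on my host):

# Per-NUMA-node memory usage of the QEMU process backing VM 100
numastat -p "$(cat /var/run/qemu-server/100.pid)"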

This has resulted in a similar situation, with half of the NVMe drives disappearing:

[screenshot showing the missing drives]



root@saturn:~# lspci | grep -i beta
10000:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
root@saturn:~#
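
I'm not sure yet whether the 10002-domain devices are merely unbound or gone from the bus entirely; a quick check plus a rescan attempt would be (a sketch):

# Are the 10002-domain devices still present on the bus at all?
ls /sys/bus/pci/devices/ | grep ^10002 || echo "domain 10002 devices are gone"
# Ask the kernel to rescan the PCI bus and see whether they come back
echo 1 > /sys/bus/pci/rescan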
 

So I thought to myself: perhaps this is an issue where I need to configure vfio. I poked around and found this article, which provides an elegant method using driverctl:

For example:

driverctl -v set-override 10002:02:00.0 vfio-pci
driverctl: setting driver override for 10002:02:00.0: vfio-pci
driverctl: loading driver vfio-pci
driverctl: unbinding previous driver vfio-pci
driverctl: reprobing driver for 10002:02:00.0
driverctl: saving driver override for 10002:02:00.0
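
driverctl keeps these overrides persistent across reboots; for reference, they can be listed and reverted like this (a sketch, using one of my device addresses):

# List all persistent driver overrides
driverctl list-overrides
# Revert one device back to its default driver
driverctl unset-override 10002:02:00.0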

After running it on each of my NVMe drives:

root@saturn:~# lspci -nnk -d 8086:0a54
10000:01:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510) [8086:4802]
Kernel driver in use: vfio-pci
Kernel modules: nvme
10000:02:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510) [8086:4802]
Kernel driver in use: vfio-pci
Kernel modules: nvme
10002:01:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510) [8086:4802]
Kernel driver in use: vfio-pci
Kernel modules: nvme
10002:02:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510) [8086:4802]
Kernel driver in use: vfio-pci
Kernel modules: nvme

I even went ahead and changed the driver for the SATA controller:

root@saturn:~# lspci -nnk -d 8086:a182
0000:00:17.0 SATA controller [0106]: Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode] [8086:a182] (rev 09)
Subsystem: Dell C620 Series Chipset Family SATA Controller [AHCI mode] [1028:073a]
Kernel driver in use: vfio-pci
Kernel modules: ahci



Bonus fries: the passed-through drives have disappeared from the host OS entirely:

root@saturn:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sde 8:64 1 0B 0 disk
zd0 230:0 0 1M 0 disk
nvme1n1 259:3 0 1.8T 0 disk
├─nvme1n1p1 259:4 0 1007K 0 part
├─nvme1n1p2 259:5 0 1G 0 part
└─nvme1n1p3 259:6 0 1.8T 0 part
nvme3n1 259:10 0 1.8T 0 disk
├─nvme3n1p1 259:12 0 1007K 0 part
├─nvme3n1p2 259:13 0 1G 0 part
└─nvme3n1p3 259:14 0 1.8T 0 part
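
If I understand vfio correctly, the block devices vanishing from lsblk is expected once vfio-pci owns the controllers; the claimed IOMMU groups should instead show up as VFIO character devices (a sketch):

# Each IOMMU group claimed by vfio-pci gets a node under /dev/vfio
ls -l /dev/vfio/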


Sadly, on boot-up of my VM, two of the u.2 NVMe drives still disappear, and the guest fails to boot.

So close, yet so far away.
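
My next step is probably to watch the host kernel log while the guest starts, to catch any vfio or PCIe errors at the moment the drives vanish (a sketch; 100 is my VMID):

# Follow kernel messages in one terminal ...
dmesg --follow
# ... and start the VM from another
qm start 100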
 
