Hi to everyone,
I have a fresh install of Proxmox VE 7.1 (UEFI) and some VMs that I migrated from an ESXi 6 server.
One of them uses an Nvidia Quadro P600 GPU for video encoding, so I need PCI passthrough.
I enabled VT-d in the BIOS (the host is an HP EliteDesk 800 G5 SFF with an i5-9500).
I enabled the IOMMU in /etc/default/grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_aspm=off pci=noaer"
followed by update-grub and, for good measure, pve-efiboot-tool refresh.
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi then shows:
Code:
[ 0.008592] ACPI: DMAR 0x00000000A3C0D000 0000C8 (v01 INTEL CFL 00000002 01000013)
[ 0.008622] ACPI: Reserving DMAR table memory at [mem 0xa3c0d000-0xa3c0d0c7]
[ 0.026542] DMAR: IOMMU enabled
[ 0.068632] DMAR: Host address width 39
[ 0.068633] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.068638] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[ 0.068640] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.068643] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.068645] DMAR: RMRR base: 0x000000a381d000 end: 0x000000a383cfff
[ 0.068646] DMAR: RMRR base: 0x000000a8000000 end: 0x000000ac7fffff
[ 0.068647] DMAR: RMRR base: 0x000000a386e000 end: 0x000000a38edfff
[ 0.068649] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.068650] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.068651] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.071854] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.287481] DMAR: No ATSR found
[ 0.287481] DMAR: No SATC found
[ 0.287483] DMAR: IOMMU feature fl1gp_support inconsistent
[ 0.287484] DMAR: IOMMU feature pgsel_inv inconsistent
[ 0.287484] DMAR: IOMMU feature nwfs inconsistent
[ 0.287485] DMAR: IOMMU feature pasid inconsistent
[ 0.287485] DMAR: IOMMU feature eafs inconsistent
[ 0.287486] DMAR: IOMMU feature prs inconsistent
[ 0.287486] DMAR: IOMMU feature nest inconsistent
[ 0.287487] DMAR: IOMMU feature mts inconsistent
[ 0.287487] DMAR: IOMMU feature sc_support inconsistent
[ 0.287488] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 0.287489] DMAR: dmar0: Using Queued invalidation
[ 0.287491] DMAR: dmar1: Using Queued invalidation
[ 0.287889] DMAR: Intel(R) Virtualization Technology for Directed I/O
find /sys/kernel/iommu_groups/ -type l also works correctly:
Code:
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
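For readability, the same information can be printed with device names next to each group. This is only a minimal sketch: the function names iommu_group_of and list_iommu_groups are made up for illustration and are not Proxmox tools, and it assumes lspci is installed on the host.

```shell
#!/bin/sh
# iommu_group_of (hypothetical helper): print the IOMMU group number
# encoded in a sysfs path such as
# /sys/kernel/iommu_groups/1/devices/0000:01:00.0
iommu_group_of() {
    p=${1#*/iommu_groups/}     # drop everything up to "iommu_groups/"
    printf '%s\n' "${p%%/*}"   # keep only the group number
}

# list_iommu_groups (hypothetical helper): print one line per device,
# in the form "group <n>: <lspci description>"
list_iommu_groups() {
    for dev in /sys/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue   # skip if the glob matched nothing
        printf 'group %s: %s\n' \
            "$(iommu_group_of "$dev")" \
            "$(lspci -nns "${dev##*/}")"
    done
}

# Usage on the Proxmox host:
#   list_iommu_groups
```

Here the point of interest is that 01:00.0 and 01:00.1 share group 1 with the root port 00:01.0 only.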
I added the required modules to /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
and configured vfio:
Code:
options vfio-pci ids=10de:1cb2,10de:0fb9
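The two IDs in that line are the vendor:device pairs of the GPU and its audio function. As a hedged sketch, they can be pulled out of the lspci -nn output instead of being copied by hand. The function name vfio_ids is invented for this example, and it assumes the input contains only the device lines (no -k or -v output, which would add subsystem IDs).

```shell
#!/bin/sh
# vfio_ids (hypothetical helper): read `lspci -nn` device lines on stdin
# and print their [vendor:device] IDs as one comma-separated list.
vfio_ids() {
    grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' |  # match e.g. [10de:1cb2]
        sed 's/[][]//g' |                        # strip the square brackets
        paste -sd, -                             # join the lines with commas
}

# Usage on the host, for all functions of the device at 01:00:
#   lspci -nn -s 01:00 | vfio_ids
```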
Then I ran update-initramfs -u -k all.
lspci -nnk shows that the GPU correctly uses vfio-pci as its kernel driver, BUT it also lists nvidiafb/nouveau as kernel modules. That looks strange to me, and I can't tell whether it is OK:
Code:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P600] [10de:1cb2] (rev a1)
Subsystem: Hewlett-Packard Company GP107GL [Quadro P600] [103c:11bd]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
Subsystem: Hewlett-Packard Company GP107GL High Definition Audio Controller [103c:11bd]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
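As far as I understand the lspci documentation, the "Kernel modules:" line lists modules that are capable of handling the device, which is not the same as modules that are actually loaded. A minimal sketch for checking what is really loaded on the host follows. The function name loaded_gpu_modules is made up for illustration, and the module list is only an assumption about which drivers matter here.

```shell
#!/bin/sh
# loaded_gpu_modules (hypothetical helper): read `lsmod` output on stdin
# and print the names of any loaded GPU drivers that could compete with
# vfio-pci for the card.
loaded_gpu_modules() {
    awk '$1 == "nouveau" || $1 == "nvidiafb" || $1 == "nvidia" { print $1 }'
}

# Usage on the Proxmox host (no output means none of them is loaded):
#   lsmod | loaded_gpu_modules
```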
Moving on to the VM, this is its .conf file:
Code:
boot: order=scsi0
cores: 6
hostpci0: 0000:01:00,pcie=1
machine: q35
memory: 4096
name: Emby-Server
net0: vmxnet3=00:0c:29:3c:ee:da,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
scsi0: local-lvm:vm-203-disk-0
smbios1: uuid=c0518828-c438-488b-bce2-5c029108f54c
sockets: 1
vmgenid: fbea01a0-8e04-4ed4-8ff3-458217639eef
and in fact it seems to work correctly:
lspci -nnk (inside the VM):
Code:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P600] [10de:1cb2] (rev a1)
Subsystem: Hewlett-Packard Company GP107GL [Quadro P600] [103c:11bd]
Kernel driver in use: nvidia
Kernel modules: nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
Subsystem: Hewlett-Packard Company GP107GL High Definition Audio Controller [103c:11bd]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
nvidia-detect:
Code:
Detected NVIDIA GPUs:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P600] [10de:1cb2] (rev a1)
Checking card: NVIDIA Corporation GP107GL [Quadro P600] (rev a1)
Your card is supported by all driver versions.
Your card is also supported by the Tesla 460 drivers series.
Your card is also supported by the Tesla 450 drivers series.
Your card is also supported by the Tesla 418 drivers series.
It is recommended to install the
nvidia-driver
package.
BUT nvidia-smi:
Code:
No devices were found
and these three blocks show up during boot:
Code:
[ 0.970999] PCI Interrupt Link [GSIF] enabled at IRQ 21
[ 0.972495] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[ 0.972730] shpchp 0000:05:01.0: Slot initialization failed
[ 0.974412] shpchp 0000:05:02.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.974566] PCI Interrupt Link [GSIG] enabled at IRQ 22
[ 0.975599] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[ 0.975754] shpchp 0000:05:02.0: Slot initialization failed
[ 0.976773] shpchp 0000:05:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.976923] PCI Interrupt Link [GSIH] enabled at IRQ 23
[ 0.978003] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[ 0.978156] shpchp 0000:05:03.0: Slot initialization failed
[ 0.979098] shpchp 0000:05:04.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.979247] PCI Interrupt Link [GSIE] enabled at IRQ 20
[ 0.980364] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[ 0.980516] shpchp 0000:05:04.0: Slot initialization failed
Code:
[ 5.513278] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 5.513634] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 5.651159] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 5.651383] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Code:
[ 11.158594] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 11.158735] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 11.298546] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 11.298725] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 567.809457] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 567.809587] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 567.949916] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 567.950115] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
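To make the repeated NVRM messages easier to compare, they can be condensed to a count per error code. This is just a sketch: the function name nvrm_error_codes is hypothetical, and it assumes the log lines have exactly the "RmInitAdapter failed! (...)" shape shown above.

```shell
#!/bin/sh
# nvrm_error_codes (hypothetical helper): read kernel log lines on stdin
# and print how often each RmInitAdapter error code occurs.
nvrm_error_codes() {
    sed -n 's/.*RmInitAdapter failed! (\([^)]*\)).*/\1/p' |  # keep only the code
        sort |
        uniq -c                                              # count per code
}

# Usage inside the VM:
#   dmesg | nvrm_error_codes
```

In the logs above every failure carries the same code, 0x22:0x56:667.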
What am I missing?
Sorry for the long post, but I wanted to include all the relevant information.