GPU Passthrough with Proxmox 8.2 not working

thphon

New Member
Sep 22, 2024
Hi all,

This is my first post here. I'm new to Proxmox and have been using it for the past couple of weeks to set up a homelab. I was able to install TrueNAS Scale, pass through 2 HDDs to create a pool, and migrate that pool to a different TrueNAS Scale VM. Now I'm trying to set up an Ubuntu Server 24.04 VM and pass the GPU through for use with PhotoPrism and Jellyfin, but the VM won't start with the GPU added to the configuration. I've checked dozens of posts about this issue and was able to solve many errors, but now I'm stuck. Maybe it's related to 8.2 and its kernel, but I'm not sure how to work around that.
Now journalctl is showing the following error:

Code:
Sep 23 08:53:08 pve qm[2937]: VM 101 qmp command failed - VM 101 qmp command 'set_password' failed - Could not set password
Sep 23 08:53:08 pve pvedaemon[2935]: Failed to run vncproxy.
Sep 23 08:53:08 pve pvedaemon[1532]: <root@pam> end task UPID:pve:00000B77:00009331:66F164B3:vncproxy:101:root@pam: Failed to run vncproxy.

My understanding is that this failed connection to vncproxy is a consequence and not the real issue, so I've run out of things to troubleshoot. If I remove the GPU, the VM boots just fine, which is why I suspect a kernel issue since 8.2 went live, as mentioned above.
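Since the vncproxy lines are probably only a follow-up symptom, a small filter like the sketch below can help spot other qm/pvedaemon messages around the same timestamp. The function name drop_vnc_noise is made up for illustration, and the patterns are only based on the three log lines shown above, so treat it as a starting point rather than a reliable detector.

```shell
#!/bin/sh
# Keep journal lines written by qm or pvedaemon, but drop the vncproxy and
# set_password follow-up messages. Reads journal text on stdin.
# Note: grep exits non-zero when nothing is left after filtering.
drop_vnc_noise() {
    grep -E ' (qm|pvedaemon)\[[0-9]+\]: ' | grep -v -e 'vncproxy' -e 'set_password'
}
```

Usage would be something like `journalctl -b --no-pager | drop_vnc_noise`. Starting the VM from a shell with `qm start 101` should also print the start error directly instead of going through the console proxy.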

My setup:
i5-9600K, 64 GB RAM, MSI Z370-A PRO, ASUS GTX 1070 Ti, RAID1 ZFS (2x 1 TB Crucial BX500).

My configs:

1. /etc/kernel/cmdline
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction initcall_blacklist=sysfb_init
(followed by proxmox-boot-tool refresh and a reboot)

2. /etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci

3. /etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=10de:1b82,10de:10f0 disable_vga=1

4. /etc/modprobe.d/blacklist.conf
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
blacklist nvidiafb

5. /etc/modprobe.d/kvm.conf
Code:
options kvm ignore_msrs=1 report_ignored_msrs=0

6. /etc/modprobe.d/iommu_unsafe_interrupts.conf
Code:
options vfio_iommu_type1 allow_unsafe_interrupts=1

7. dmesg | grep -e DMAR -e IOMMU
Code:
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.012451] ACPI: DMAR 0x0000000079AB2B10 0000A8 (v01 INTEL  EDK2     00000001 INTL 00000001)
[    0.012474] ACPI: Reserving DMAR table memory at [mem 0x79ab2b10-0x79ab2bb7]
[    0.088360] DMAR: IOMMU enabled
[    0.225199] DMAR: Host address width 39
[    0.225200] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.225209] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.225213] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.225217] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.225220] DMAR: RMRR base: 0x0000007aaba000 end: 0x0000007ad03fff
[    0.225224] DMAR: RMRR base: 0x0000007b800000 end: 0x0000007fbfffff
[    0.225226] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.225229] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.225231] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.226765] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.451881] DMAR: No ATSR found
[    0.451884] DMAR: No SATC found
[    0.451886] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.451887] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.451890] DMAR: IOMMU feature nwfs inconsistent
[    0.451892] DMAR: IOMMU feature pasid inconsistent
[    0.451894] DMAR: IOMMU feature eafs inconsistent
[    0.451897] DMAR: IOMMU feature prs inconsistent
[    0.451899] DMAR: IOMMU feature nest inconsistent
[    0.451901] DMAR: IOMMU feature mts inconsistent
[    0.451903] DMAR: IOMMU feature sc_support inconsistent
[    0.451905] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.451908] DMAR: dmar0: Using Queued invalidation
[    0.451916] DMAR: dmar1: Using Queued invalidation
[    0.452514] DMAR: Intel(R) Virtualization Technology for Directed I/O

8. lspci
Code:
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 0a)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 0a)
00:02.0 Display controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:14.2 Signal processing controller: Intel Corporation 200 Series PCH Thermal Subsystem
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.3 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #4 (rev f0)
00:1c.6 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #7 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Z370 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
04:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4360 802.11ac Wireless Network Adapter (rev 03)
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983

9. /etc/pve/qemu-server/101.conf (ubuntu server 24.04)
Code:
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 5
cpu: host
cpulimit: 5
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=1070ti.rom
machine: q35
memory: 49152
meta: creation-qemu=9.0.2,ctime=1727056864
name: ubuntu
net0: virtio=BC:24:11:61:50:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=b5cd51bc-1397-4de4-b861-542b2dd613b2
sockets: 1
tpmstate0: local-zfs:vm-101-disk-2,size=4M,version=v2.0
vmgenid: 91d73976-6cb8-435d-8a93-6249e1d5a99e
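The configs above set vfio-pci IDs and an ACS override, so two read-only checks seem worth running on the host: which driver is actually bound to the GPU functions at 01:00, and how the devices end up grouped by the IOMMU. The following is a minimal sketch; the function names bound_drivers and list_iommu_groups are made up for illustration, and the optional directory arguments exist only so the functions can be tried against a fake tree.

```shell
#!/bin/sh
# Read-only checks; nothing here changes host state.

# Print the kernel driver bound to each function of a PCI slot.
# $1: slot without the function suffix, e.g. 0000:01:00
# $2: PCI device directory (defaults to sysfs; overridable for testing)
bound_drivers() {
    bd_root="${2:-/sys/bus/pci/devices}"
    for bd_dev in "$bd_root/$1".*; do
        [ -e "$bd_dev" ] || continue
        if [ -L "$bd_dev/driver" ]; then
            # The driver symlink points at the bound driver's directory.
            bd_drv="$(basename "$(readlink "$bd_dev/driver")")"
        else
            bd_drv="(none)"
        fi
        printf '%s %s\n' "${bd_dev##*/}" "$bd_drv"
    done
}

# Print "group N: device" for every device in every IOMMU group.
# $1: iommu_groups directory (defaults to sysfs; overridable for testing)
list_iommu_groups() {
    ig_root="${1:-/sys/kernel/iommu_groups}"
    for ig_dev in "$ig_root"/*/devices/*; do
        [ -e "$ig_dev" ] || [ -L "$ig_dev" ] || continue
        ig_group="${ig_dev#"$ig_root"/}"
        printf 'group %s: %s\n' "${ig_group%%/*}" "${ig_dev##*/}"
    done
}
```

Running `bound_drivers 0000:01:00` would be expected to report vfio-pci for both 01:00.0 and 01:00.1 if the IDs in item 3 took effect. `list_iommu_groups | sort -V` shows whether the GPU shares a group with anything other than its own audio function.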

Note the romfile=1070ti.rom added to hostpci0. That's one of the last things I added, because I was seeing a ROM-related error in journalctl; with this it went away. I downloaded the ROM from https://www.techpowerup.com/vgabios/.
I rebooted many times after making changes, but to be honest I can't recall the exact order.
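Because the romfile value in hostpci0 is a bare file name, it may be worth confirming that the file is where the host looks for it. As far as I understand the Proxmox documentation, romfile is resolved relative to /usr/share/kvm/; the sketch below assumes that directory, and the function name romfile_status is made up for illustration. It only checks that the file exists and is non-empty, not that the ROM content is valid for the card.

```shell
#!/bin/sh
# Report whether a romfile value resolves to a non-empty file.
# $1: romfile value from hostpci0, e.g. 1070ti.rom
# $2: ROM directory (assumed default /usr/share/kvm; overridable for testing)
romfile_status() {
    rs_path="${2:-/usr/share/kvm}/$1"
    if [ -s "$rs_path" ]; then
        printf 'found %s (%s bytes)\n' "$rs_path" "$(wc -c < "$rs_path" | tr -d ' ')"
    else
        printf 'missing or empty: %s\n' "$rs_path"
    fi
}
```

For the config above the call would be `romfile_status 1070ti.rom`.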

Something else worth mentioning, when I run update-initramfs -u -k all I get the following:
Code:
update-initramfs: Generating /boot/initrd.img-6.8.12-2-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/D4C9-19A9
        Copying kernel and creating boot-entry for 6.8.12-2-pve
        Copying kernel and creating boot-entry for 6.8.4-2-pve
Copying and configuring kernels on /dev/disk/by-uuid/D4C9-9AC6
        Copying kernel and creating boot-entry for 6.8.12-2-pve
        Copying kernel and creating boot-entry for 6.8.4-2-pve
update-initramfs: Generating /boot/initrd.img-6.8.4-2-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/D4C9-19A9
        Copying kernel and creating boot-entry for 6.8.12-2-pve
        Copying kernel and creating boot-entry for 6.8.4-2-pve
Copying and configuring kernels on /dev/disk/by-uuid/D4C9-9AC6
        Copying kernel and creating boot-entry for 6.8.12-2-pve
        Copying kernel and creating boot-entry for 6.8.4-2-pve
I'm just suspicious that there are 2 kernel entries, although I checked and running uname -r shows 6.8.12-2-pve.
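To compare the boot entries in the output above with the running kernel, the versions can be pulled out of the saved output with a small helper. This is only a sketch: the function name boot_entry_kernels is made up, and the pattern is based solely on the "creating boot-entry for" lines shown above. Keeping an older kernel installed next to the current one is, as far as I know, normal and serves as a fallback.

```shell
#!/bin/sh
# Print the unique kernel versions that received a boot entry.
# Reads saved update-initramfs / zz-proxmox-boot output on stdin.
boot_entry_kernels() {
    sed -n 's/.*creating boot-entry for \(.*\)$/\1/p' | sort -u
}
```

With the output saved to a file (the name initramfs.log is just an example), `boot_entry_kernels < initramfs.log` lists the versions, which can then be compared with `uname -r`. `proxmox-boot-tool kernel list` should show the same information directly.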

I appreciate any help you can give me. If you need any other configs or logs, please let me know. I tried to cover everything I changed, but I wouldn't be surprised if I'm missing something.

Thanks,
Leo
 
