Passthrough error migrating from Proxmox 8 to 9

hal9008

Member
Sep 23, 2020
I’ve run into a serious problem migrating from Proxmox 8 to Proxmox 9.

My machine specs:
  • Case: Antec P101 Silent
  • PSU: Seasonic B12 BM-650 - A651BMAFH - 80 Plus Bronze
  • Motherboard: Gigabyte GA-Z97X-UD3H (Proxmox reports it as Z97X-UD3H-CF). Released Q2 2014
  • CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz – 4 Cores, 8 Threads – Released Q2’13
  • RAM:
    • 2x Kingston 99U5471-066.A00LF 8GiB DIMM DDR3 Synchronous 1600 MHz (0.6 ns) in slots 1 and 3
    • 2x KVR16N11/8 8GB PC RAM Kingston PC3-12800U DDR3-1600MHz in slots 2 and 4
    • Total: 32 GB
  • Dedicated GPU: Nvidia Quadro P2000 5 GB RAM – Passthrough to VM 101
  • Integrated GPU: Xeon E3-1200 v3 Processor Integrated Graphics Controller – Passthrough to VM 103
  • Onboard NIC: Intel Corporation Ethernet Connection I217-V
  • CPU Audio Controller (HDMI out): Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
  • Motherboard Audio Controller (case jacks): Intel 9 Series Chipset Family HD Audio Controller
  • Disk /dev/sda (1 TB, WDC WDS100T1R0A): Proxmox + multiple Linux OS (96% wearout as of 2025-04-01, 4% used)
  • Disk /dev/sdb (1 TB, WDC WDS100T1R0A): Virtualmin + Windows Server (92% wearout as of 2025-04-01, 8% used)
  • Disk /dev/sdc (7.45 TB, WDC WD8003FFBX-6): various – passthrough to VM 101
  • Disk /dev/sdd (3 TB, WDC WD30EZRX-00M): Proxmox backups + Apple Time Machine
I have three active VMs (101, 102, 103). VM 103 has passthrough of the integrated GPU and works fine.

The problem is with VM 101. I’m trying to pass through a Quadro P2000, but it’s not working at all. I’ve tried everything. In short:
  • Switched the VM from BIOS to EFI
  • Updated Nvidia drivers inside the VM from 535 to 550
My /etc/default/grub on Proxmox looks like this:

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off intel_iommu=on iommu=pt initcall_blacklist=sysfb_init video=simplefb:off video=vesafb:off video=efifb:off video=vesa:off disable_vga=1 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=nouveau,nvidia,nvidiafb,nvidia-gpu,vesafb,efifb pcie_acs_override=downstream,multifunction pci=noaer"
GRUB_CMDLINE_LINUX=""
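
For reference, edits to /etc/default/grub only take effect once the bootloader config and initramfs are regenerated and the host is rebooted. A minimal sketch of the usual steps on a GRUB-booted host (the grep is just a sanity check; the exact messages vary by kernel):

Code:
update-grub                   # regenerate /boot/grub/grub.cfg from /etc/default/grub
update-initramfs -u -k all    # also picks up changes under /etc/modprobe.d
reboot
# after the reboot, verify the parameters were applied and the IOMMU is active:
cat /proc/cmdline
dmesg | grep -e DMAR -e IOMMU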

And my VM 101 config (101.conf):
Code:
agent: 1
args: -cpu host,kvm=off
bios: ovmf
boot: order=scsi3;scsi0
cores: 3
cpu: host,flags=+aes,hidden=1,hv_vendor_id=proxmox
#cpu: host
efidisk0: discoprincipal:101/vm-101-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:01:00.0;0000:01:00.1,pcie=1
machine: pc-q35-9.2
memory: 16384
name: Pi-Hole
net0: virtio=9E:3E:E3:24:90:84,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: discoprincipal:101/vm-101-disk-0.qcow2,discard=on,size=256G
scsi1: TimeMachine:101/vm-101-disk-0.qcow2,backup=0,discard=on,size=1430G
scsi2: /dev/disk/by-id/ata-WDC_WD8003FFBX-68B9AN0_VYHGRLXM,backup=0,discard=on,size=7814026584K
scsi3: discoprincipal:101/vm-101-disk-2.qcow2,discard=on,size=536871K,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=482d392e-32cd-49e5-ab8a-69f3541c194e
sockets: 1
tablet: 1
vmgenid: a1be1ed5-c701-4ddd-a383-cd0d72bcd560
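
For completeness, a quick way to check how the host groups these devices (the Quadro at 01:00.0 and its audio function at 01:00.1 should ideally sit in a group of their own); this is plain shell, nothing Proxmox-specific:

Code:
#!/bin/bash
# print every PCI device together with its IOMMU group (run on the host)
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=$(basename "$(dirname "$(dirname "$dev")")")
    printf 'IOMMU group %s: ' "$group"
    lspci -nns "$(basename "$dev")"
done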

I’ve tried every possible way to hide the GPU from Nvidia’s virtualization detection (even dumping and passing the GPU’s ROM), but nothing works. Inside the VM, I always end up with this:

Code:
marcosms@pi-hole:~/descargas/nvidia-patch-master$ nvidia-smi
No devices were found
marcosms@pi-hole:~/descargas/nvidia-patch-master$ sudo dmesg | grep -i nvidia
[    3.743446] nvidia: loading out-of-tree module taints kernel.
[    3.743455] nvidia: module license 'NVIDIA' taints kernel.
[    3.743459] nvidia: module license taints kernel.
[    3.870654] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[    3.872567] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    4.112386] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.163.01  Tue Apr  8 12:41:17 UTC 2025
[    4.149777] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.163.01  Tue Apr  8 12:09:34 UTC 2025
[    4.184175] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   20.611415] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[   20.615700] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[   21.226304] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[   21.323272] nvidia-uvm: Loaded the UVM driver, major device number 235.
marcosms@pi-hole:~/descargas/nvidia-patch-master$ lspci -nnk | grep -A 3 -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106GL [Quadro P2000] [10de:1c30] (rev a1)
        Subsystem: NVIDIA Corporation GP106GL [Quadro P2000] [10de:11b3]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
        Subsystem: NVIDIA Corporation GP106 High Definition Audio Controller [10de:11b3]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
05:01.0 PCI bridge [0604]: Red Hat, Inc. QEMU PCI-PCI bridge [1b36:0001]

I can’t get rid of these two errors:

Code:
[drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

These errors prevent the Nvidia driver from loading correctly.
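
One host-side detail worth double-checking after an upgrade is that the card is claimed by vfio-pci before any other driver can touch it. A minimal sketch using the device IDs from the lspci output above (10de:1c30 for the GPU, 10de:10f1 for its audio function); the file name is just an example:

Code:
# /etc/modprobe.d/vfio.conf (example file name)
options vfio-pci ids=10de:1c30,10de:10f1 disable_vga=1
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci

# then rebuild the initramfs and reboot:
#   update-initramfs -u -k all && reboot
# afterwards the host should report "Kernel driver in use: vfio-pci":
#   lspci -nnk -s 01:00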


This didn’t happen on Proxmox 8. Is there any workaround for this? I refuse to believe I’m the only one facing this issue. Can someone open a ticket or escalate this so it gets fixed in Proxmox 9? It’s very frustrating that something works in the previous version but breaks in the new one.
 
I don’t understand what’s going on with passthrough functionality. Can someone explain it to me?

Is this now a premium feature or something like that? Does Nvidia have premium drivers? Why is this happening? I’ve seen several threads both here and elsewhere, and it looks like people are just shooting in the dark without knowing how to solve this issue.

Does what’s happening make any sense? Are the developers aware of these problems and working on solutions, or has this become a paid feature now?

Hopefully someone can clarify this.
 
Hi,

no, it's not a paid feature. Proxmox VE is 100% open source, and there is no feature gating.

As for why PCI pass-through does not work correctly for you, I can't say exactly with the information at hand, but it's often a bit of trial and error (mostly because of hardware), especially with older hardware (you're using a consumer-grade mainboard, a CPU from 12 years ago, and a GPU from 7 years ago).

In your case, I'd try to remove all customizations (kernel command line, the args line from the config, custom patches/packages/etc.) and start fresh, then post the VM config, the full 'dmesg' output of the host and guest, and the 'lspci -nnk' output of the host and guest.
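
A quick way to collect exactly that into attachable files (file names are just examples; run the first three commands inside the guest as well):

Code:
dmesg > host-dmesg.txt
journalctl -b > host-journal.txt
lspci -nnk > host-lspci.txt
qm config 101 > vm101-config.txt      # the VM config as Proxmox sees it
tar czf pve-debug-logs.tar.gz host-*.txt vm101-config.txt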
 
First of all, thanks to @dcsapak for taking an interest in this issue. It reassures me to know that it’s not a licensing problem (I was convinced it had to be something along those lines).

These past few days I’ve been trying to solve this matter and came to a few conclusions:
  • Passthrough of both cards on my system is impossible. I tried countless modifications in the grub file and the files inside modprobe.d, but I couldn’t get passthrough working for the NVIDIA GPU. I tested different driver versions (535, 550, 570, 580), but none of them worked.
  • Since the GPU is only used by two specific dockers, I tried creating an LXC and running them there (Yes… I know LXC isn’t ideal for running docker, but it was worth a try). I installed the GPU drivers both on the host (Proxmox) and inside the LXC (obviously removing all the parameters from GRUB_CMDLINE_LINUX_DEFAULT first). That way, I managed to get the driver working inside the LXC and the card was recognized.
When running the Docker containers (Jellyfin and Ollama+OpenWebUI), they detected the GPU but managed it poorly: the memory got saturated before the containers were actually used and never freed up at any point (something that worked perfectly in Proxmox 8). On top of that, I couldn’t keep Intel GPU passthrough working while leaving the NVIDIA card as “no-passthrough” (if one worked, the other wouldn’t at all). Because of this, I chose the easy route: reinstall Proxmox 8 and restore backups of all machines. Now I finally have everything running again.
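
For anyone trying the same LXC route: with the NVIDIA driver installed on the host, the container usually only needs the device nodes mapped in. A minimal sketch for a privileged container (the container ID and the cgroup major numbers are examples; check the real ones with ls -l /dev/nvidia* on the host):

Code:
# /etc/pve/lxc/200.conf  (200 is an example container ID)
lxc.cgroup2.devices.allow: c 195:* rwm   # /dev/nvidia0, /dev/nvidiactl, ...
lxc.cgroup2.devices.allow: c 235:* rwm   # /dev/nvidia-uvm (major number varies per host)
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Inside the container, the user-space driver has to match the host’s driver version (e.g. installed from the .run file with --no-kernel-module) for nvidia-smi to work.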

Still, I’m left with the frustration of not having been able to solve this issue. I understand the hardware I’m using is a bit dated, but believe me, for many tasks the system I’m running is still oversized (I’ve seen much newer setups running similar services at a fraction of the speed of this little server). Right now, with 3 virtual machines running (one Windows Server, another with Virtualmin hosting websites and email, and another with 15 dockers and multiple services of different kinds), I’m at 6% CPU usage and 70% RAM usage. With those conditions, changing hardware is simply not an option.

The fact that with the same EFI configuration and same hardware, passthrough doesn’t work at all in Proxmox 9 while in Proxmox 8 it runs smoothly can only point to “something-I-don’t-know” in version 9 that isn’t right. Especially since I see more people having passthrough issues in this version. My conclusion is that the hardware isn’t defective, nor is it “too old” for passthrough to be possible. I believe the problem comes from how the new software stack in Proxmox 9 (kernel + QEMU + drivers) interprets or handles that hardware, and that’s something the Proxmox developers should review.
 
I just want to point out that I'd like to help, but as I already said, with the information at hand I cannot.

You did not provide any of the information I asked for (journalctl, dmesg, lspci, etc.) from a clean state, so it's impossible to say why it works on PVE 8 but not on PVE 9.

It might be that there are some regressions in the kernel/QEMU that are not reproducible on newer hardware, since older hardware is not tested that well anymore by kernel/QEMU devs (that's why I mentioned the age).

If you want to solve this for PVE 9, please provide the output from the state I asked for (or at least test with e.g. the 6.14 kernel on PVE 8 and provide the output), otherwise it's impossible for me to help.
 
OK, sorry for not providing this in my previous message. I’m not entirely sure if what I’m attaching here is exactly what you need, but I hope it helps.

Right now the server is running Proxmox 8 with everything working perfectly (including passthrough). In the attached compressed file you’ll find:
  • The output of lspci, dmesg, and journalctl from both the Proxmox host and the VM that uses the NVIDIA card.
  • System events, device list, and general event logs from the Windows VM (which has the Intel GPU passed through).
It’s possible that in the VM event logs you’ll see plenty of errors between September 11 and 15 (those are the days when I tried the migration).

https://descargas.flopy.es/?r=/download&path=L1BhcmEgb3Ryb3MvRm9ybyBQcm94bW94L2xvZ3Muemlw

If you need any additional data, please just let me know.

If we can identify the root cause and have a clear path to a solution, I’d be very interested in migrating to Proxmox 9 (I prefer not to stay on older software versions if I can move to the newer ones).
 
Well, the most interesting thing would be the host logs in a non-working state, otherwise I cannot even start to search for where the problem might be.

As I said, a start would be to try the 6.14 kernel on the host; if that fails, we have a candidate for where a regression might be.
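
For the record, the 6.14 kernel is available as an opt-in package on PVE 8, so this test does not require the full upgrade. A rough sketch (the pinned version string is only an example; take the real one from 'proxmox-boot-tool kernel list'):

Code:
apt update
apt install proxmox-kernel-6.14           # opt-in kernel series on PVE 8
proxmox-boot-tool kernel list             # note the exact 6.14.x version installed
proxmox-boot-tool kernel pin 6.14.8-2-pve # example version string, use the one listed above
reboot
uname -r                                  # confirm 6.14 is running before re-testing passthrough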
 
The best approach I can think of to provide you with the required logs is the following:
  • Proxmox is installed on a single SSD (this is not an enterprise setup, so I’m not using RAID). I could use a disk cloner to copy the whole system drive to another SSD of the same size (I believe I have a spare one at home).
  • Once cloned, I can install that SSD in the server and check if it boots correctly. If it does, I would then upgrade the cloned system to Proxmox 9.
  • After the upgrade, I can test passthrough of the GPUs again, and if it fails, I’ll be able to collect the logs you need.
  • Once finished, I can simply swap back the original drive to quickly restore everything to its working state.
What I need to confirm is exactly which logs you’d like me to gather. If it’s the same set you mentioned before, no problem—I’ll prepare them.

In any case, I probably won’t have time to do this until this weekend or the next one. As soon as I have the data, I’ll post it here.

Thank you very much for the help and support you’re giving me.
 
What I need to confirm is exactly which logs you’d like me to gather. If it’s the same set you mentioned before, no problem—I’ll prepare them.
I'd like at least the output of

Code:
journalctl -b
dmesg
lspci -nnk

and the start task log

as well as any relevant logs from the guest (also journalctl, dmesg, lspci)
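
As a pointer for the start task log: in the GUI it can be opened by double-clicking the VM's start task in the task pane at the bottom; on the CLI, something along these lines captures equivalent output (VM ID 101 as in the thread, file name is an example):

Code:
qm start 101 2>&1 | tee vm101-start.log   # start the VM manually and save any errors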

In any case, I probably won’t have time to do this until this weekend or the next one. As soon as I have the data, I’ll post it here.
No problem, just update the thread when you have the information.