GPU passthrough HP Z840 not working

xvlvx

New Member
Jan 10, 2022
16
0
1
45
Hi All,

And yes, yet another GPU passthrough thread. Why? Because there are so many of them already, with too many sources of information, some seriously outdated, which leads to issues... ask me how I know!

So, to give a little background: I've been running Proxmox (happily) for years now. I started somewhere around late V5 / early V6 and have always used it to host all of my (mostly) Linux VMs and containers. I recently upped my game by switching to newer (better) hardware, in my case an HP Z840 workstation.

So far, so good! All VMs and CTs are migrated and working; some have already had their configs expanded to take advantage of the new real estate.

But now my issue. The reason for moving to this new Proxmox server was to start using GPU passthrough: I would like to run a couple of my VMs (currently hosted on my iMac in VMware Fusion) on Proxmox. These are mostly GPU-hungry VMs and currently run 'slightly too laggy' to really be usable.

The Z840 came with an NVIDIA NVS 310 and a GTX 285 card, both of which I'd like to pass through to a VM.

I started modding the config files with the usual steps:
- setting the kernel command line in /etc/default/grub to:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
- adding the VFIO drivers to /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
- blacklisting the graphics drivers in /etc/modprobe.d/:
Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
- running update-grub and update-initramfs -u
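
As a first sanity check after the reboot (a minimal sketch; adjust the module names if your blacklist differs), you can confirm the vfio modules actually loaded and that the blacklists worked:
Code:
# the vfio modules should be listed
lsmod | grep vfio
# the blacklisted GPU drivers should print nothing
lsmod | grep -E 'nouveau|nvidia|radeon'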

After rebooting and running the command:
Code:
dmesg | grep -e DMAR -e IOMMU
it confirms that the IOMMU is enabled:
Code:
[    0.407883] DMAR: IOMMU enabled

So far so good.

Then I added the NVS 310 to a Win10 VM with the config below:
Code:
bios: ovmf
boot: order=sata0;ide2;net0
cores: 1
efidisk0: local-lvm:vm-105-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:04:00,pcie=1,x-vga=1
ide2: local:iso/Windows10-x64-new.iso,media=cdrom,size=4141440K
machine: pc-q35-7.0
memory: 2048
meta: creation-qemu=7.0.0,ctime=1667148331
name: Windows-10-105
net0: e1000=7A:A1:D1:C3:E0:D7,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: local-lvm:vm-105-disk-1,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=5b3df202-bb2b-4450-9834-60d297ba7ee6
sockets: 1
vga: none
vmgenid: 137a9eaa-953d-412b-99c9-6ae7aa20df3e

When I started the VM, all hell broke loose.
I wasn't well prepared and couldn't capture the errors, which led to attempt #2: with an open SSH terminal I could follow the journal, which blew up full of errors (125 MB of SSH capture), until I could hard-kill the VM.
The capture had thousands of lines like:
Code:
Nov 01 16:18:43 proxmox-z840 QEMU[4840]: kvm: vfio_region_write(0000:04:00.0:region1+0x13f8, 0x0,8) failed: Device or resource busy
Nov 01 16:18:43 proxmox-z840 kernel: vfio-pci 0000:04:00.0: BAR 1: can't reserve [mem 0xd0000000-0xd7ffffff 64bit pref]

The Proxmox host needed to be hard-rebooted with a power cycle.
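
For anyone hitting the same 'can't reserve' error: the memory range from the message can be looked up in /proc/iomem to see which driver still claims it (a hedged diagnostic sketch, using the range from my log):
Code:
# show what currently owns the BAR range from the error message
grep -A2 'd0000000-d7ffffff' /proc/iomem
# if the boot framebuffer grabbed the GPU, an entry like 'BOOTFB' or
# 'efifb' typically shows up inside this range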

From this point on, my quest to get this working resulted in several hours of googling, YouTube videos on the topic (where it always works), and plenty of threads on this forum. But no cigar!


Just to show some of my work, here are all grub lines I tested:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# Disable the 'efifb' graphics driver using efifb:off. This will prevent the driver from stealing the GPU. An unfortunate side-effect of this is that you will not be able to see what your computer is doing while it is booting up.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off video=efifb:off"

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream video=efifb:eek:ff video=vesafb:eek:ff"

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:eek:ff video=vesafb:eek:ff"

Here I discovered that many of these options aren't well documented: some people say certain options aren't needed, others swear by them. At the moment it leaves me very confused. If I run the command below, for instance, it shows that interrupt remapping is already enabled on my system, which makes some of the GRUB_CMDLINE_LINUX_DEFAULT options above redundant.
Code:
dmesg | grep 'remapping'
Code:
[    0.912231] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.913102] DMAR-IR: Enabled IRQ remapping in x2apic mode
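
To inspect the actual IOMMU grouping (the lines above only prove interrupt remapping), the groups can be walked in sysfs; a small sketch:
Code:
#!/bin/bash
# list every IOMMU group and the devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done
Ideally the GPU and its audio function end up in a group of their own.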

I tried a lot of things, not all shown above; I can add more detail if anything is unclear. But for now: can anyone tell me why GPU passthrough is so troublesome on my system?

Best regards,
LVX
 
I tried a lot of things, not all shown above; I can add more detail if anything is unclear. But for now: can anyone tell me why GPU passthrough is so troublesome on my system?
The "BAR can't reserve" error is common if the same GPU is used during the boot of the Proxmox host. If you are using an up-to-date Proxmox 7.2 with kernel version 5.15 you need an entirely different work-around to make your system boot head-less: use initcall_blacklist=sysfb_init instead of all those outdated video='s.
EDIT: You might want to early-bind the devices you want to pass through to vfio-pci, to prevent any host drivers from touching them.
 
Hi leesteken,

Thanks for your reply. I had kind of already discovered that it had something to do with the GPU already being used by the host; I just couldn't untangle the inner workings of it.

About your questions:
If you are using an up-to-date Proxmox 7.2 with kernel version 5.15
Yep, fully up-to-date:
Code:
Linux proxmox-z840 5.15.64-1-pve #1 SMP PVE 5.15.64-1 (Thu, 13 Oct 2022 10:30:34 +0200) x86_64 GNU/Linux

Adding the suggested workaround, I also cleaned my GRUB command line up to:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"

Then, to stop Linux from taking control of the hardware, I added the following line to /etc/modprobe.d/vfio.conf, after getting the IDs with 'lspci -nn':
Code:
options vfio-pci ids=10de:107d,10de:0e08,10de:05e3
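
In case the host drivers still win the race at boot, I understand you can also add softdep lines to the same file so vfio-pci loads first (just a sketch, untested on my side):
Code:
# ensure vfio-pci binds before the drivers that would otherwise claim the cards
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci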

After that I regenerated the initramfs and GRUB config with:
Code:
update-initramfs -u -k all
update-grub

And finally I rebooted the machine. After the reboot, the devices show up bound to vfio-pci (lspci -nnk output):
Code:
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)
    Subsystem: Hewlett-Packard Company GF119 [NVS 310] [103c:094e]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
    Subsystem: Hewlett-Packard Company GF119 HDMI Audio Controller [103c:094e]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT200b [GeForce GTX 285] [10de:05e3] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GT200b [GeForce GTX 285] [1043:82e8]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

After starting the VM, the journal showed the following line, which hopefully means it's all working:
Code:
Nov 04 07:49:30 proxmox-z840 kernel: vfio-pci 0000:04:00.1: enabling device (0100 -> 0102)

I booted the Win10 VM; at first the GPU was not shown, but somehow after a reboot (or two) it suddenly did show up and I could install the NVIDIA drivers.
From that point on, the VM won't boot anymore. As I experimented a lot, I'm in the process of creating a fresh VM for further testing.

BR,
LVX
 
I have no experience with NVIDIA GPUs. Sounds like your passthrough is working in principle (the VM has the device without crashing the host). Hopefully other people can help you with Windows and NVIDIA drivers.
 
I booted the Win10 VM; at first the GPU was not shown, but somehow after a reboot (or two) it suddenly did show up and I could install the NVIDIA drivers.
From that point on, the VM won't boot anymore. As I experimented a lot, I'm in the process of creating a fresh VM for further testing.
what exactly does 'won't boot anymore' mean? no output?
both gpus are rather old and cannot work with the newest drivers (e.g. the nvs 310 needs at max the 327 driver AFAICS)

can you post the whole journal of a boot when that happens?
 
Hi dcsapak,

what exactly does 'won't boot anymore' mean? no output?
At first I couldn't reconnect to an RDP session, which makes it quite difficult to debug, so I enabled the SPICE display settings again. By then Windows booted into a crashed environment. Nothing to salvage anymore.

(e.g. the nvs 310 needs at max the 327 driver AFAICS)
That is rather a shame then, because I installed the 392.68 driver just now and ended up with a 'Code 43' (again!). Okay, let's rebuild the VM once more then :)

Quick question about the GPU cards: both were part of the deal when I bought this PC, so what would you recommend as the best (NVIDIA) GPU card for Proxmox passthrough?

Br,
LVX
 
mmm, getting a driver older than the 375 isn't possible according to the nvidia website!
 
I've installed the latest driver from the NVIDIA website [375] and still got a Code 43 in Windows Device Manager. Now, after a reboot, Windows won't even boot anymore; see the log below.

This is the output from proxmox journal:
Code:
Nov 06 09:32:25 proxmox-z840 kernel: vfio-pci 0000:04:00.0: enabling device (0000 -> 0003)
Nov 06 09:32:28 proxmox-z840 pvedaemon[1194754]: VM 105 qmp command failed - VM 105 qmp command 'query-proxmox-support' failed - got timeout
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
Nov 06 09:32:28 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Nov 06 09:32:28 proxmox-z840 kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
Nov 06 09:32:28 proxmox-z840 kernel: pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004000/00000000
Nov 06 09:32:28 proxmox-z840 kernel: pcieport 0000:00:03.0:    [14] CmpltTO                (First)
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: can't change power state from D3hot to D0 (config space inaccessible)
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: can't change power state from D3cold to D0 (config space inaccessible)
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: vfio_cap_init: hiding cap 0xff@0xff
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: vfio_cap_init: hiding cap 0xff@0xff
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: vfio_cap_init: hiding cap 0xff@0xff

the last lines repeat +/- 50 times

Code:
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: vfio_cap_init: hiding cap 0xff@0xff
Nov 06 09:32:28 proxmox-z840 kernel: vfio-pci 0000:04:00.1: vfio_cap_init: hiding cap 0xff@0xff
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Root Port link has been reset (0)
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: device recovery successful
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004000/00000000
Nov 06 09:32:29 proxmox-z840 kernel: pcieport 0000:00:03.0:    [14] CmpltTO                (First)
Nov 06 09:32:30 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Root Port link has been reset (0)
Nov 06 09:32:30 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: device recovery successful
Nov 06 09:32:31 proxmox-z840 pvestatd[2814]: VM 105 qmp command failed - VM 105 not running
Nov 06 09:32:35 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Nov 06 09:32:35 proxmox-z840 kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
Nov 06 09:32:35 proxmox-z840 kernel: pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004000/00000000
Nov 06 09:32:35 proxmox-z840 kernel: pcieport 0000:00:03.0:    [14] CmpltTO                (First)
Nov 06 09:32:35 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 06 09:32:35 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 06 09:32:35 proxmox-z840 pvestatd[2814]: status update time (6.573 seconds)
Nov 06 09:32:35 proxmox-z840 systemd[1]: 105.scope: Succeeded.
Nov 06 09:32:35 proxmox-z840 systemd[1]: 105.scope: Consumed 12.797s CPU time.
Nov 06 09:32:35 proxmox-z840 pvedaemon[1194798]: start failed: QEMU exited with code 1
Nov 06 09:32:35 proxmox-z840 pvedaemon[352781]: <root@pam> end task UPID:proxmox-z840:00123B2E:0109B9E5:63677112:qmstart:105:root@pam: start failed: QEMU exited with code 1
Nov 06 09:32:36 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: Root Port link has been reset (0)
Nov 06 09:32:36 proxmox-z840 kernel: pcieport 0000:00:03.0: AER: device recovery successful

Is this 'just' related to the driver/card age?
 
honestly i don't know, i guess these errors can have a variety of reasons, from my short search i found: faulty hardware, modded bios, etc...

also:

mmm, getting a driver older than the 375 isn't possible according to the nvidia website!
ah sorry, i misread the page, it presented the '392.68' driver to me too, but i looked at the file size when looking it up ^^ (which is 327 MiB)...
 
Hi All,

I upgraded my graphics card to a GeForce GTX 1070 Ti.
I left the NVS 310 in as well and set it as the primary GPU, meaning I use one card for the Proxmox host and one for the guests.

I removed the framebuffer stuff, reducing the GRUB command line to:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

And added the following line to /etc/modprobe.d/vfio.conf, so just the 1070 Ti is bound to the vfio driver:
Code:
# lspci -nn
# 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)
# 03:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
# 04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] [10de:1b82] (rev a1)
# 04:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
#
# only the GeForce GTX 1070 Ti is blocked
options vfio-pci ids=10de:1b82,10de:10f0

After rebuilding the initramfs and GRUB config, a reboot showed that the NVS 310 is free to be used and the 1070 Ti is bound to the vfio driver:
Code:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)
    Subsystem: Hewlett-Packard Company GF119 [NVS 310] [103c:094e]
    Kernel modules: nvidiafb, nouveau
03:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
    Subsystem: Hewlett-Packard Company GF119 HDMI Audio Controller [103c:094e]
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] [10de:1b82] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1070 Ti] [1043:861e]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GP104 High Definition Audio Controller [1043:861e]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

The tricky part, getting the Windows 10 VM to accept and use the GPU, is a whole other business; see my VM .conf below. Is anything obviously wrong there? I tried both with and without the 'romfile' parameter, and both the ROM straight from the website and one patched according to the GitHub script "nvidia_vbios_vfio_patcher.py":
Code:
bios: ovmf
boot: order=ide2;ide0;net0
cores: 2
cpu: host,hidden=1,flags=+pcid
efidisk0: localstorage1:vm-105-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:04:00,pcie=1,romfile=/usr/share/kvm/Asus.GTX1070Ti.8192.171011-patched.rom,x-vga=1
ide0: localstorage1:vm-105-disk-1,size=50G
ide2: none,media=cdrom
machine: pc-q35-7.0
memory: 8196
meta: creation-qemu=7.0.0,ctime=1667736623
name: Win10-105
net0: e1000=72:CD:28:4B:78:CD,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=30893e11-284f-4da7-a7d8-97b44616e237
sockets: 2
vmgenid: 09eaf41e-5959-4cc5-924d-9c8d03722003
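
As an alternative to downloaded ROM files, the vBIOS can usually be dumped from the card itself via sysfs (a sketch only; it assumes the card sits at 0000:04:00.0 and isn't in use by the host, and the output path is just an example):
Code:
# dump the vBIOS of the GPU at 04:00.0
cd /sys/bus/pci/devices/0000:04:00.0
echo 1 > rom                                  # enable reading the ROM BAR
cat rom > /usr/share/kvm/gtx1070ti-dump.rom   # example output file
echo 0 > rom                                  # disable again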


This is the version that does work (without the GPU):
Code:
bios: ovmf
boot: order=ide2;ide0;net0
cores: 2
cpu: host
efidisk0: localstorage1:vm-105-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide0: localstorage1:vm-105-disk-1,size=50G
ide2: none,media=cdrom
machine: pc-q35-7.0
memory: 8196
meta: creation-qemu=7.0.0,ctime=1667736623
name: Win10-105
net0: e1000=72:CD:28:4B:78:CD,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=30893e11-284f-4da7-a7d8-97b44616e237
sockets: 2
vmgenid: 09eaf41e-5959-4cc5-924d-9c8d03722003

If anyone has some thoughts then please HELP!
 
By the way, I've run a Debian guest VM with NVIDIA drivers, which works fine. A 3D test (glxgears) works perfectly.
So it definitely is a Win10 issue.

The Win10 installation works fine, with and without PCI passthrough, but as soon as the NVIDIA drivers are installed (whether from Windows Update or from the NVIDIA website) the VM cannot boot anymore and hangs directly after the UEFI BIOS screen.

What am I missing!?
 
Sorry for bumping this thread!

I'm just utterly confused as to why Win10 won't boot after I install the drivers and reboot.

Can anyone please explain what is causing my VM guest to simply stop booting?
This is really hard to debug if you cannot look at a screen.

BTW, the Proxmox journal isn't showing any issues.

Best regards,
LVX
 
did you try without having secure boot enabled (e.g. unchecking the 'pre enrolled keys' in the efidisk creation or via the ovmf menu (press esc on vm start)) ?
it's just a shot in the dark though, otherwise i have no real idea

you could send the host logs (journal/dmesg), maybe something's in there that gives a hint
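
the cli equivalent for recreating the efidisk without pre-enrolled keys would be roughly (a sketch; 'localstorage1' taken from your config, and note it wipes the stored EFI vars):
Code:
# remove the old EFI disk, then recreate it without pre-enrolled secure boot keys
qm set 105 --delete efidisk0
qm set 105 --efidisk0 localstorage1:1,efitype=4m,pre-enrolled-keys=0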
 
Hi dcsapak,

I didn't know you could do that with the BIOS. I checked, and the Q35 BIOS does have secure boot activated, so I disabled it. It made no real difference, maybe a slight change in the boot loop: it now automatically resets, which it didn't do before.

Below is the output for the VM from start to 'kill' (without the secure boot BIOS edits):

DMESG:
Code:
[246846.336163] device tap105i0 entered promiscuous mode
[246846.375158] vmbr0: port 6(fwpr105p0) entered blocking state
[246846.375164] vmbr0: port 6(fwpr105p0) entered disabled state
[246846.375259] device fwpr105p0 entered promiscuous mode
[246846.375290] vmbr0: port 6(fwpr105p0) entered blocking state
[246846.375292] vmbr0: port 6(fwpr105p0) entered forwarding state
[246846.383081] fwbr105i0: port 1(fwln105i0) entered blocking state
[246846.383085] fwbr105i0: port 1(fwln105i0) entered disabled state
[246846.383164] device fwln105i0 entered promiscuous mode
[246846.383234] fwbr105i0: port 1(fwln105i0) entered blocking state
[246846.383237] fwbr105i0: port 1(fwln105i0) entered forwarding state
[246846.390158] fwbr105i0: port 2(tap105i0) entered blocking state
[246846.390161] fwbr105i0: port 2(tap105i0) entered disabled state
[246846.390263] fwbr105i0: port 2(tap105i0) entered blocking state
[246846.390266] fwbr105i0: port 2(tap105i0) entered forwarding state
[246847.530121] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[246865.357682] usb 2-12: reset low-speed USB device number 2 using xhci_hcd
[246865.817771] usb 2-13.1: reset full-speed USB device number 4 using xhci_hcd

JOURNALCTL:
Code:
Nov 16 09:21:29 proxmox-z840 pvedaemon[3037759]: <root@pam> starting task UPID:proxmox-z840:000E1705:0178A790:63749D89:qmstart:105:root@pam:
Nov 16 09:21:29 proxmox-z840 pvedaemon[923397]: start VM 105: UPID:proxmox-z840:000E1705:0178A790:63749D89:qmstart:105:root@pam:
Nov 16 09:21:29 proxmox-z840 systemd[1]: Started 105.scope.
Nov 16 09:21:29 proxmox-z840 systemd-udevd[923411]: Using default interface naming scheme 'v247'.
Nov 16 09:21:29 proxmox-z840 systemd-udevd[923411]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 16 09:21:30 proxmox-z840 kernel: device tap105i0 entered promiscuous mode
Nov 16 09:21:30 proxmox-z840 systemd-udevd[923411]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 16 09:21:30 proxmox-z840 systemd-udevd[923410]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 16 09:21:30 proxmox-z840 systemd-udevd[923410]: Using default interface naming scheme 'v247'.
Nov 16 09:21:30 proxmox-z840 systemd-udevd[923411]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 16 09:21:30 proxmox-z840 kernel: vmbr0: port 6(fwpr105p0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: vmbr0: port 6(fwpr105p0) entered disabled state
Nov 16 09:21:30 proxmox-z840 kernel: device fwpr105p0 entered promiscuous mode
Nov 16 09:21:30 proxmox-z840 kernel: vmbr0: port 6(fwpr105p0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: vmbr0: port 6(fwpr105p0) entered forwarding state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 1(fwln105i0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 16 09:21:30 proxmox-z840 kernel: device fwln105i0 entered promiscuous mode
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 1(fwln105i0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 1(fwln105i0) entered forwarding state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered blocking state
Nov 16 09:21:30 proxmox-z840 kernel: fwbr105i0: port 2(tap105i0) entered forwarding state
Nov 16 09:21:31 proxmox-z840 kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Nov 16 09:21:32 proxmox-z840 pvedaemon[3037759]: <root@pam> end task UPID:proxmox-z840:000E1705:0178A790:63749D89:qmstart:105:root@pam: OK
Nov 16 09:21:32 proxmox-z840 pvedaemon[3037759]: <root@pam> starting task UPID:proxmox-z840:000E1784:0178A8EE:63749D8C:vncproxy:105:root@pam:
Nov 16 09:21:32 proxmox-z840 pvedaemon[923524]: starting vnc proxy UPID:proxmox-z840:000E1784:0178A8EE:63749D8C:vncproxy:105:root@pam:
Nov 16 09:21:49 proxmox-z840 kernel: usb 2-12: reset low-speed USB device number 2 using xhci_hcd
Nov 16 09:21:49 proxmox-z840 kernel: usb 2-13.1: reset full-speed USB device number 4 using xhci_hcd
Nov 16 09:23:13 proxmox-z840 pveproxy[3044]: worker 908748 finished
Nov 16 09:23:13 proxmox-z840 pveproxy[3044]: starting 1 worker(s)
Nov 16 09:23:13 proxmox-z840 pveproxy[3044]: worker 959014 started
Nov 16 09:23:18 proxmox-z840 pveproxy[958560]: got inotify poll request in wrong process - disabling inotify
Nov 16 09:24:59 proxmox-z840 pvedaemon[2840375]: worker exit
Nov 16 09:24:59 proxmox-z840 pvedaemon[3035]: worker 2840375 finished
Nov 16 09:24:59 proxmox-z840 pvedaemon[3035]: starting 1 worker(s)
Nov 16 09:24:59 proxmox-z840 pvedaemon[3035]: worker 1083445 started
Nov 16 09:25:51 proxmox-z840 pvedaemon[3037759]: <root@pam> end task UPID:proxmox-z840:000E1784:0178A8EE:63749D8C:vncproxy:105:root@pam: OK
Nov 16 09:25:51 proxmox-z840 pveproxy[958560]: worker exit
Nov 16 09:25:52 proxmox-z840 pvedaemon[1196205]: starting vnc proxy UPID:proxmox-z840:001240AD:01790E43:63749E90:vncproxy:105:root@pam:
Nov 16 09:25:52 proxmox-z840 pvedaemon[3037759]: <root@pam> starting task UPID:proxmox-z840:001240AD:01790E43:63749E90:vncproxy:105:root@pam:

As you can see, there's not really anything of interest in the logs.

Just to make sure we agree that Proxmox is set up correctly and tested: I ran a Debian VM with this same card passed through, added just via the GUI with no additional .conf edits, and it runs great: full 4K remote desktop with 3D acceleration, tested with glxgears.

The setup of the Win10 VM is:
- configuration of the VM, with these additions in the .conf file:
- args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
- cpu: host,hidden=1,flags=+pcid
- hostpci0: 0000:04:00,pcie=1,romfile=/usr/share/kvm/Asus.GTX1070Ti.8192.171011-patched.rom,x-vga=1
- installation of Windows 10
- reboot and finish
- the VM shows the 2 video devices in the device manager, default and NVIDIA (not recognized yet)
- enable remote desktop, reboot and verify the remote desktop connection
- installation of the driver
- freeze
- after killing the VM it only ever boot-loops at/near the BIOS and hangs; Windows doesn't boot anymore


With some googling and DuckDuckGo-ing I've found multiple threads about this very same topic.
So, my main question is: what is causing this issue?
- the GPU?
- the NVIDIA driver blocking the passthrough, as it was never intended for this use?
- something within Windows 10 (and 11, by the way, too)?
- or something in Proxmox? (most likely not, as it works for the Debian VM)


Here are some of the links I found; hopefully someone finds this thread and can tie it all together:
https://forums.unraid.net/topic/129294-passthrough-issue-with-gtx1070-on-window-10-vm/
https://forums.unraid.net/topic/129...with-gtx1070-on-window-10-vm/#comment-1178389
 
- args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
this is really not necessary, especially when using a modern nvidia driver (it may even be harmful, since that line overwrites some of our generated cpu configs for the vm)

- the VM shows the 2 video devices in the device manager, default and NVIDIA (not recognized yet)
- enable remote desktop, reboot and verify the remote desktop connection
- installation of the driver
personally i'd install the driver before messing with remote desktop

aside from that it looks like it should work though
what you could also do is to wait until it fails, remove the passthrough again, start it up and check the windows event viewer
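
e.g. on the cli (a sketch, assuming the device is still on hostpci0):
Code:
# temporarily detach the GPU so the vm boots with emulated graphics again
qm set 105 --delete hostpci0
# after checking the event viewer in the guest, re-add it
qm set 105 --hostpci0 0000:04:00,pcie=1,x-vga=1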
 
Hi dcsapak,

Thank you for the reply.

I'm trying out your ideas and tweaks right now; I will keep you posted on the progress.

Just a quick question about the driver: the original NVIDIA article from back in 2021 mentions that driver R465 allows for this passthrough. When I look up the driver for the GTX 1070 Ti, I never end up at those R-drivers.

From my understanding, the R-drivers are for Quadro and RTX graphics cards.

Which driver do you use for this purpose?
 
AFAIU all drivers since 465 should support that.
i haven't tested consumer nvidia cards for quite a while, but from what i read from various sources (forum, reddit, etc.) this should simply work (if all other components play nice)
 
Hi dcsapak,

I gave up on the consumer graphics cards.

I just got a Quadro card and bam, everything works straight out of the box (of course I changed the IDs in /etc/modprobe.d/vfio.conf).
I 'just' added the card via the GUI to the VM, ticked all 4 checkboxes, did no tweaking in the vm.conf file, and installed Win10 without any issues. Added the NVIDIA driver and I'm using the VM happily now.
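
For reference, those four checkboxes (All Functions, ROM-Bar, Primary GPU, PCI-Express) should translate into a conf line roughly like this (a sketch; it assumes the Quadro ended up at the same 04:00 address):
Code:
hostpci0: 0000:04:00,pcie=1,x-vga=1
('All Functions' is what drops the '.0' function suffix, and ROM-Bar is on by default, so rombar=1 doesn't appear explicitly.)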

Man, I wish I hadn't gone down the rabbit hole of those GeForce cards.

Best of luck to all, and many thanks for your support.
 
