Nvidia Quadro 2000 GPU Passthrough Causes Kernel Panic

noisufnoc

After some research, I picked up a Quadro 2000 for use as a passthrough GPU for a Windows 10 VM.

I rebuilt my VM using the following guide from reddit (https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/), but my entire system kernel panics when I try to boot the VM. Shortly after booting the Windows VM with the passthrough enabled, I can see the panic in the server's syslog. Eventually the entire Proxmox server goes unresponsive and requires a hardware reset. Any ideas?

lspci output
Code:
03:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)

After reviewing this doc https://pve.proxmox.com/wiki/Pci_passthrough, I dumped the ROM and used rom-parser to inspect it:
Code:
Valid ROM signature found @0h, PCIR offset 188h
PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 0dd8, class: 030000
PCIR: revision 0, vendor revision: 1
Last image
According to the doc, the ROM needs to contain a type 3 image to be UEFI compatible. I'm not sure if this is a deal breaker, or if the reddit guide just doesn't address older hardware?
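
For anyone following along, the type check can be scripted. This is only a sketch: it assumes rom-parser prints one `PCIR: type N` line per image, as in the output above, and `rom_has_efi_image` is a made-up helper name, not something that ships with rom-parser.

```shell
# Succeeds (exit 0) if the rom-parser output on stdin lists a type 3 (EFI) image.
# Assumption: every image is reported on a line containing "PCIR: type N".
rom_has_efi_image() {
    grep -q 'PCIR: type 3'
}

# Hypothetical usage (the ROM path is only an example):
#   ./rom-parser /tmp/image.rom | rom_has_efi_image && echo "EFI image present"
```

Fed the output posted above, the check fails, because the only image in the ROM is type 0.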
 
can you post the complete vm config (qm config ID) and the iommu groups? maybe also the dmesg output? also, does your log show anything when the host goes unresponsive?
 
Code:
root@splinter:~# qm config 110
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: sas15k:vm-110-disk-0,size=128K
ide0: iso:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
ide2: iso:iso/Win10_1903_V1_English_x64.iso,media=cdrom
machine: q35
memory: 6144
name: win10
net0: virtio=3A:AC:CE:AA:F7:C8,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-zfs:vm-110-disk-0,cache=writeback,iothread=1,replicate=0,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4d2208a9-e110-413d-9560-4d37b3b7b1ba
sockets: 1
vmgenid: 9e355091-d2f4-4f99-a749-f6aacfb9b7d0

how do i find the iommu groups? i do know there are some events in the syslog/dmesg when it goes unresponsive, i'll test it again soon.
 
afaics this config does not include any passed through card? (no hostpci setting)
also:
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'

will override this line:

cpu: host,hidden=1,flags=+pcid

why do you need to set it in args?
if for the vendor_id, you can either set this manually on the cpu line,e.g.

Code:
cpu: host,hidden=1,flags=+pcid,hv-vendor-id=FOO

or set 'x-vga=on' on the hostpci line (we detect a graphic card passthrough and set the vendor automatically to something)
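
A quick way to catch the conflict described above is to scan the VM config for a `-cpu` in `args:` next to a `cpu:` line. This is a rough sketch only; `check_cpu_override` is a hypothetical helper, and the config path in the usage comment is the one that appears later in this thread.

```shell
# Warn when a VM config sets -cpu through 'args:' and also has a 'cpu:' line,
# since the args value takes precedence (as explained above).
# $1 = path to the VM config file
check_cpu_override() {
    conf=$1
    if grep -q '^args:.*-cpu' "$conf" && grep -q '^cpu:' "$conf"; then
        echo "warning: -cpu in 'args:' overrides the 'cpu:' line"
        return 1
    fi
    echo "ok: no conflicting -cpu in args"
}

# Hypothetical usage:
#   check_cpu_override /etc/pve/qemu-server/110.conf
```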
 
So I may have messed around with that file after the panic started. In your opinion, what settings should I be using to successfully pass through the card?
 
Okay, went back and made some changes, after reading the PCI(e) doc.


Code:
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 4
cpu: host,hidden=1,flags=+pcid,hv-vendor-id=FOO
efidisk0: sas15k:vm-110-disk-0,size=128K
hostpci0: 03:00,pcie=1
ide0: iso:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
ide2: iso:iso/Win10_1903_V1_English_x64.iso,media=cdrom
machine: q35
memory: 6144
name: win10
net0: virtio=3A:AC:CE:AA:F7:C8,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-zfs:vm-110-disk-0,cache=writeback,iothread=1,replicate=0,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4d2208a9-e110-413d-9560-4d37b3b7b1ba
sockets: 1
vmgenid: 9e355091-d2f4-4f99-a749-f6aacfb9b7d0

Code:
root@splinter:~# dmesg | grep IOMMU
[    0.953135] DMAR: IOMMU enabled
[    1.365841] DMAR-IR: IOAPIC id 3 under DRHD base  0xfbffe000 IOMMU 0
[    1.365844] DMAR-IR: IOAPIC id 0 under DRHD base  0xdfffc000 IOMMU 1
[    1.365846] DMAR-IR: IOAPIC id 2 under DRHD base  0xdfffc000 IOMMU 1

Started the VM and got the same panic behavior. The entire Proxmox host goes unresponsive after about 60 seconds.

Code:
Sep  4 11:09:42 splinter kernel: [  449.488837] perf: interrupt took too long (919371 > 417846), lowering kernel.perf_event_max_sample_rate to 250
Sep  4 11:09:45 splinter kernel: [  450.846894] hrtimer: interrupt took 1046613435 ns
Sep  4 11:09:55 splinter kernel: [  460.783512] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 58.208 msecs

Message from syslogd@splinter at Sep  4 11:09:55 ...
 kernel:[  461.631639] watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kvm:6976]
Sep  4 11:09:55 splinter kernel: [  461.302570] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 229.328 msecs
Sep  4 11:09:55 splinter kernel: [  461.612431] perf: interrupt took too long (2015763 > 1149213), lowering kernel.perf_event_max_sample_rate to 250
Sep  4 11:09:55 splinter kernel: [  461.631641] Modules linked in: tcp_diag inet_diag nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache veth ebtable_filter ebtables ip_set ip6table_filter ip6_tables iptable_filter bpfilter softdog snd_hda_codec_hdmi nfnetlink_log nfnetlink intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ast aes_x86_64 crypto_simd ttm cryptd glue_helper drm_kms_helper intel_cstate snd_hda_intel intel_rapl_perf pcspkr drm input_leds snd_hda_codec joydev snd_hda_core fb_sys_fops syscopyarea sysfillrect snd_hwdep sysimgblt snd_pcm snd_timer snd mei_me mei soundcore ipmi_si ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass sunrpc vfio_iommu_type1 vfio ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) btrfs xor
Sep  4 11:09:55 splinter kernel: [  461.631705]  zstd_compress raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid ixgbe igb ahci xfrm_algo i2c_algo_bit lpc_ich i2c_i801 libahci dca isci mdio mpt3sas libsas raid_class scsi_transport_sas wmi
Sep  4 11:09:55 splinter kernel: [  461.631727] CPU: 30 PID: 6976 Comm: kvm Tainted: P           O      5.0.18-1-pve #1
Sep  4 11:09:55 splinter kernel: [  461.631728] Hardware name: Cirrascale VB1416/GA-7PESH2, BIOS R17 06/26/2018
Sep  4 11:09:55 splinter kernel: [  461.631737] RIP: 0010:smp_call_function_single+0xd2/0xf0
Sep  4 11:09:55 splinter kernel: [  461.631740] Code: 65 48 33 0c 25 28 00 00 00 75 34 c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 6d fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 8b 54 24 18 <83> e2 01 75 f5 eb ca 8b 05 41 74 a2 01 85 c0 75 88 0f 0b eb 84 e8
Sep  4 11:09:55 splinter kernel: [  461.631742] RSP: 0000:ffffc08608e8fba0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Sep  4 11:09:55 splinter kernel: [  461.631745] RAX: 0000000000000000 RBX: 000000000000001e RCX: 0000000000000830
Sep  4 11:09:55 splinter kernel: [  461.631746] RDX: 0000000000000003 RSI: 00000000000008fb RDI: 0000000000000830
Sep  4 11:09:55 splinter kernel: [  461.631748] RBP: ffffc08608e8fbf8 R08: 000000000000000c R09: ffff9ca80f407c00
Sep  4 11:09:55 splinter kernel: [  461.631750] R10: 0000000000000004 R11: 00000065ceb9e927 R12: ffffffff95e84470
Sep  4 11:09:55 splinter kernel: [  461.631751] R13: ffffc08608e8fd08 R14: 0000000000000001 R15: ffff9cb4095721a0
Sep  4 11:09:55 splinter kernel: [  461.631754] FS:  00007fd50b228dc0(0000) GS:ffff9cb40fb80000(0000) knlGS:0000000000000000
Sep  4 11:09:55 splinter kernel: [  461.631756] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  4 11:09:55 splinter kernel: [  461.631757] CR2: 00007f15a6a64000 CR3: 00000015122ba006 CR4: 00000000000626e0
Sep  4 11:09:55 splinter kernel: [  461.631759] Call Trace:
Sep  4 11:09:55 splinter kernel: [  461.631766]  ? flush_tlb_func_common.constprop.9+0x230/0x230
Sep  4 11:09:55 splinter kernel: [  461.631771]  ? cpumask_next_and+0x1e/0x20
Sep  4 11:09:55 splinter kernel: [  461.631773]  smp_call_function_many+0x223/0x250
Sep  4 11:09:55 splinter kernel: [  461.631777]  ? flush_tlb_func_common.constprop.9+0x230/0x230
Sep  4 11:09:55 splinter kernel: [  461.631779]  on_each_cpu_mask+0x2a/0x70
Sep  4 11:09:55 splinter kernel: [  461.631782]  ? x86_configure_nx+0x50/0x50
Sep  4 11:09:55 splinter kernel: [  461.631785]  on_each_cpu_cond_mask+0xab/0x140
Sep  4 11:09:55 splinter kernel: [  461.631788]  ? flush_tlb_func_common.constprop.9+0x230/0x230
Sep  4 11:09:55 splinter kernel: [  461.631791]  native_flush_tlb_others+0xca/0x130
Sep  4 11:09:55 splinter kernel: [  461.631794]  flush_tlb_mm_range+0xde/0x110
Sep  4 11:09:55 splinter kernel: [  461.631800]  change_protection+0x91c/0xb00
Sep  4 11:09:55 splinter kernel: [  461.631806]  change_prot_numa+0x1c/0x40
Sep  4 11:09:55 splinter kernel: [  461.631809]  task_numa_work+0x1f0/0x300
Sep  4 11:09:55 splinter kernel: [  461.631815]  task_work_run+0x9d/0xc0
Sep  4 11:09:55 splinter kernel: [  461.631820]  exit_to_usermode_loop+0xf2/0x100
Sep  4 11:09:55 splinter kernel: [  461.631823]  prepare_exit_to_usermode+0x66/0x90
Sep  4 11:09:55 splinter kernel: [  461.631827]  retint_user+0x8/0x8
Sep  4 11:09:55 splinter kernel: [  461.631829] RIP: 0033:0x55648e071e1b
Sep  4 11:09:55 splinter kernel: [  461.631832] Code: 0f 1f 80 00 00 00 00 31 c0 4c 39 c7 0f 94 c0 48 83 c4 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 31 d2 48 89 f8 4c 89 c6 49 f7 f0 <83> c6 01 31 c0 d1 fe 48 63 f6 48 39 f2 0f 92 c0 48 83 c4 08 5b 5d
Sep  4 11:09:55 splinter kernel: [  461.631833] RSP: 002b:00007ffc957458d0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Sep  4 11:09:55 splinter kernel: [  461.631836] RAX: 00000000000004b8 RBX: 00007fd382b80a88 RCX: 000055648e071e10
Sep  4 11:09:55 splinter kernel: [  461.631838] RDX: 000000000000174d RSI: 0000000000002e9c RDI: 0000000000dc076d
Sep  4 11:09:55 splinter kernel: [  461.631839] RBP: 00000002da2f52fd R08: 0000000000002e9c R09: 00333ac06ff9b274
Sep  4 11:09:55 splinter kernel: [  461.631841] R10: 000000003b9aca00 R11: 00333ac06ff9b274 R12: 00000002da2f52fc
Sep  4 11:09:55 splinter kernel: [  461.631842] R13: 0000000000000001 R14: 0000000000000000 R15: 000055648e0724c0
 
can you post the output of
Code:
lspci -nnk
find /sys/kernel/iommu_groups -type l
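
The raw `find` output is a flat list of symlinks, so a small wrapper that prints one "group: device" pair per line can be easier to read. A minimal sketch, assuming the usual `/sys/kernel/iommu_groups/<group>/devices/<address>` layout; `list_iommu_groups` is a made-up name, and the optional argument exists only so the function can be pointed at a test directory.

```shell
# Print every device together with its IOMMU group number.
# $1 = root directory (optional, defaults to /sys/kernel/iommu_groups)
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        group=${dev#"$root"/}              # strip the root prefix
        group=${group%%/*}                 # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Hypothetical usage:
#   list_iommu_groups | grep '03:00'
```

Ideally both functions of the card (03:00.0 and 03:00.1) show up in a group that contains nothing else the host still needs. Note that the output follows glob order, so groups are not sorted numerically.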
 
the only thing i saw is
Kernel driver in use: snd_hda_intel

of the audio part of the graphics card, maybe it helps to blacklist that driver, or bind the device also to vfio-pci?
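
For the "bind it to vfio-pci" route, the usual approach is an `options vfio-pci ids=...` line in a modprobe config file, built from the vendor:device IDs that `lspci -nn` prints in square brackets. The sketch below only generates that line; `vfio_ids_line` is a hypothetical helper and the file name in the usage comment is just an example. The GPU's ID (10de:0dd8) appears in the rom-parser output earlier in the thread, while the audio function's ID has to be read from `lspci -nn` on the host.

```shell
# Build an "options vfio-pci ids=..." line from `lspci -nn` output on stdin.
# $1 = slot prefix to match, e.g. 03:00 (covers 03:00.0 and 03:00.1)
vfio_ids_line() {
    slot=$1
    # Keep lines for the slot, pull out the last [vvvv:dddd] pair, join with commas.
    ids=$(grep "^$slot" \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)].*/\1/p' \
        | paste -sd, -)
    [ -n "$ids" ] || return 1
    printf 'options vfio-pci ids=%s\n' "$ids"
}

# Hypothetical usage (example file name):
#   lspci -nn | vfio_ids_line 03:00 > /etc/modprobe.d/vfio.conf
```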
 
I added it to the conf file I created in `/etc/modprobe.d/` as part of the reddit guide

Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel
 
Rebooted the host after adding the snd_hda_intel driver to the blacklist, and it appears to be having the same issue. I'm getting the same error messages on the console as before.
 
Just to test, I tried adding the audio part of the graphics card to the VM...

Code:
Virtual Environment 6.0-5
Virtual Machine 110 (win10) on node 'splinter'
kvm: -device vfio-pci,host=03:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:03:00.0 BAR 3. Performance may be slow
kvm: -device vfio-pci,host=03:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on: vfio 0000:03:00.0: device is already attached
TASK ERROR: start failed: command '/usr/bin/kvm -id 110 -name win10 -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -smbios 'type=1,uuid=4d2208a9-e110-413d-9560-4d37b3b7b1ba' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/zvol/sas15k/vm-110-disk-0' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/110.vnc,password -no-hpet -cpu 'host,+pcid,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=FOO,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi,kvm=off' -m 6144 -object 'iothread,id=iothread-virtioscsi0' -device 'vmgenid,guid=9e355091-d2f4-4f99-a749-f6aacfb9b7d0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=03:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=03:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'vfio-pci,host=03:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=03:00.1,id=hostpci1.1,bus=ich9-pcie-port-2,addr=0x0.1' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:c083bd9974a4' -drive 'file=/mnt/pve/iso/template/iso/virtio-win-0.1.141.iso,if=none,id=drive-ide0,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'file=/mnt/pve/iso/template/iso/Win10_1903_V1_English_x64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 
'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=/dev/zvol/rpool/data/vm-110-disk-0,if=none,id=drive-scsi0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:AC:CE:AA:F7:C8,netdev=net0,bus=pci.0,addr=0x12,id=net0' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35' -global 'kvm-pit.lost_tick_policy=discard'' failed: exit code 1

Code:
root@splinter:/var/log# cat /etc/pve/qemu-server/110.conf
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 4
cpu: host,hidden=1,flags=+pcid,hv-vendor-id=FOO
efidisk0: sas15k:vm-110-disk-0,size=128K
hostpci0: 03:00,pcie=1
hostpci1: 03:00,pcie=1
ide0: iso:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
ide2: iso:iso/Win10_1903_V1_English_x64.iso,media=cdrom
machine: q35
memory: 6144
name: win10
net0: virtio=3A:AC:CE:AA:F7:C8,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-zfs:vm-110-disk-0,cache=writeback,iothread=1,replicate=0,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4d2208a9-e110-413d-9560-4d37b3b7b1ba
sockets: 1
vmgenid: 9e355091-d2f4-4f99-a749-f6aacfb9b7d0

doesn't seem right, going to remove it.

still panicking when booting the VM. are my config entries correct?
 
I added it to the conf file I created in `/etc/modprobe.d/` as part of the reddit guide
did you update the initramfs after? check with lspci -k to see if that device really is not using the driver anymore

hostpci0: 03:00,pcie=1
hostpci1: 03:00,pcie=1

this tries to pass through the same device twice, once is enough

the shorthand '03:00' instead of '03:00.0' already passes through all functions (including the audio device)

maybe try only the gpu function? (with '03:00.0')
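
On the initramfs question above: a blacklist in `/etc/modprobe.d/` normally only takes effect at boot after the initramfs has been rebuilt and the host rebooted. A rough sketch of the follow-up check is below; `driver_in_use` is a made-up helper that reads `lspci -k` output and assumes its usual layout (device line at column 0, indented detail lines).

```shell
# Print the driver bound to a PCI function, or "none" if nothing is bound.
# Reads `lspci -k` output on stdin; $1 = slot, e.g. 03:00.1
driver_in_use() {
    awk -v slot="$1" '
        /^[0-9a-f]/ { current = $1 }    # device lines start at column 0
        current == slot && /Kernel driver in use:/ { print $NF; found = 1 }
        END { if (!found) print "none" }
    '
}

# Hypothetical usage after editing the blacklist:
#   update-initramfs -u -k all   # then reboot the host
#   lspci -k | driver_in_use 03:00.1
```

"none" (or vfio-pci once the VM has started) is what you want to see for both 03:00.0 and 03:00.1.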
 
I updated the initramfs and verified that the driver isn't in use anymore

Code:
03:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
        Subsystem: NVIDIA Corporation GF106GL [Quadro 2000]
        Kernel modules: nvidiafb, nouveau
03:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
        Subsystem: NVIDIA Corporation GF106 High Definition Audio Controller
        Kernel modules: snd_hda_intel

Still panics. Going to try 03:00.0
 
Still panics with the following settings

Code:
cat /etc/pve/qemu-server/110.conf
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 4
cpu: host,hidden=1,flags=+pcid,hv-vendor-id=FOO
efidisk0: sas15k:vm-110-disk-0,size=128K
hostpci0: 03:00.0,pcie=1
ide0: iso:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
ide2: iso:iso/Win10_1903_V1_English_x64.iso,media=cdrom
machine: q35
memory: 6144
name: win10
net0: virtio=3A:AC:CE:AA:F7:C8,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-zfs:vm-110-disk-0,cache=writeback,iothread=1,replicate=0,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4d2208a9-e110-413d-9560-4d37b3b7b1ba
sockets: 1
vmgenid: 9e355091-d2f4-4f99-a749-f6aacfb9b7d0

Code:
Sep 10 20:57:44 splinter kernel: [  638.273892]  znvpair(PO) zavl(PO) icp(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid ixgbe igb ahci i2c_algo_bit xfrm_algo i2c_i801 libahci lpc_ich isci dca mpt3sas mdio libsas raid_class scsi_transport_sas wmi
Sep 10 20:57:44 splinter kernel: [  638.273911] CPU: 27 PID: 31324 Comm: kvm Tainted: P           O L    5.0.18-1-pve #1
Sep 10 20:57:44 splinter kernel: [  638.273925] RAX: 0000000000000007 RBX: ffff99accfae3c00 RCX: ffff99a0cf9e9260
Sep 10 20:57:44 splinter kernel: [  638.273930] R10: ffffd84b21252440 R11: 0000000000000000 R12: ffffffffa8c84470
Sep 10 20:57:44 splinter kernel: [  638.273934] FS:  00007f4c007ff700(0000) GS:ffff99accfac0000(0000) knlGS:0000000000000000
Sep 10 20:57:44 splinter kernel: [  638.273937] CR2: 00007f8f16186008 CR3: 00000015f72c8006 CR4: 00000000000626e0
Sep 10 20:57:44 splinter kernel: [  638.273946]  ? flush_tlb_func_common.constprop.9+0x230/0x230
Sep 10 20:57:44 splinter kernel: [  638.273952]  ? x86_configure_nx+0x50/0x50
Sep 10 20:57:44 splinter kernel: [  638.273960]  ? flush_tlb_func_common.constprop.9+0x230/0x230
Sep 10 20:57:44 splinter kernel: [  638.273970]  pmdp_invalidate+0xe9/0xf0
Sep 10 20:57:44 splinter kernel: [  638.283977]  ? default_wake_function+0x12/0x20
Sep 10 20:57:44 splinter kernel: [  638.283987]  task_work_run+0x9d/0xc0
Sep 10 20:57:44 splinter kernel: [  638.291519] RIP: 0033:0x7f4c10c7d427
Sep 10 20:57:44 splinter kernel: [  638.291528] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001e
Sep 10 20:57:44 splinter kernel: [  639.309805]  ? vmx_vcpu_run+0x28c/0xc70 [kvm_intel]
 
I'm still striking out over here...any other ideas?
 
So I did more troubleshooting today. Upon starting the Windows VM that has the passthrough, the following shows up in `/var/log/messages`:

Code:
Sep 16 19:02:05 splinter kernel: [  360.498080] vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xd0000000-0xd3ffffff 64bit pref]
Sep 16 19:02:24 splinter kernel: [  372.015058] INFO: NMI handler (ghes_notify_nmi) took too long to run: 11.637 msecs
Sep 16 19:02:24 splinter kernel: [  379.896253] INFO: NMI handler (ghes_notify_nmi) took too long to run: 52.213 msecs

The "can't reserve" line is curious. Shortly after that, the panics start.
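
One way to dig into that line is to look at what else on the host has claimed part of the same address range in `/proc/iomem`. A host framebuffer driver is a commonly reported culprit in passthrough setups, but nothing in this thread confirms that is the case here, so treat it as a guess. The sketch below is a hypothetical helper (`iomem_overlaps`) that prints every `/proc/iomem` entry overlapping a given range; it has to run as root, because unprivileged reads of `/proc/iomem` show zeroed addresses.

```shell
# Print all /proc/iomem entries that overlap the range [$1, $2].
# $1, $2 = start and end address in hex, without the 0x prefix
# $3     = file to read (optional, defaults to /proc/iomem)
iomem_overlaps() {
    want_lo=$((0x$1))
    want_hi=$((0x$2))
    file=${3:-/proc/iomem}
    # Each line looks like "  d0000000-d3ffffff : owner"; read strips the indent.
    while read -r range sep name; do
        case $range in *-*) ;; *) continue ;; esac
        lo=$((0x${range%-*}))
        hi=$((0x${range#*-}))
        if [ "$lo" -le "$want_hi" ] && [ "$hi" -ge "$want_lo" ]; then
            printf '%s : %s\n' "$range" "$name"
        fi
    done < "$file"
}

# Hypothetical usage for the BAR 3 range from the log above:
#   iomem_overlaps d0000000 d3ffffff
```

Anything listed inside the range besides the parent PCI bus windows and 0000:03:00.0 itself is worth a closer look.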