[SOLVED] X99 Extreme4 - GPU Passthrough Windows 10 with Radeon RX570 (Black screen)

Restarted Proxmox server

BOOT_IMAGE=/boot/vmlinuz-5.15.85-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init
Looks good for kernel 5.15. I thought you had switched to kernel 6.1, but I guess I was mistaken.
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev ef)
        Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB [1da2:e353]
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
        Subsystem: Sapphire Technology Limited Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1da2:aaf0]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
Looks good (for kernel 5.15 with sysfb_init work-around).
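The two checks above (kernel command line and bound driver) can also be scripted. This is only a minimal sketch: the function names and the SYSFS_ROOT override are made up for illustration, and it assumes the usual sysfs layout where each PCI device has a driver symlink.

```shell
#!/bin/sh
# Sketch: sanity checks for the kernel parameters and the driver binding.
# cmdline_has, driver_in_use and SYSFS_ROOT are illustrative names,
# not something taken from this thread.

# Succeeds if every given parameter appears as a whole word in the file.
cmdline_has() {
    file="$1"; shift
    for param in "$@"; do
        # Pad with spaces so "iommu=pt" cannot match inside a longer word.
        case " $(tr '\n' ' ' < "$file") " in
            *" $param "*) ;;
            *) echo "missing: $param"; return 1 ;;
        esac
    done
}

# Prints the name of the driver bound to a PCI device, or "none".
driver_in_use() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

On the host this could be used as: cmdline_has /proc/cmdline intel_iommu=on iommu=pt initcall_blacklist=sysfb_init && driver_in_use 0000:01:00.0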
no errors - jumps back to prompt
Maybe my command was wrong (difference between - and _ ). What is the output of lsmod | grep vendor (after rebooting Proxmox and before starting the VM)?
You are not loading vendor-reset when starting Proxmox. Is this because you load it later? What is the output of cat /etc/modules?
-bash: /sys/pci/bus/devices/0000:01:00.0/reset_method: No such file or directory
Looks like vendor-reset was not loaded (when starting Proxmox). Is this because you load and activate it later?
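One thing worth noting about the error quoted above: it names /sys/pci/bus/devices/..., whereas sysfs exposes PCI devices under /sys/bus/pci/devices/ (the path used in the service file later in this thread), so the path may simply have been mistyped. A minimal sketch of the check with the corrected path follows; the function name and the SYSFS_ROOT override are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: select the device-specific reset for one PCI device.
# set_device_specific_reset and SYSFS_ROOT are hypothetical names;
# SYSFS_ROOT only exists so the function can be tried on a fake tree.

set_device_specific_reset() {
    dev="$1"
    attr="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/reset_method"
    if [ ! -e "$attr" ]; then
        echo "no reset_method attribute at $attr" >&2
        return 1
    fi
    # Ask the kernel to use the device-specific reset for this device.
    echo device_specific > "$attr" || return 1
    # Print what the attribute reports afterwards.
    cat "$attr"
}
```

Example: set_device_specific_reset 0000:01:00.0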
Feb 27 12:58:42 pve kernel: vendor_reset: loading out-of-tree module taints kernel.
Feb 27 12:58:42 pve kernel: vendor_reset_hook: installed
Feb 27 12:58:42 pve systemd-modules-load[530]: Inserted module 'vendor_reset'
Feb 27 13:00:24 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing pre-reset
Feb 27 13:00:24 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing reset
Feb 27 13:00:24 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing post-reset
Feb 27 13:00:24 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: reset result = 0
Feb 27 13:00:26 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing pre-reset
Feb 27 13:00:26 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing reset
Feb 27 13:00:26 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing post-reset
Feb 27 13:00:26 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: reset result = 0
Feb 27 13:00:27 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing pre-reset
Feb 27 13:00:27 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing reset
Feb 27 13:00:27 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: performing post-reset
Feb 27 13:00:27 pve kernel: vfio-pci 0000:01:00.0: AMD_POLARIS10: reset result = 0
Feb 27 13:00:38 pve kernel: usb 2-13: reset full-speed USB device number 2 using xhci_hcd
Feb 27 13:00:38 pve kernel: usb 2-14: reset low-speed USB device number 3 using xhci_hcd
Looks like vendor-reset is loaded and activated and actually working. Do you load and activate it in a hookscript? I would expect that the VM is working fine now.
I am also getting this in the tasks:

kvm: vfio: Cannot reset device 0000:01:00.1, no available reset mechanism.
kvm: vfio: Cannot reset device 0000:01:00.1, no available reset mechanism.
TASK OK
That should not be a problem.

Do you see output on a physical display connected to the GPU?
What do the contents of your VM configuration file (from the /etc/pve/qemu-server/ directory) look like?
 

I did update the kernel to 6.1, but the system is still running 5.15. I do not understand why.

lsmod | grep vendor
vendor_reset 114688 0

cat /etc/modules

# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vendor-reset
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

hookscript

nano /lib/systemd/system/vrwa.service

Entered the following lines
[Unit]
Description=vrwa Service
After=multi-user.target
[Service]
ExecStart=/usr/bin/bash -c 'echo device_specific > /sys/bus/pci/devices/0000:01:00.0/reset_method'
[Install]
WantedBy=multi-user.target

Exited and ran
systemctl enable vrwa
 
/etc/pve/qemu-server/ directory

#args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 0
bios: ovmf
boot: order=virtio0;ide2;net0;ide0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: VM-SSD:vm-1300-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
ide0: Synology-Backup:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
ide2: Synology-Backup:iso/Win10_22H2_EnglishInternational_x64.iso,media=cdrom,size=5969910K
machine: q35
memory: 8192
meta: creation-qemu=7.1.0,ctime=1677484308
name: WindowsRe10
net0: virtio=86:91:EA:8E:D3:0D,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=32c7bc3d-45c0-4e52-b0e6-d4e629ecaf2e
sockets: 1
usb0: host=046d:c52b
usb1: host=0738:1107
vga: none
virtio0: Synology-Backup:1300/vm-1300-disk-0.qcow2,cache=writeback,iothread=1,size=32G
vmgenid: 24db0f79-ff5f-4681-b6ed-9d43c8690663
 
I did update the kernel to 6.1, but the system is still running 5.15. I do not understand why.
I also don't understand... What is the output of pveversion -v?
vendor_reset 114688 0
Good, vendor_reset is loaded at boot. Then I cannot explain why you cannot set the reset_method. I don't understand...
nano /lib/systemd/system/vrwa.service

Entered the following lines
[Unit]
Description=vrwa Service
After=multi-user.target
[Service]
ExecStart=/usr/bin/bash -c 'echo device_specific > /sys/bus/pci/devices/0000:01:00.0/reset_method'
[Install]
WantedBy=multi-user.target

Exited and ran
systemctl enable vrwa
This is not a Proxmox hookscript but a systemd service. I still don't understand why you didn't get any of the AMD_POLARIS10 reset lines in the system logs before, while it works now. Let's stop changing things, because you get those lines now and vendor-reset is resetting the GPU successfully.
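For comparison, a Proxmox hookscript that does the same write as the vrwa service might look like the sketch below. Proxmox calls a hookscript with the VM ID and the phase as its two arguments; the file name, the GPU default and the SYSFS_ROOT override are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical hookscript sketch, e.g. saved as vrwa-hook.sh in a
# storage's snippets directory. Proxmox passes: $1 = VM ID, $2 = phase.
# GPU and SYSFS_ROOT are illustrative, not taken from this thread.

GPU="${GPU:-0000:01:00.0}"

on_phase() {
    vmid="$1"
    phase="$2"
    case "$phase" in
        pre-start)
            # Same write as the vrwa service, done right before VM start.
            echo device_specific > "${SYSFS_ROOT:-/sys}/bus/pci/devices/$GPU/reset_method"
            ;;
        post-stop)
            echo "VM $vmid stopped"
            ;;
    esac
}

on_phase "$@"
```

Assuming the script is executable and stored on a storage named local with snippets enabled, it would be attached with: qm set 1300 --hookscript local:snippets/vrwa-hook.sh (1300 is inferred from the disk names in the VM configuration). Since the systemd service already works here, this is only an alternative, not a required change.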

#args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 0
bios: ovmf
boot: order=virtio0;ide2;net0;ide0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: VM-SSD:vm-1300-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
ide0: Synology-Backup:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
ide2: Synology-Backup:iso/Win10_22H2_EnglishInternational_x64.iso,media=cdrom,size=5969910K
machine: q35
memory: 8192
meta: creation-qemu=7.1.0,ctime=1677484308
name: WindowsRe10
net0: virtio=86:91:EA:8E:D3:0D,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=32c7bc3d-45c0-4e52-b0e6-d4e629ecaf2e
sockets: 1
usb0: host=046d:c52b
usb1: host=0738:1107
vga: none
virtio0: Synology-Backup:1300/vm-1300-disk-0.qcow2,cache=writeback,iothread=1,size=32G
vmgenid: 24db0f79-ff5f-4681-b6ed-9d43c8690663
Remove the line with args. I know it is commented out, but it is meant for NVidia GPUs and really wrong for AMD GPUs. Also don't use hidden=1; that is for NVidia and not for AMD GPUs.

Use a fixed machine version. Use machine: pc-q35-6.2 instead of machine: q35, because Windows AMD GPU drivers don't like later or changing versions (as I wrote before). This is the most likely cause of the Code 43.
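The two configuration changes suggested above could be applied with qm instead of editing the file by hand. This is a hedged sketch: VM ID 1300 is inferred from the disk names in the quoted configuration, the helper names and the APPLY switch are made up, and by default the commands are only printed. Removing the commented-out args line is left as a manual edit.

```shell
#!/bin/sh
# Sketch: apply the suggested VM config changes with qm.
# run, fix_vm_config and APPLY are illustrative names.
# Commands are only printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

fix_vm_config() {
    vmid="$1"
    # Pin the machine version instead of the floating "q35" alias.
    run qm set "$vmid" --machine pc-q35-6.2
    # Drop hidden=1 and keep the remaining CPU options.
    run qm set "$vmid" --cpu host,flags=+pcid
}
```

Running fix_vm_config 1300 shows the commands first; APPLY=1 fix_vm_config 1300 would execute them on the host.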
 
pveversion -v

proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
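In the report above, only pve-kernel-5.15 packages are listed, which would be consistent with the host still booting 5.15 (assuming the list is complete). A small sketch for pulling the kernel packages out of such a report; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list kernel packages mentioned in `pveversion -v` output.
# Reads the report on stdin; kernel_packages is an illustrative name.

kernel_packages() {
    # Package lines look like "pve-kernel-5.15.85-1-pve: 5.15.85-1".
    # The digit after the prefix skips helper packages such as
    # pve-kernel-helper.
    grep -E '^pve-kernel-[0-9]' | cut -d: -f1
}
```

Usage on the host: pveversion -v | kernel_packages; uname -r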
 
With all the changes, something broke my system and now none of the other VMs (Ubuntu/MacOS) are working.
I will do a fresh install of Proxmox and write down what I do. I will post an update once I have updated the kernel, so that we can proceed.

Thank you @leesteken for your support and help so far.
 
With all the changes, something broke my system and now none of the other VMs (Ubuntu/MacOS) are working.
What changes? I only wanted you to change the Windows VM configuration.
I will do a fresh install of Proxmox and write down what I do. I will post an update once I have updated the kernel, so that we can proceed.
Doing everything over again just means that everything can be different, new mistakes might creep in, and we'll have to start all over again.
 
How did you update the kernel to 6.1 and set up vendor-reset? I have tried the links you sent me, but the changes are never applied.
 
How did you update the kernel to 6.1 and set up vendor-reset? I have tried the links you sent me, but the changes are never applied.
I don't remember; it was a long time ago that I installed vendor-reset. I don't feel like turning this thread into a "how do I install kernel 6.1" after 68 posts, sorry. You don't need kernel 6.1 to get passthrough working.
 
What changes? I only wanted you to change the Windows VM configuration.

I changed the Windows config, and my Ubuntu and MacOS VMs stopped working. I have now reinstalled Proxmox (clean) and followed everything you said, as described on the NickSherlock homepage.
 
When I type journalctl -b 0 | grep reset

This is what I get:

Feb 27 16:59:14 pve kernel: vendor_reset: loading out-of-tree module taints kernel.
Feb 27 16:59:14 pve kernel: vendor_reset_hook: installed
Feb 27 16:59:14 pve systemd-modules-load[532]: Inserted module 'vendor_reset'
Feb 27 17:00:22 pve pveproxy[1466]: problem with client ::ffff:10.10.10.30; Connection reset by peer

lsmod | grep vendor

Vendor reset seems to be working:
vendor_reset 114688 0
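Note that lsmod only shows that the module is loaded. In the earlier log in this thread, the evidence that vendor-reset actually reset the GPU was the AMD_POLARIS10 lines, which were logged about two minutes after boot and so presumably only appear once the VM has been started. A minimal sketch for counting those lines; the function name is an assumption.

```shell
#!/bin/sh
# Sketch: count successful vendor-reset runs in journal text on stdin.
# count_successful_resets is an illustrative name.

count_successful_resets() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so normalise the exit status.
    grep -c 'AMD_POLARIS10: reset result = 0' || true
}
```

Usage after starting the VM: journalctl -b 0 | count_successful_resets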
 
