[SOLVED] [2022] Dual GPU Passthrough: BAR 1 error and other tips

zartarr88

New Member
Nov 27, 2022
16
2
3
**Update 2022-12-17**
**Guide:** There are a lot of steps and mistakes in this thread. I would advise people not to do what I did and willy-nilly apply things that worked for other people. The one good outcome of this thread is that it aggregates virtually every single tip that has been scattered across the forums and reddit to get dual GPU passthrough working.
**Tips:** Keep things simple. I've found Proxmox 7.2 requires the fewest tweaks, along with two AMD GPUs.

You know when you do something multiple times over a couple of weeks and throw up your hands in defeat? That is what this post is. I'm about to share all my settings as cleanly and efficiently as possible. I will note, it looks like a cluster of every idea mashed together, but I assure you:
(#1) I have tried starting from scratch twice,
(#2) when going down the rabbit hole of fixing one error, another arises, and
(#3) I have had either the MacOS VM (AMD) GPU passthrough work or the Win10 VM (NVIDIA) passthrough work; so I know there is a solution, but fixing one issue breaks the other.

The final error I end up with, which I can never resolve (at this point I have the AMD MacOS VM working, and the Win10 VM won't even boot):

Code:
vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]

GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet vga=off amd_iommu=on iommu=pt pci=realloc video=efifb:off,vesafb:off,simplefb:off nofb nomodeset pcie_acs_override=downstream,multifunction initcall_blacklist=sysfb_init kvm.ignore_msrs=1"

MODPROBES:
Code:
root@proxmox:~# ls /etc/modprobe.d
amd64-microcode-blacklist.conf    iommu_unsafe_interrupts.conf  nvidia.conf      snd-hda-intel.conf  vfio-pci.conf
dkms.conf            kvm.conf              pve-blacklist.conf  vfio.conf          vfio_pci.conf

lspci -nn for the 2 GPUs
Code:
2d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
2d:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
2e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
2e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]

cat /proc/iomem
Code:
root@proxmox:~# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  00000000-00000000 : PCI Bus 0000:00
  000a0000-000dffff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-09d81fff : System RAM
09d82000-09ffffff : Reserved
0a000000-0a1fffff : System RAM
0a200000-0a210fff : ACPI Non-volatile Storage
0a211000-ab075fff : System RAM
ab076000-ab076fff : Reserved
ab077000-ab0a0fff : System RAM
ab0a1000-ab0a1fff : Reserved
ab0a2000-d740cfff : System RAM
d740d000-d7469fff : Reserved
d746a000-daceefff : System RAM
dacef000-db0a2fff : Reserved
  db084000-db087fff : MSFT0101:00
    db084000-db087fff : MSFT0101:00
  db088000-db08bfff : MSFT0101:00
    db088000-db08bfff : MSFT0101:00
db0a3000-db106fff : ACPI Tables
db107000-dcc06fff : ACPI Non-volatile Storage
dcc07000-ddb56fff : Reserved
ddb57000-ddbfefff : Unknown E820 type
ddbff000-deffffff : System RAM
df000000-dfffffff : Reserved
e0000000-fcffffff : PCI Bus 0000:00
  f0000000-f7ffffff : PCI MMCONFIG 0000 [bus 00-7f]
    f0000000-f7ffffff : Reserved
      f0000000-f7ffffff : pnp 00:00
  fa000000-fb0fffff : PCI Bus 0000:2d
    fa000000-faffffff : 0000:2d:00.0
    fb000000-fb07ffff : 0000:2d:00.0
    fb080000-fb083fff : 0000:2d:00.1
  fb400000-fbffffff : PCI Bus 0000:20
    fb400000-fbffffff : PCI Bus 0000:21
      fb400000-fb8fffff : PCI Bus 0000:24
        fb400000-fb7fffff : 0000:24:00.0
          fb400000-fb7fffff : atlantic_mmio
        fb800000-fb83ffff : 0000:24:00.0
        fb840000-fb84ffff : 0000:24:00.0
          fb840000-fb84ffff : atlantic_mmio
        fb850000-fb850fff : 0000:24:00.0
          fb850000-fb850fff : atlantic_mmio
      fba00000-fbbfffff : PCI Bus 0000:2a
        fba00000-fbafffff : 0000:2a:00.3
          fba00000-fbafffff : xhci-hcd
        fbb00000-fbbfffff : 0000:2a:00.1
          fbb00000-fbbfffff : xhci-hcd
      fbc00000-fbcfffff : PCI Bus 0000:2c
        fbc00000-fbc007ff : 0000:2c:00.0
          fbc00000-fbc007ff : ahci
      fbd00000-fbdfffff : PCI Bus 0000:2b
        fbd00000-fbd007ff : 0000:2b:00.0
          fbd00000-fbd007ff : ahci
      fbe00000-fbefffff : PCI Bus 0000:28
        fbe00000-fbe03fff : 0000:28:00.0
          fbe00000-fbe03fff : iwlwifi
      fbf00000-fbffffff : PCI Bus 0000:27
        fbf00000-fbf6ffff : 0000:27:00.0
        fbf70000-fbf7ffff : 0000:27:00.0
          fbf70000-fbf7ffff : r8169
        fbf80000-fbf9bfff : 0000:27:00.0
        fbf9c000-fbf9ffff : 0000:27:00.0
  fc000000-fc1fffff : PCI Bus 0000:30
    fc000000-fc0fffff : 0000:30:00.3
      fc000000-fc0fffff : xhci-hcd
    fc100000-fc107fff : 0000:30:00.4
  fc200000-fc2fffff : PCI Bus 0000:2e
    fc200000-fc23ffff : 0000:2e:00.0
    fc240000-fc25ffff : 0000:2e:00.0
    fc260000-fc263fff : 0000:2e:00.1
  fc300000-fc3fffff : PCI Bus 0000:01
    fc300000-fc303fff : 0000:01:00.0
      fc300000-fc303fff : nvme
fd200000-fd2fffff : Reserved
  fd200000-fd2fffff : pnp 00:01
    fd210510-fd21053f : MSFT0101:00
fd380000-fd3fffff : amd_iommu
fd400000-fd5fffff : Reserved
fea00000-fea0ffff : Reserved
feb80000-fec01fff : Reserved
  fec00000-fec003ff : IOAPIC 0
  fec01000-fec013ff : IOAPIC 1
fec10000-fec10fff : Reserved
  fec10000-fec10fff : pnp 00:04
fed00000-fed00fff : Reserved
  fed00000-fed003ff : HPET 0
    fed00000-fed003ff : PNP0103:00
fed40000-fed44fff : Reserved
fed80000-fed8ffff : Reserved
  fed81500-fed818ff : AMDI0030:00
    fed81500-fed818ff : AMDI0030:00 AMDI0030:00
fedc0000-fedc0fff : pnp 00:04
fedc2000-fedcffff : Reserved
fedd4000-fedd5fff : Reserved
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : pnp 00:04
ff000000-ffffffff : Reserved
  ff000000-ffffffff : pnp 00:04
100000000-101f2fffff : System RAM
  89ec00000-89fc02047 : Kernel code
  89fe00000-8a03a4fff : Kernel rodata
  8a0400000-8a071c1bf : Kernel data
  8a0a43000-8a0ffffff : Kernel bss
101f300000-101fffffff : Reserved
1020000000-7fffffffff : PCI Bus 0000:00
  7fc0000000-7fd1ffffff : PCI Bus 0000:2d
    7fc0000000-7fcfffffff : 0000:2d:00.0
    7fd0000000-7fd1ffffff : 0000:2d:00.0
  7fe0000000-7ff01fffff : PCI Bus 0000:2e
    7fe0000000-7fefffffff : 0000:2e:00.0
    7ff0000000-7ff01fffff : 0000:2e:00.0

MODPROBE: 1
Code:
root@proxmox:~# cat /etc/modprobe.d/amd64-microcode-blacklist.conf
# The microcode module attempts to apply a microcode update when
# it autoloads.  This is not always safe, so we block it by default.
blacklist microcode

MODPROBE: 2
Code:
root@proxmox:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

MODPROBE: 3
Code:
root@proxmox:~# cat /etc/modprobe.d/snd-hda-intel.conf
options snd-hda-intel enable_msi=1

MODPROBE: 4
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio_pci.conf
options vfio-pci disable_idle_d3=1

MODPROBE: 5
Code:
root@proxmox:~# cat /etc/modprobe.d/nvidia.conf
softdep nvidiafb pre: vfio-pci

MODPROBE: 6
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2206,10de:1aef,1002:67df,1002:aaf0 disable_vga=1 disable_idle_d3=1

MODPROBE: 7
Code:
root@proxmox:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_typ1 allow_unsafe_interrupts=1

MODPROBE: 8
Code:
root@proxmox:~# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
# blacklist amdgpu
blacklist radeon
blacklist snd_hda_intel
blacklist nouveau
blacklist nvidia
# blacklist nvidia_drm
# blacklist i2c_nvidia_gpu
# blacklist snd_hda_codec_hdmi
# blacklist snd_hda_intel
# blacklist snd_hda_codec
# blacklist snd_hda_core

MODPROBE: 9 - as I was writing this post I noticed I have both a vfio-pci.conf and a vfio.conf
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio-pci.conf
options vfio-pci ids=10de:2206,10de:1aef,1002:67df,1002:aaf0 disable_vga=1

VM Config:
Code:
root@proxmox:~# cat /etc/pve/qemu-server/101.conf
acpi: 1
agent: 1
args: -cpu 'host,+svm,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hypervisor=off'
balloon: 0
bios: ovmf
boot: order=sata0;net0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: zfsdata01:101/vm-101-disk-0.qcow2,efitype=4m,size=528K
hookscript: local:snippets/gpu-hookscript.sh
ide2: zfsdata01:iso/virtio-win-0.1.196.iso,media=cdrom,size=486642K
machine: pc-q35-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1669092247
name: win10
net0: virtio=12:EE:56:16:A1:50,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: zfsdata01:101/vm-101-disk-2.qcow2,size=407286422K
scsihw: virtio-scsi-pci
smbios1: uuid=2625ba06-b78a-4d32-b10d-55fcbbb71b6e
sockets: 1
unused0: zfsdata01:101/vm-101-disk-1.qcow2
vga: vmware
vmgenid: c53fc1cc-6486-474e-8e21-52f2d009791f

Because I don't fully comprehend what I'm doing and grab solutions off the net, I've ended up with a mash of rules justified with my own twisted logic of "yeah, that would work". That is why I'm sharing everything I possibly can, in the hope that someone can say "hey stupid, you don't need x or y, and you're double dipping here". We might end up going through a journey together, but I'm pretty certain I've encountered every issue you can think of: d3 hot/cold something, BAR 1, BAR 0, BAR 3... lots come to mind.
 
It's a lot of information at once and I don't know where to begin. Please allow me to make some remarks.
Code:
vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
This looks like the NVidia GPU is used during boot of the Proxmox host and not properly released by efifb or vesafb (when using a kernel earlier than 5.15) or simplefb (when using kernel 5.15 or higher). However, I don't see the tell-tale signs in the iomem output.
GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet vga=off amd_iommu=on iommu=pt pci=realloc video=efifb:off,vesafb:off,simplefb:off nofb nomodeset pcie_acs_override=downstream,multifunction initcall_blacklist=sysfb_init kvm.ignore_msrs=1"
amd_iommu=on does nothing because it is on by default. iommu=pt usually does nothing for passthrough.
video=efifb:off,vesafb:off,simplefb:off is wrong and does not work anymore; you need to write it as video=efifb:off video=vesafb:off video=simplefb:off. But you don't need all three, only the one that is actually used during boot, and that depends on the kernel version. Also, it does not resolve the "BAR can't reserve" problem with simplefb (kernel 5.15 and higher); you need initcall_blacklist=sysfb_init instead.
pcie_acs_override=downstream,multifunction can be dangerous and will invalidate the useful IOMMU group information that I would like to see.
I have never seen that vga=off nofb nomodeset makes a difference.
Please double check that your system uses GRUB and not systemd-boot, and that the parameters are actually applied, by using cat /proc/cmdline.
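For illustration only, a trimmed-down kernel command line along those lines (a sketch; whether you need the ACS override at all depends on your IOMMU groups) could look like this, followed by update-grub and a reboot:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet initcall_blacklist=sysfb_init"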
cat /proc/iomem
Code:
root@proxmox:~# cat /proc/iomem
  fa000000-fb0fffff : PCI Bus 0000:2d
    fa000000-faffffff : 0000:2d:00.0
    fb000000-fb07ffff : 0000:2d:00.0
    fb080000-fb083fff : 0000:2d:00.1
Some memory is still claimed by the NVidia GPU. Maybe BOOTFB only shows for AMD GPUs.
Can you physically swap the NVidia and AMD GPUs? Or otherwise make sure the Proxmox host boots with the AMD GPU?
By upgrading to kernel 5.19 and installing vendor-reset to work-around the AMD GPU reset issue and enabling the device_specific reset_method, you should have no trouble with your AMD GPU passthrough even after using it for Proxmox host boot.
Code:
root@proxmox:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
Do you need this? Maybe MacOS or Windows needs this, I wouldn't know.
Code:
root@proxmox:~# cat /etc/modprobe.d/snd-hda-intel.conf
options snd-hda-intel enable_msi=1
This does not make sense (as you blacklist snd_hda_intel later); early bind the GPU audio devices to vfio-pci instead.
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio_pci.conf
options vfio-pci disable_idle_d3=1
Remove this one (above) as it is already in /etc/modprobe.d/vfio.conf (below)
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2206,10de:1aef,1002:67df,1002:aaf0 disable_vga=1 disable_idle_d3=1
After upgrading and installing vendor-reset, you can (and should) remove the ID of the AMD GPU, 1002:67df. Keep the ID for the audio device and always pass through All Functions. You might want to add softdep snd_hda_intel pre: vfio-pci and softdep nouveau pre: vfio-pci to make sure vfio-pci takes precedence.
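As a sketch of what that cleaned-up /etc/modprobe.d/vfio.conf could look like (the IDs are the ones from your lspci -nn output above):
Code:
# NVidia GPU and audio, plus only the audio function of the AMD GPU;
# amdgpu keeps the AMD VGA function and vendor-reset handles its resets
options vfio-pci ids=10de:2206,10de:1aef,1002:aaf0 disable_vga=1
softdep snd_hda_intel pre: vfio-pci
softdep nouveau pre: vfio-pci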
Code:
root@proxmox:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_typ1 allow_unsafe_interrupts=1
Do you need this? I would not do this unless there is a specific need.
Code:
root@proxmox:~# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
# blacklist amdgpu
blacklist radeon
blacklist snd_hda_intel
blacklist nouveau
blacklist nvidia
# blacklist nvidia_drm
# blacklist i2c_nvidia_gpu
# blacklist snd_hda_codec_hdmi
# blacklist snd_hda_intel
# blacklist snd_hda_codec
# blacklist snd_hda_core
Don't blacklist amdgpu after upgrading and installing vendor-reset. You don't need to blacklist nvidia because you should not install the NVidia proprietary drivers on the Proxmox host. Most of the blacklists you added don't make sense and won't be needed.
Code:
root@proxmox:~# cat /etc/modprobe.d/vfio-pci.conf
options vfio-pci ids=10de:2206,10de:1aef,1002:67df,1002:aaf0 disable_vga=1
Remove this one also and keep only the vfio.conf with my changes.
Code:
root@proxmox:~# cat /etc/pve/qemu-server/101.conf
acpi: 1
agent: 1
args: -cpu 'host,+svm,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hypervisor=off'
This is not needed when you just enable Primary GPU (x-vga=1).
Code:
balloon: 0
bios: ovmf
boot: order=sata0;net0
cores: 4
cpu: host,hidden=1,flags=+pcid
hidden=1 is not needed since NVidia decided not to block virtualization anymore in their recent drivers. I don't think you need flags=+pcid on an AMD CPU.
Code:
efidisk0: zfsdata01:101/vm-101-disk-0.qcow2,efitype=4m,size=528K
hookscript: local:snippets/gpu-hookscript.sh
What does that script do? If you can switch the NVidia and AMD GPUs, you probably don't need it for the NVidia GPU.
Code:
ide2: zfsdata01:iso/virtio-win-0.1.196.iso,media=cdrom,size=486642K
machine: pc-q35-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1669092247
name: win10
net0: virtio=12:EE:56:16:A1:50,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: zfsdata01:101/vm-101-disk-2.qcow2,size=407286422K
scsihw: virtio-scsi-pci
smbios1: uuid=2625ba06-b78a-4d32-b10d-55fcbbb71b6e
sockets: 1
unused0: zfsdata01:101/vm-101-disk-1.qcow2
vga: vmware
Set this to none (but it does not matter with Primary GPU).
vmgenid: c53fc1cc-6486-474e-8e21-52f2d009791f
hostpci0: 0000:2d:00,pcie=1,x-vga=1 appears to be missing completely! Note that the 0000:2d:00 might change when switching the NVidia and AMD GPUs. Adjust the other VM configuration accordingly.
Because I don't fully comprehend what I'm doing and grab solutions off the net, I've ended up with a mash of rules justified with my own twisted logic of "yeah, that would work". That is why I'm sharing everything I possibly can, in the hope that someone can say "hey stupid, you don't need x or y, and you're double dipping here". We might end up going through a journey together, but I'm pretty certain I've encountered every issue you can think of: d3 hot/cold something, BAR 1, BAR 0, BAR 3... lots come to mind.
Please let us know what CPU and motherboard you are using. Please disable pcie_acs_override and show the IOMMU groups using: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done. Unless you have an X370, X470 or X570 you probably need the override, so make sure not to start any VM automatically after boot!
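The same command, reformatted here for easier copy-paste:
Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"
done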
 
Thank you very much for your help so far. As you can tell, I have tried a plethora of things and over time lost track of what worked.

The cat /proc/cmdline command you mentioned:
Noting how, apparently, in this 2nd run I tried this "edge" kernel, as I found it mentioned as a solution in another thread... but honestly, based on your advice and the amount of things we just removed, it makes me think I should have paid attention to the dates of those suggestions.
Code:
BOOT_IMAGE=/boot/vmlinuz-6.0.9-edge root=/dev/mapper/pve-root ro quiet pci=realloc initcall_blacklist=sysfb_init kvm.ignore_msrs=1


Removed:
/etc/modprobe.d/kvm.conf

Edited:
/etc/modprobe.d/vfio.conf >> I just removed the 1002:67df and added the 2 lines (will show the cat output below)
just have these files left now:
Code:
root@proxmox:~# ls /etc/modprobe.d/
amd64-microcode-blacklist.conf    dkms.conf  kvm.conf  nvidia.conf  pve-blacklist.conf  vfio.conf

# Proofs
Code:
root@proxmox:~# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist radeon
# blacklist snd_hda_intel
blacklist nouveau
# blacklist nvidia
# blacklist nvidia_drm
# blacklist i2c_nvidia_gpu
# blacklist snd_hda_codec_hdmi
# blacklist snd_hda_intel
# blacklist snd_hda_codec
# blacklist snd_hda_core

Code:
root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2206,10de:1aef,1002:aaf0 disable_vga=1 disable_idle_d3=1
softdep snd_hda_intel pre: vfio-pci
softdep nouveau pre: vfio-pci

From 101.conf I removed these lines entirely
acpi: 1 agent: 1 args: -cpu 'host,+svm,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hypervisor=off'
I know it didn't have the GPU enabled in that run of showing the conf; that's because at the same time I'm wanting to see if it creates another problem... If I have dmesg and journalctl errors, I wanted to see if they had anything to do with SCSI drivers etc.

MOTHERBOARD: MSI UNIFY X570
CPU: Ryzen 3900X
- I cannot move the 3080 to the 2nd slot because of a case limitation, although I put in an order for a new case which should arrive by Monday to test that out.

Code:
IOMMU group 26 2d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
IOMMU group 26 2d:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
IOMMU group 27 2e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev ff)
IOMMU group 27 2e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0] (rev ff)

At this point I just want to mention the MacOS VM has stopped working: no GPU passthrough, can't VNC in.
Nada for Windows as well.

Dmesg errors:
Code:
[  436.515874] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  436.607894] AMD-Vi: Completion-Wait loop timed out
[  436.732645] AMD-Vi: Completion-Wait loop timed out
[  436.743897] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eee00]
[  436.876639] AMD-Vi: Completion-Wait loop timed out
[  437.001413] AMD-Vi: Completion-Wait loop timed out
[  437.145884] AMD-Vi: Completion-Wait loop timed out
[  437.270329] AMD-Vi: Completion-Wait loop timed out
[  437.404810] AMD-Vi: Completion-Wait loop timed out
[  437.549151] AMD-Vi: Completion-Wait loop timed out
[  437.673507] AMD-Vi: Completion-Wait loop timed out
[  437.743873] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eee50]
[  437.817648] AMD-Vi: Completion-Wait loop timed out
[  437.941994] AMD-Vi: Completion-Wait loop timed out
[  438.086468] AMD-Vi: Completion-Wait loop timed out
[  438.230970] AMD-Vi: Completion-Wait loop timed out
[  438.375277] AMD-Vi: Completion-Wait loop timed out
[  438.519290] AMD-Vi: Completion-Wait loop timed out
[  438.662931] AMD-Vi: Completion-Wait loop timed out
[  438.743858] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eeee0]
[  438.806606] AMD-Vi: Completion-Wait loop timed out
[  438.950308] AMD-Vi: Completion-Wait loop timed out
[  439.093994] AMD-Vi: Completion-Wait loop timed out
[  439.237665] AMD-Vi: Completion-Wait loop timed out
[  439.381356] AMD-Vi: Completion-Wait loop timed out
[  439.525055] AMD-Vi: Completion-Wait loop timed out
[  439.649184] AMD-Vi: Completion-Wait loop timed out
[  439.743832] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eef50]
[  441.363592] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.363781] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.363960] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.365054] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  441.365239] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  441.365421] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs

journalctl
Code:
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]

Looks like both GPUs are using vfio-pci
Code:
2d:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. GA102 [GeForce RTX 3080]
    Flags: fast devsel, IRQ 255, IOMMU group 26
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at 7fc0000000 (64-bit, prefetchable) [size=256M]
    Memory at 7fd0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] Secondary PCI Express
    Capabilities: [bb0] Physical Resizable BAR
    Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
    Capabilities: [d00] Lane Margining at the Receiver <?>
    Capabilities: [e00] Data Link Feature <?>
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

2d:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
    Subsystem: eVga.com. Corp. Device 3897
    Flags: fast devsel, IRQ 255, IOMMU group 26
    Memory at fb080000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [160] Data Link Feature <?>
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

2e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu

2e:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

I also want to thank you in advance. I know we haven't resolved it, and it's fine if you end up abandoning it. Honestly, I would have abandoned it myself had I not at one point had Windows working without Mac, and then Mac working without Windows, which tells me it's possible, but something is off. I'm hoping to document this thread at the very least to capture my doings/thinking so it helps others, as a lessons learned. I have been using vfio with virt-manager on Linux for a while, and I forked over quite a bit of dollars to literally get it for Proxmox; so I have high hopes here and I'm pretty committed.
 
Edited:
/etc/modprobe.d/vfio.conf >> I just removed the 1002:67df and added the 2 lines (will show the cat output below)
Did you run update-initramfs -u and update-grub?
On second thought: probably you did, but this depends on swapping the GPUs, using kernel 5.19 (check with uname -a) and getting vendor-reset to work.
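For reference, the usual commands to apply changes to /etc/modprobe.d and GRUB (assuming a GRUB-booted system; on systemd-boot installs use proxmox-boot-tool refresh instead of update-grub):
Code:
update-initramfs -u -k all
update-grub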
MOTHERBOARD: MSI UNIFY X570
CPU: 3900x ryzen
- I cannot move the 3080 to the 2nd slot, because of case limitation ; although I put in an order for a new case here which should arrive by Monday to test that out.
X570 is a good choice for passthrough, as almost everything is in properly isolated IOMMU groups. The 3900X limits it to PCIe 3 but that's fine.
On second thought: if swapping your GPUs is an option in another case (or without a case), we might get this to work. No promises though...
At this point just want to mention the MAC OS VM has stopped working , no gpu passthrough ; can't VNC in.
nada for Windows as well
I'm sorry, I don't know anything about MacOS VMs. Did you install and set up vendor-reset? This is the key to getting your AMD GPU to reset properly after allowing amdgpu to load (which I suggested by changing vfio-pci ids=... etc.). However, this all depends on you swapping the two GPUs or selecting the AMD GPU as the boot GPU, which I believe is only supported by Gigabyte Ryzen motherboards (I didn't know you used MSI before).
On second thought: maybe it works with your new case.
Code:
[  436.515874] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  436.607894] AMD-Vi: Completion-Wait loop timed out
[  436.732645] AMD-Vi: Completion-Wait loop timed out
[  436.743897] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eee00]
[  436.876639] AMD-Vi: Completion-Wait loop timed out
[  437.001413] AMD-Vi: Completion-Wait loop timed out
[  437.145884] AMD-Vi: Completion-Wait loop timed out
[  437.270329] AMD-Vi: Completion-Wait loop timed out
[  437.404810] AMD-Vi: Completion-Wait loop timed out
[  437.549151] AMD-Vi: Completion-Wait loop timed out
[  437.673507] AMD-Vi: Completion-Wait loop timed out
[  437.743873] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eee50]
[  437.817648] AMD-Vi: Completion-Wait loop timed out
[  437.941994] AMD-Vi: Completion-Wait loop timed out
[  438.086468] AMD-Vi: Completion-Wait loop timed out
[  438.230970] AMD-Vi: Completion-Wait loop timed out
[  438.375277] AMD-Vi: Completion-Wait loop timed out
[  438.519290] AMD-Vi: Completion-Wait loop timed out
[  438.662931] AMD-Vi: Completion-Wait loop timed out
[  438.743858] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eeee0]
[  438.806606] AMD-Vi: Completion-Wait loop timed out
[  438.950308] AMD-Vi: Completion-Wait loop timed out
[  439.093994] AMD-Vi: Completion-Wait loop timed out
[  439.237665] AMD-Vi: Completion-Wait loop timed out
[  439.381356] AMD-Vi: Completion-Wait loop timed out
[  439.525055] AMD-Vi: Completion-Wait loop timed out
[  439.649184] AMD-Vi: Completion-Wait loop timed out
[  439.743832] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:2e:00.0 address=0x1001eef50]
[  441.363592] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.363781] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.363960] vfio-pci 0000:2e:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  441.365054] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  441.365239] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  441.365421] vfio-pci 0000:2e:00.1: vfio_bar_restore: reset recovery - restoring BARs
Looks like you didn't install or set up vendor-reset for your AMD GPU. But since you cannot swap the GPUs to use the AMD GPU for boot, I guess there is no point.
On second thought: maybe it works with your new case if you can swap the GPUs.
journalctl
Code:
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
I expect that all this could be solved if your (particular) AMD GPU is used as the boot GPU with vendor-reset working and the NVidia GPU is not used for boot.
On second thought: I went through my first responses again, now that I noticed you might be able to swap the GPUs with a new case.
 
I do have vendor-reset set up.
In fact I have that and this gpu hookscript ;
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi

I have made progress right now ;
Maybe I need the echo device_specific > ... thing I have seen floating around, but I'm not sure where that goes? crontab? service?

What's happening is my AMD GPU is being booted first; I know this because with the NVIDIA first it would have things stuck on it (wording like Proxmox etc.)... I boot via AMD, and then when I try to launch the VM I just get a blank screen... nothing in dmesg... and journalctl still has BAR 1 errors... 2d has now become the AMD GPU, and 2e is the NVIDIA... I was able to change the boot GPU by choosing CSM in the BIOS.

Do you or anyone have thoughts on fixing the blank screen on the AMD GPU?
 
I do have vendor-reset set up.
Since you removed the vfio-pci ID of the AMD GPU, it appears to have reset issues, which indicates that something is not set up right and it's not working.
Please double check /etc/modules and add a hookscript for the MacOS VM to enable device_specific. You should see several lines with AMD_POLARIS in journalctl when starting the VM.
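A minimal sketch of such a hookscript (assuming the AMD GPU is at 0000:2d:00.0; adjust the address for your system):
Code:
#!/bin/bash
# pre-start: let vendor-reset's device_specific reset handle the AMD GPU
if [ "$2" == "pre-start" ]
then
    echo "gpu-hookscript: setting device_specific reset_method for VM $1"
    echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method
fi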
In fact I have that and this gpu hookscript ;
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi
Why are you removing device 0000:01:00.0 from the PCIe bus?! That's not one of your GPUs! You don't need this (crude and in your case wrong) script at all when using initcall_blacklist=sysfb_init.
I have made progress right now ;
Maybe I need the echo device_specific > ... thing I have seen floating around, but I'm not sure where that goes? crontab? service?
After every reboot of the Proxmox host, or in a hookscript for the VMs that use the AMD GPU.
What's happening is my AMD GPU is being booted first; I know this because with the NVIDIA first it would have things stuck on it (wording like Proxmox etc.)... I boot via AMD, and then when I try to launch the VM I just get a blank screen... nothing in dmesg... and journalctl still has BAR 1 errors... 2d has now become the AMD GPU, and 2e is the NVIDIA... I was able to change the boot GPU by choosing CSM in the BIOS.
This indicates that vendor-reset is not set up correctly, see above. If you can switch the boot GPU in the BIOS, then that's good enough (and you don't need a new case).
What does echo device_specific >/sys/bus/pci/devices/0000:2d:00.0/reset_method (when AMD is the boot GPU) output?
Is vendor-reset loaded? What does lsmod | grep vendor output?
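For reference, both checks can be run from the host like this (assuming the AMD GPU is at 0000:2d:00.0):
Code:
echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method
cat /sys/bus/pci/devices/0000:2d:00.0/reset_method
lsmod | grep vendor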

Please boot with the AMD GPU, remove the hookscript for the Windows VM, add a hookscript to set device_specific to the MacOS VM and make sure vendor-reset is working.

EDIT: I still don't know which Proxmox version and kernel you are using (did I miss it?). Can you check with uname -a?
 
Last edited:
My apologies.
The fact that this is the first time in my life I'm actually using a forum to troubleshoot things shows how 'green' I am in terms of communicating, and some of the stuff I'm copying and pasting on the fly as I change things.

# Uname -a
Code:
root@proxmox:~# uname -a
Linux proxmox 5.19.17-1-pve #1 SMP PREEMPT_DYNAMIC PVE 5.19.17-1 (Mon, 14 Nov 2022 20:25:12  x86_64 GNU/Linux

# Hookscripts
I have 3 things
## 1 - via crontab -e
where I typed: @reboot /root/fix_gpu_pass.sh
Code:
root@proxmox:~# cat fix_gpu_pass.sh
#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:30\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
echo 1 > /sys/bus/pci/rescan

## 2 - via systemctl service
Code:
root@proxmox:~# systemctl status vrwa
● vrwa.service - vrwa Service
     Loaded: loaded (/lib/systemd/system/vrwa.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-12-02 20:39:07 MST; 14h ago
    Process: 1801 ExecStart=/usr/bin/bash -c echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method (code=exited, status=0/SUCCESS)
   Main PID: 1801 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Dec 02 20:39:07 proxmox systemd[1]: Started vrwa Service.
Dec 02 20:39:07 proxmox systemd[1]: vrwa.service: Succeeded.

## 3 - via gpu hookscripts on both machines; this is the one I copy/pasted above with 01, but in reality mine uses 2d and 30 now
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi

I took out the NVIDIA GPU; I am using a 5700 XT and an RX 580 now in the hope that it simplifies this game and we just deal with AMD. Please advise if this was dumb.

journalctl still shows the issue on 2d, which is now the RX 580
Code:
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]

dmesg doesn't have anything in a color that indicates an error, I believe.

lsmod | grep vendor
Code:
root@proxmox:~# lsmod | grep vendor
vendor_reset          114688  0
 
My apologies.
The fact that this is the first time in my life I'm actually using a forum to troubleshoot things shows how 'green' I am in terms of communicating, and some of the stuff I'm copying and pasting on the fly as I change things.
Don't worry, it's often not clear what information is relevant and which exact problem is occurring. Eventually it'll become clear and sometimes I'm slow in picking up on clues myself.
# Uname -a
Code:
root@proxmox:~# uname -a
Linux proxmox 5.19.17-1-pve #1 SMP PREEMPT_DYNAMIC PVE 5.19.17-1 (Mon, 14 Nov 2022 20:25:12  x86_64 GNU/Linux
Your Proxmox looks (relatively) up to date and this kernel works well with the amdgpu driver and GPU passthrough.
# Hookscripts
I have 3 things
## 1 - via crontab -e
where I typed: @reboot /root/fix_gpu_pass.sh
Code:
root@proxmox:~# cat fix_gpu_pass.sh
#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:30\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
echo 1 > /sys/bus/pci/rescan
Where does 0000:30:00.0 suddenly come from? OK, that's probably the 5700XT, which also needs vendor-reset for passthrough (to work more than once).
I still insist that you do not run this remove-from-PCI-bus-and-reconnect-it-by-scanning-the-bus approach. It also interferes with vendor-reset and the device_specific reset_method, because you run that only once and then "virtually unplug and replug" the GPU, which makes the system see a "new" GPU that has the reset_method set back to the default. Please stop using this crude method with AMD GPUs.
## 2 - via systemctl service
Code:
root@proxmox:~# systemctl status vrwa
● vrwa.service - vrwa Service
     Loaded: loaded (/lib/systemd/system/vrwa.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-12-02 20:39:07 MST; 14h ago
    Process: 1801 ExecStart=/usr/bin/bash -c echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method (code=exited, status=0/SUCCESS)
   Main PID: 1801 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Dec 02 20:39:07 proxmox systemd[1]: Started vrwa Service.
Dec 02 20:39:07 proxmox systemd[1]: vrwa.service: Succeeded.
I do think it would be better to move this to the hookscript, to make sure that vendor-reset can do its thing for all AMD GPUs.
## 3 - via gpu hookscripts on both machines; this is the one I copy/pasted above with 01, but in reality mine uses 2d and 30 now
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi

I took out the NVIDIA GPU; I am using a 5700 XT and an RX 580 now in the hope that it simplifies this game and we just deal with AMD. Please advise if this was dumb.
Changing multiple things at once often is. However, I do have more experience with AMD GPUs and this might actually be helpful.
journalctl still shows the issue on 2d, which is now the RX 580
Code:
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
Nov 22 01:35:30 proxmox kernel: vfio-pci 0000:2d:00.0: BAR 1: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
You are sabotaging vendor-reset with the remove-from-PCI-bus-and-reconnect-it-by-scanning-the-bus script(s) that you are using.
dmesg doesn't have anything in a color that indicates an error, I believe.
I suggest using journalctl -b 0 to see the current system log (since the last boot of the system).
lsmod | grep vendor
Code:
root@proxmox:~# lsmod | grep vendor
vendor_reset          114688  0
Good, vendor-reset is loaded and I assume it is present in /etc/modules.

In principle you could go all the way back to a simple Proxmox installation without any additional kernel parameters, extra services, crontab etc., and just have vendor-reset handle everything by enabling it for each GPU (before starting the VMs).

Please replace your hookscripts for the VMs with only enabling device_specific for vendor-reset (and don't virtually unplug the GPU). No need for systemd services or cron jobs. You don't need initcall_blacklist=sysfb_init and you don't blacklist amdgpu. Just let Proxmox boot and amdgpu drive the VGA function of the 580, which you use for booting the system. Early bind the audio function and all functions of the 5700XT to vfio-pci using their numeric IDs (lspci -nn).
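As a sketch, using the numeric IDs that show up in the lspci -nn output later in this thread, that early-bind /etc/modprobe.d/vfio.conf could look roughly like this:
Code:
# RX 580: only the audio function (amdgpu keeps the VGA function for host boot)
# 5700 XT: VGA and audio functions, early bound to vfio-pci
options vfio-pci ids=1002:aaf0,1002:731f,1002:ab38 disable_vga=1
# make sure vfio-pci claims the listed devices before the host drivers load
softdep snd_hda_intel pre: vfio-pci
softdep amdgpu pre: vfio-pci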

After those changes, please boot Proxmox (check that it only shows on the 580) without starting the VMs. Check/show the output of cat /proc/cmdline, lspci -nnks 2d:00 and lspci -nnks 30:00. Check/show the VM configurations and their new hookscripts.

The easiest way to check if AMD GPU passthrough works for a VM is to boot the VM with a Ubuntu installer LiveDVD and see if you have output of a physical display connected to that GPU. Don't install Ubuntu, just see if it starts and you see the Ubuntu desktop on the display. First try the one with the 5700XT and if that is successful try the VM with the 580.

EDIT: I have an AMD HD7750 and an AMD RX570 on a Gigabyte X570S and both pass through fine (to Linux VMs) with vendor-reset, without any tricks or work-arounds, on Proxmox with kernel 5.19. I boot from the 570 with amdgpu to see boot messages and early bind the 7750; during shutdown I give the 570 back to Proxmox (to see shutdown messages).
 
I literally went on the local used market, and as fate would have it, an individual a couple of blocks down the road was selling one (the 5700XT). Genuinely, when Linus Torvalds said %*&# Nvidia, I am in this moment feeling that sentiment. Also please note, you do not need to insist. I am genuinely going to do whatever anyone says, because as you can see by the cluster of things I have tried, I feel like I might have lucked out with my setup working in virt-manager for the past few years; it clearly demonstrates my lack of understanding.

I have to say, without a doubt: the fact that I have tried practically every bit of advice, it's all laid out here in this thread, and you have dropped genuine golden nuggets for each set of commands and what it does. I think this thread is going to be a goldmine for understanding, I don't know about others. But I've copied some of your notes, if not most, into a markdown file on my laptop. It's as if I'm learning Unix at the same time ;) thank you mate.

Note:
1) fix_gpu_pass.sh removed, and removed from crontab

Code:
You are sabotaging vendor-reset with the remove-from-PCI-bus-and-reconnect-it-by-scanning-the-bus script(s) that you are using.
Man, this made me cringe. I knew intuitively something was happening where it's double dipping and working against me.

I'm sorry, but for this post I haven't accomplished all the tasks laid out; could you clarify one bit so I can proceed?

gpu-hookscript currently is:
(in): nano /var/lib/vz/snippets/gpu-hookscript.sh
I have a 2nd one in that same directory called gpu-hookscript2.sh where I just change the id from 2d to 30
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi

You want me to remove the vrwa service...
and make the gpu hookscript this? Can you please verify I have all the quotations and single brackets correct? Sometimes I have seen echo 'device_specific' > .... in threads.
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: 'device_specific' for Virtual Machine $1"
    echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method
fi

Update1:
As of right now , assuming that hookscript is correct above...
Errors:
Code:
root@proxmox:~# qm start 100
gpu-hookscript: echo device_specific for Virtual Machine 100
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
kvm: -device vfio-pci,host=0000:2d:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,rombar=0,multifunction=on: Failed to mmap 0000:2d:00.0 BAR 0. Performance may be slow
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.

Code:
root@proxmox:~# lspci -nnks 2d:00
2d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
    Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB [1da2:e387]
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu
2d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
    Subsystem: Sapphire Technology Limited Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1da2:aaf0]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

Code:
root@proxmox:~# lspci -nnks 30:00
30:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
    Subsystem: Gigabyte Technology Co., Ltd Radeon RX 5700 XT Gaming OC [1458:2313]
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu
30:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
 
I literally went on the local used market, and as fate would have it, an individual a couple of blocks down the road was selling one (the 5700XT). Genuinely, when Linus Torvalds said %*&# Nvidia, I am in this moment feeling that sentiment. Also please note, you do not need to insist. I am genuinely going to do whatever anyone says, because as you can see by the cluster of things I have tried, I feel like I might have lucked out with my setup working in virt-manager for the past few years; it clearly demonstrates my lack of understanding.
Your 3080 is much more powerful than the 5700XT, so maybe we can swap them back later and get it to work. Or maybe someone here knows how to get an NVidia GPU to pass through.
Regarding the use of "insist": I'm not always aware how my posts come across both technically and socially, but we seem to manage. It usually takes a few back-and-forths to sort out the details.
I have to say, without a doubt: the fact that I have tried practically every bit of advice, it's all laid out here in this thread, and you have dropped genuine golden nuggets for each set of commands and what it does. I think this thread is going to be a goldmine for understanding, I don't know about others. But I've copied some of your notes, if not most, into a markdown file on my laptop. It's as if I'm learning Unix at the same time ;) thank you mate.
I guess helping with passthrough is (mostly) the thing I do here (after having a rocky road myself on this subject since Proxmox 3.2). Nice to hear that you're learning stuff along the way and that my former experiences are helping you.
Note:
1) fix_gpu_pass.sh removed, and removed from crontab

Code:
You are sabotaging vendor-reset with the remove-from-PCI-bus-and-reconnect-it-by-scanning-the-bus script(s) that you are using.
Man this made me cringe, I knew intuitively something is happening where its double dipping and working against me.
Sorry for the cringe. I genuinely did not expect this combination, but afterwards it's obvious that they will interfere. I'm sure you didn't do it on purpose, but I guess the word "sabotage" did get the point across.
I'm sorry but for this post, I haven't accomplished all the tasks laid out, if you could clarify one bit so I can proceed.
No problem, I guess it's just you and me in this thread and we are changing lots of things at the same time and doing two GPU passthroughs. That's why I included the check/show list in the previous post. And probably, you'll want to do additional USB controller passthroughs for each VM later on...
gpu-hookscript currently is:
(in): nano /var/lib/vz/snippets/gpu-hookscript.sh
I have a 2nd one in that same directory called gpu-hookscript2.sh where I just change the id from 2d to 30
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    echo 1 > /sys/bus/pci/devices/0000\:2d\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi

You want me to remove vrwa service ...
I don't know what "vrwa" is, sorry.
and make the gpu hookscript this? Can you please verify I have all the quotations and single brackets correct? Sometimes I have seen echo 'device_specific' > .... in threads.
Code:
#!/bin/bash

if [ $2 == "pre-start" ]
then
    echo "gpu-hookscript: 'device_specific' for Virtual Machine $1"
    echo device_specific > /sys/bus/pci/devices/0000:2d:00.0/reset_method
fi
Yes, this is all you need for each VM that uses an AMD GPU supported by vendor-reset. I usually put double quotes (") around /sys/bus/pci/devices/0000:2d:00.0/reset_method but that might not be necessary. Just make sure you use the right PCI numbers for the right VM.
 
I am at a point where I'm actually unsure if I've corrupted both images, and it's not a Proxmox thing anymore (intuition says...).
Could you shed light on disk settings? What do you use?
I think the continuous hard stops over time maybe... They are on a ZFS pool, qcow2; thoughts on best practice?
[Attached screenshot: VM disk settings]

Also, on start up both VMs do this:
Code:
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.

Does that mean the reset isn't actually working?
 
I am at a point where I'm actually unsure if I've corrupted both images, and its not a proxmox thing anymore (intuition says ...)
Could you shed light on disk settings ? what do you use ?
I think the continuous hard stops overtime maybe... They are on a zfs pool. qcow2 ; thoughts on best practice ?
qcow2 on top of a directory on ZFS is unnecessarily slow. Try using VirtIO SCSI directly on ZFS in the future (needs drivers for Windows and I don't know about MacOS).
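As a rough example (the storage name and disk volume below are placeholders for your actual ZFS pool and zvol), a zvol-backed disk in the VM config looks something like:
Code:
scsi0: local-zfs:vm-101-disk-0,discard=on,size=400G
scsihw: virtio-scsi-pci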
Using unsafe write back in combination with power losses (and system resets and crashes) is disastrous for the system inside the VM. I guess you'd better restore from backup. And boot the Ubuntu installer LiveDVD for testing.

also on start up procedure both VMs do this:
Code:
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.
kvm: vfio: Cannot reset device 0000:2d:00.1, no available reset mechanism.

does that mean reset isn't actually working ?
It means that there is no proper reset for the audio function of the 580. I get the same for my 570 and it works fine for display output (and I don't use the HDMI audio).
 
@leesteken; success on both GPU passthroughs. MacOS, after a couple of reboots, just booted fine. Windows is being finicky (DISM repair etc.); I might have to export the image, bring it back in and load it fresh. Going to gather my thoughts and do a clean post on where the system sits in terms of settings for the general public. I would like to thank you for your help on the GPU passthrough aspect. I do believe it's working, because I can swap the HDMI cable to the monitor between both GPUs and get display out.

I could use your help on just a final check and lingering questions as I collect my thoughts.
 
Sorry, before I go down this rabbit hole: I've seen similar screenshots here of people being stuck on this screen...
I have the vioscsi drivers already loaded successfully. Could you kindly check my conf file, so I can rule out a settings issue and confirm it's maybe just an image issue? It is almost as if the display frame is just frozen, because it won't take any keyboard inputs...
Update1: even when I shut down the VM the same screen remains on the monitor, so I retract what I said. I think GPU passthrough is still not working on the 30 id.
[Attached screenshot: frozen display output from the Windows VM]
Code:
root@proxmox:~# cat /etc/pve/qemu-server/101.conf
# hookscript%3A local%3Asnippets/gpu-hookscript.sh
balloon: 0
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 4
cpu: host
efidisk0: zfsdata01:101/vm-101-disk-0.qcow2,efitype=4m,size=528K
hookscript: local:snippets/gpu-hookscript2.sh
hostpci0: 0000:30:00,pcie=1,x-vga=1
ide0: zfsdata01:iso/Win10_21H1_English_x64.iso,media=cdrom,size=5687620K
ide2: zfsdata01:iso/virtio-win-0.1.196.iso,media=cdrom,size=486642K
machine: pc-q35-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1669092247
name: win10
net0: virtio=12:EE:56:16:A1:50,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: zfsdata01:101/vm-101-disk-2.qcow2,cache=unsafe,discard=on,size=407286422K
scsihw: virtio-scsi-pci
smbios1: uuid=2625ba06-b78a-4d32-b10d-55fcbbb71b6e
sockets: 1
unused0: zfsdata01:101/vm-101-disk-1.qcow2
unused1: zfsdata01:101/vm-101-disk-3.qcow2
usb0: host=1-1
usb1: host=1-2
vga: none
vmgenid: c53fc1cc-6486-474e-8e21-52f2d009791f

I find these messages interesting, which might imply still a GPU passthrough issue; but it's on the MAC VM side so likely not... and the MAC VM is running fine... it might be because I've switched the display input to HDMI 2, which is the Windows VM.
Code:
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d7e, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d7f, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d80, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d81, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d82, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d83, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d84, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d85, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d86, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d87, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d88, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d89, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d8a, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d8b, 0x0,1) failed: Device or resource busy
Nov 22 01:59:05 proxmox QEMU[2098]: kvm: vfio_region_write(0000:2d:00.0:region1+0x1b7d8c, 0x0,1) failed: Device or resource busy
 
For AMD GPUs, I recommend not enabling Primary GPU (x-vga=1) which is really tuned for NVidia. Try without it (keep vga: none)?
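In other words, keeping your current PCI address, the passthrough entry would simply become:
Code:
hostpci0: 0000:30:00,pcie=1
vga: none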
The VirtIO drivers (196) are a little old (try 215 or the latest) but you installed them before switching to SCSI?
And you keep running unsafe cache mode? Maybe your VM is just broken due to disk corruption.
Sorry but I don't know. Other people have more experience with Windows. Maybe search the forum or start a new thread?

The vfio_region_writes are unknown to me. Do they keep repeating, or do you only get two dozen of them? In that case, I would not worry too much if the VM works.
 
@leesteken; sorry for the delay.
I have had quite a few days of troubleshooting other issues to be able to get back to the baseline where we left off.
I would like to document some of the problems:
1) Aquantia 10G NIC; continuous connection drops - I just changed the NIC out
2) ZFS pool on a single SN 850X SSD caused random data loss; so I recreated everything on a pair of 8TB Seagate NAS drives

Because of my inexperience, I've found that just getting hardware that works is better than the headache involved or breakage at updates.

I have GPU passthrough working on the Win10 VM; on the MacOS VM I get this error:

Code:
root@proxmox:~# qm start 100
gpu-hookscript: echo device_specific for Virtual Machine 100
kvm: smbios: Could not open 'smbios': No such file or directory
start failed: QEMU exited with code 1

If this problem is resolvable do let me know, because it's confusing.
 
@leesteken ; sorry for the delay .
I'm not in a hurry, don't worry.
I have had quite a few days of troubleshooting other issues to be able to get back to the baseline where we left off.
I would like to document some of the problems:
1) Aquantic 10G Nic ; continuous connection drops - I just changed the NIC out
2) ZFS pool on 1 single SN 850x SSD caused random data loss ; so recreated everything on a pair of 8TB Seagate NAS drives
I don't remember reading issues about that hardware on this forum. Maybe they are just broken?
Because of my inexperience, I've found just finding hardware that just works is better than the headache involved or breakage at updates.
I have GPU passthrough working on the Win10 VM; on the MacOS VM I get this error:
Code:
root@proxmox:~# qm start 100
gpu-hookscript: echo device_specific for Virtual Machine 100
kvm: smbios: Could not open 'smbios': No such file or directory
start failed: QEMU exited with code 1
If this problem is resolvable do let me know, because its confusing.
I can't help with MacOS, but this looks like a VM configuration error. Did you use the smbios1 or args setting, or is there something else in the configuration that cannot be found and happens to be called smbios? Please show the contents of the /etc/pve/qemu-server/100.conf file.
 
OK, I just want to say:
I have literally everything working perfectly now.
Thank you very much @leesteken, the details you provided in this thread allowed me to navigate the issues when I started from scratch again.
I would like to close this out and not muddy the details in this thread, but I would like to ask you about best practice for backups, schedules, and other scripts. I'm right now at a point where I think things are working well and I just want to make this bulletproof and not go through this headache again!
 
