Hello all,
My Hardware:
Motherboard - ASRock Rack ROMED8-2T
CPU - Epyc 7F72
RAM - 128GB 3200MHz ECC
GPU - RTX-2080 Super - passed through to Windows
OS Drive - Samsung 980 Pro 500GB
VM OS Drives - ZFS Mirror 2x960GB Samsung P9A3
Windows VM Game Storage - ZFS Mirror 2x1TB WD SN850
HBA Card - LSI-9211-8i - passed through to TrueNAS
TrueNAS Drives - 6x6TB WD Ironwolf
NIC - 82599ES 10GbE - passed through to TrueNAS
I am running Proxmox 8.1.3 on an AMD Epyc Gen 2 build with an ASRock ROMED8-2T motherboard. I have a number of Ubuntu server VMs running and one Windows 11 VM, though I had these issues on a Windows 10 VM too. The Windows 11 VM has an RTX-2080 passed through, and I am using an X550 NIC with SR-IOV enabled (VFs are used on the other VMs too). All VMs are running on a ZFS mirror. The drives are U.2 Samsung drives, but this started happening before I moved my VMs to these drives. I think it was happening before I set up SR-IOV too.
I constantly get BSODs on the Windows VMs without any dump files. Sometimes there isn't even a BSOD code, but usually it is the DirectX error in the subject line of this post. This likely points to something being up with the GPU or its driver, at least from a Windows perspective, but I have gone through all of the troubleshooting steps that I can find inside the Windows guest. I removed the drivers with DDU, installed earlier versions, ran several system commands to fix corrupt files, swapped the GPU for an RTX-2070 Super, disabled fast start, disabled any power management settings related to PCIe, and probably a few other things that I am forgetting.
There is no evidence inside Windows of a problem until the BSOD. I've looked in Event Viewer and at the Device Manager events for the GPUs themselves, and ran dxdiag, which said everything was happy.
The VM is used for remote gaming and doesn't crash all that often if I am running Minecraft Bedrock edition. However, I can get it to crash almost every time if I open the launcher or the Java edition. It will also crash if I try to uninstall either of those. I had to remove the GPU and install them without it, which worked, but I still got the BSOD when launching them without the GPU. It will also crash if I try to update the Nvidia driver without first removing it with DDU in safe mode. I didn't try to remove it in normal mode.
So now I am wondering if something is up with my Proxmox configuration. I have gone through the PCI and PCIe passthrough docs several times and I think it is set up correctly. I have forced the PCIe slot to Gen3 x16 in the BIOS, which is what the card is, but the issue happens whether the slot is set to auto or manually configured.
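One host-side check that goes with the passthrough setup is confirming which IOMMU group the GPU sits in and what else shares that group. Below is a minimal sketch of that check, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups. The function names iommu_groups and groups_containing are made up for illustration, and the 0000:81:00 prefix is simply the hostpci0 address from the VM config in this post. No output from my host is shown here.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Returns an empty dict when the directory does not exist, which is
    what you would see if the IOMMU is not enabled on the host.
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Group directories are named by number; skip anything else.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def groups_containing(groups, address_prefix):
    """Return only the groups holding a device whose address starts with the prefix."""
    return {
        number: devices
        for number, devices in groups.items()
        if any(device.startswith(address_prefix) for device in devices)
    }


if __name__ == "__main__":
    # 0000:81:00 is the GPU address taken from hostpci0 in the VM config.
    for number, devices in sorted(groups_containing(iommu_groups(), "0000:81:00").items()):
        print(f"IOMMU group {number}: {', '.join(devices)}")
```

If the group that holds the GPU also lists devices that are not passed through to the same VM, that would be something worth mentioning in a reply.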
Is there anyone out there with a similar issue, or who might have some input on how to troubleshoot further?
VM Config:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 12
cpu: host
efidisk0: tank_nvme:vm-102-disk-1,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:81:00,pcie=1
hostpci1: 0000:42:10.2,pcie=1
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1702680162
name: Win11-Pro
numa: 0
ostype: win11
scsi0: tank_nvme:vm-102-disk-0,cache=writeback,discard=on,size=102G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=809fb1c2-c1c0-4c32-801e-b8ec688f2d92
sockets: 1
startup: order=5,up=10
tpmstate0: tank_nvme:vm-102-disk-2,size=4M,version=v2.0
vga: std
vmgenid: 1ecfccf4-82aa-4ad6-9a2b-da75cb035881
#qmdump#map:efidisk0:drive-efidisk0:tank_nvme:raw:
#qmdump#map:scsi0:drive-scsi0:tank_nvme:raw:
#qmdump#map:tpmstate0:drive-tpmstate0-backup:tank_nvme:raw:
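To make the passthrough entries in the config above easier to compare against other setups, here is a rough sketch that pulls the hostpci lines out of a config dump like this one. It assumes the simple "key: value" layout shown above with comma-separated options. The helper names parse_vm_config and hostpci_devices are hypothetical and are not part of any Proxmox tooling.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines into a dict, skipping blanks and '#' lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, since values such as PCI
        # addresses contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        config[key.strip()] = value.strip()
    return config


def hostpci_devices(config):
    """Return (key, pci_address, options) for every hostpciN entry."""
    devices = []
    for key in sorted(config):
        if not key.startswith("hostpci"):
            continue
        address, *raw_options = config[key].split(",")
        options = dict(opt.split("=", 1) for opt in raw_options if "=" in opt)
        devices.append((key, address, options))
    return devices


if __name__ == "__main__":
    # Lines copied from the VM config in this post.
    sample = """\
hostpci0: 0000:81:00,pcie=1
hostpci1: 0000:42:10.2,pcie=1
machine: pc-q35-8.1
#qmdump#map:scsi0:drive-scsi0:tank_nvme:raw:
"""
    for key, address, options in hostpci_devices(parse_vm_config(sample)):
        print(key, address, options)
```

For the config in this post, that lists 0000:81:00 (the GPU) and 0000:42:10.2 (the NIC VF), each with pcie=1 as its only option.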