Urgent Help, Windows 10 Crash!!! BSOD VIDEO_DXGKRNL_FATAL_ERROR

Aug 7, 2023
Hi Folks,

I am getting a VIDEO_DXGKRNL_FATAL_ERROR BSOD when I run a load test such as PassMark.
It is random, but about 90% of the time the error happens within the first test.

Note: Strangely, if I switch the CPU type to kvm64/qemu64, the crash does not happen.


My Hardware/Software

AMD 5900X
32 GB DDR4 RAM @ 3200 MHz (XMP)
1TB SSD
2 X NVIDIA 1080
X570 GIGABYTE AORUS PRO (LATEST BIOS 2023)

Bios:

SR-IOV - ENABLED
4G DECODING - DISABLED
CSM - ENABLED

Host:
ZFS (with ARC memory limited to 2 GB)
5.15.108-1-pve
Proxmox 7.4-16


VM:
Windows 10 Pro (latest updates installed)


Here is my Config

agent: 1,type=virtio
balloon: 0
bios: ovmf
boot: order=sata0
cores: 8
cpu: host,hidden=1
cpuunits: 10000
efidisk0: local-zfs:vm-100-disk-0,size=1M
hostpci0: 0000:09:00,pcie=1
machine: pc-q35-7.2
memory: 14048
name: ogn1
net0: rtl8139=AA:DD:33:51:FF:52,bridge=vmbr0
numa: 0
ostype: win10
sata0: local-zfs:vm-100-disk-1,aio=threads,cache=writeback,discard=on,size=100G,snapshot=0,ssd=1
sockets: 1
vga: none
vmgenid: f345cc23-e6ad-486e-8783-c909092232c1


Things I have tried

1. Fresh Windows
2. Safe Boot DDU + STUDIO Driver
3. MSI Interrupt
4. Moved to SATA/Realtek (rtl8139) instead of VirtIO drivers
5. kvm=off and -hypervisor with the host CPU type


Nothing seems to work at all


Please help. I need to run the VM with the host CPU type for the best performance.
 
Hello,


Can you check if there is any new available version of the GPU driver on the Windows side?
 
Could you upload the output of journalctl -b > journal.txt?
 
@Moayad I used the latest Studio driver. I also did a clean uninstall with DDU and tried the latest Game Ready driver (538), and the result is still the same.

Please find the log attached.

Update: Sometimes I randomly get a host reboot, and sometimes a VM reboot with a CRITICAL_PROCESS_DIED BSOD.
 
Thank you for the logs!

Code:
Aug 07 15:27:53 myproxmoxserver kernel: vfio-pci 0000:09:00.0: No more image in the PCI ROM
Aug 07 15:27:55 myproxmoxserver kernel: vfio-pci 0000:09:00.0: No more image in the PCI ROM
Aug 07 15:27:55 myproxmoxserver kernel: vfio-pci 0000:09:00.0: No more image in the PCI ROM

As a next step, I would set a ROM file [0] for the passed-through GPU in the VM config.

[0] https://pve.proxmox.com/wiki/PCI_Passthrough#The_.27romfile.27_option
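
Before pointing romfile at a dump, it can help to sanity-check the file. Below is a minimal Python sketch, not an official tool. It assumes the standard PCI expansion ROM layout: a 0x55 0xAA signature at the start of the image and a 16-bit little-endian pointer at offset 0x18 to a structure beginning with "PCIR". The function name find_rom_image is made up for illustration. Dumps made with vendor tools sometimes carry an extra header in front of the image, so a non-zero offset is worth noticing.

```python
import struct

def find_rom_image(data: bytes) -> int:
    """Return the offset of the first plausible PCI option ROM image, or -1.

    Assumes the standard layout: 0x55 0xAA signature, then a 16-bit
    little-endian pointer at offset 0x18 to a 'PCIR' data structure.
    """
    pos = data.find(b"\x55\xaa")
    while pos != -1:
        if pos + 0x1A <= len(data):
            (pcir_off,) = struct.unpack_from("<H", data, pos + 0x18)
            start = pos + pcir_off
            if data[start:start + 4] == b"PCIR":
                return pos
        # Not a valid image header here; keep scanning.
        pos = data.find(b"\x55\xaa", pos + 1)
    return -1

def check_rom_file(path: str) -> int:
    """Read a ROM dump from disk and return the image offset (or -1)."""
    with open(path, "rb") as fh:
        return find_rom_image(fh.read())
```

An offset of 0 suggests the file starts directly with a ROM image, a positive offset suggests leading bytes in front of it, and -1 means no image was found under these assumptions.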
 
I don't know if it is related, but there were also a lot of memory errors in the log. It may be worth double-checking your RAM and running memtest on it.
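
To pull the relevant lines out of journal.txt, here is a minimal Python sketch. The keyword patterns are my own assumptions about what passthrough-related and memory-related kernel messages typically contain. They are not an exhaustive list, so adjust them to what your log actually shows. The function names are made up for illustration.

```python
import re

# Assumed, illustrative patterns: vfio-pci messages and common kernel
# markers for hardware or memory faults. Extend as needed.
PATTERNS = {
    "vfio": re.compile(r"vfio-pci", re.IGNORECASE),
    "memory": re.compile(r"\b(mce|edac|hardware error|out of memory)\b",
                         re.IGNORECASE),
}

def classify_journal(lines):
    """Group journal lines by the first category whose pattern matches."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
                break
    return hits

def classify_file(path="journal.txt"):
    """Classify the lines of a saved journal file."""
    with open(path, errors="replace") as fh:
        return classify_journal(fh)
```

Calling classify_file("journal.txt") returns a dict with one list of matching lines per category, which makes it easier to see whether the memory errors and the vfio-pci messages cluster around the same timestamps.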
 
I encountered the exact same issue. My Windows 10 VM recently started experiencing random BSODs (VIDEO_DXGKRNL_FATAL_ERROR) after running smoothly for the previous three weeks. I tried various fixes, including reinstalling Windows 10, trying Windows 11, and passing the VBIOS ROM file, but the problem persisted. It wasn't until I came across your post that I tried changing the CPU type from "host" to "kvm64", and that resolved the issue.

After numerous attempts, I discovered a method that consistently triggered the BSOD: exporting the GPU's VBIOS using GPU-Z. Every time I confirmed the export, the system would BSOD.

Furthermore, I observed an unusual behavior. GPU-Z would show the UEFI status as "enabled" only the first time it was opened. Upon closing and reopening GPU-Z, the UEFI status would become "disabled." Additionally, a few seconds after opening GPU-Z, the bus interface information would change from "PCIe x16 3.0" to "PCI."
Opening GPU-Z the first time: [image: gpu-z-bad.jpg]
Opening GPU-Z again: [image: gpu-z-bad2.jpg]

Switching the CPU type from "host" to "kvm64" resolved the above issues. GPU-Z now displays the UEFI status and bus interface correctly, and they no longer change.
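
For anyone who wants to script this workaround, here is a minimal Python sketch that swaps the CPU model in the text of a VM config while keeping extra flags such as hidden=1. The function name swap_cpu_model is made up for illustration. It only transforms a string and assumes the "cpu: model,flags" form shown in the configs in this thread. I assume you would apply the result yourself (for example through the GUI) rather than letting a script write the config.

```python
def swap_cpu_model(config_text: str, new_model: str = "kvm64") -> str:
    """Return config_text with the model on the 'cpu:' line replaced.

    Anything after the first comma (e.g. ',hidden=1') is preserved.
    Assumes the 'cpu: <model>[,flags]' form; other forms are not handled.
    """
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu":
            _old_model, comma, flags = value.strip().partition(",")
            line = f"cpu: {new_model}{comma}{flags}"
        out.append(line)
    return "\n".join(out)

example = "cores: 8\ncpu: host,hidden=1\nmachine: pc-q35-7.2"
print(swap_cpu_model(example))
```

Only the cpu line changes. Lines such as hostpci0 or machine pass through untouched, so the output can be compared line by line with the original config.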

I hope this problem can be fixed someday. Here is some of my system information for reference:

Hardware:
  • AMD 5800X
  • ASUS TUF B550
  • MSI GTX 1070
pve:

Code:
root@ryzen-pve ➜  ~ pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-12-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Kernel, module parameters:

Code:
root@ryzen-pve ➜  ~ cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off video=efifb:off video=simplefb:off
root@ryzen-pve ➜  ~ cat /etc/modprobe.d/vifo.conf
options vfio-pci ids=10de:1b81,10de:10f0,10de:1c02,10de:10f1 disable_vga=1
options vfio_iommu_type1 allow_unsafe_interrupts=1
root@ryzen-pve ➜  ~ cat /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nvidia*
blacklist snd_hda_intel
blacklist nvidiafb
root@ryzen-pve ➜  ~ cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

VM Configuration:

YAML:
root@ryzen-pve ➜  ~ qm config 201
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-zfs:vm-201-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:08:00,pcie=1,romfile=MSI.GTX1070.8192.160520.rom
ide2: none,media=cdrom
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1694880276
name: kvm-win11
net0: virtio=7E:EE:45:82:75:0C,bridge=vmbr1,firewall=1
numa: 0
ostype: win11
scsi0: local-zfs:vm-201-disk-system,discard=on,iothread=1,size=150G
scsi1: local-zfs:vm-200-disk-life,discard=on,iothread=1,size=200G,ssd=1
scsi2: local-zfs:vm-200-disk-common,discard=on,iothread=1,size=400G,ssd=1
scsi3: pool0:vm-200-disk-simulator,discard=on,iothread=1,size=600G,ssd=1
scsi4: pool0:vm-200-disk-gamessd,discard=on,iothread=1,size=600G,ssd=1
scsi5: /dev/disk/by-id/wwn-0x5000c500dca9e8c1-part2,size=1615895M
scsihw: virtio-scsi-single
smbios1: uuid=4353a00c-0248-49fd-9b29-ef792829b5ac
sockets: 1
tpmstate0: local-zfs:vm-201-disk-2,size=4M,version=v2.0
vmgenid: 14636e76-53f0-4fbc-bcab-77dfd55e89cf
 
Same here with an AMD 5950X. No problem on an Intel 9th-gen CPU or with the KVM (kvm64) CPU type. So is the problem specific to AMD CPUs?