Bluescreens with AMD 7000 series x3d CPUs with cpu type "host"

ddean

New Member
Mar 1, 2024
Hi dear Proxmox community,

I am having the following issue when using a Ryzen 7950X3D or 7800X3D on a B650E or X670E motherboard; I get the same behaviour in both environments.

As long as I use the cpu type "host", I get bluescreens on my Windows 11 Pro VM when I open the Device Manager and click "Scan for hardware changes". Sometimes Windows triggers the scan by itself when installing specific software or drivers, and I get the bluescreens then as well.

If I use the cpu type "x86-64-v4", I can do whatever I want and there is no crash.

I updated GRUB with the line GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt", since those parameters should help with CPU reset bugs. I even tried changing the BIOS boot type from UEFI to CSM. Both have the same effect: they make the VM stable when I scan for hardware changes, but not forever...

If I start software such as Passmark (for CPU benchmarks) or MSI Afterburner, basically anything that accesses the CPU sensors, it shows a CPU temperature of "0 degrees". If I then scan for hardware changes again, I get the bluescreen, even if I close the software first. I noticed that with the CPU type "x86-64-v4", the same software shows "N/A" instead of a temperature, and hardware scanning does not crash.

I tried a lot of things but could not get a stable VM with the CPU type "host" on those two CPUs. Does anyone have a suggestion? I would very much appreciate any advice!

I have all the latest drivers installed, the latest Proxmox version with the latest kernel 6.8.12-8, and amd-microcode installed as well.

Maybe this helps; these are the flags of my CPU:

root@prox:/etc/default# cat /proc/cpuinfo | grep flags | head -n 1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
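
For reference, the x86-64-v4 level adds the AVX-512 subsets F, BW, CD, DQ and VL on top of x86-64-v3. A minimal sketch of checking a flags line like the one above for those five flags follows; `missing_v4_flags` is a hypothetical helper name, and it only looks at the AVX-512 part of the level, not the full v4 baseline.

```shell
#!/bin/sh
# Sketch: report which of the AVX-512 flags required by the x86-64-v4 level
# are missing from a /proc/cpuinfo-style "flags" line.
# missing_v4_flags is a hypothetical helper, not a Proxmox or QEMU tool.
missing_v4_flags() {
    flags_line=$1
    missing=""
    for f in avx512f avx512bw avx512cd avx512dq avx512vl; do
        case " $flags_line " in
            *" $f "*) ;;                    # flag is present
            *) missing="$missing $f" ;;     # flag is absent
        esac
    done
    # Print the missing flags without the leading space (empty if none).
    echo "${missing# }"
}

# Usage on a real host (commented out so the sketch has no side effects):
# missing_v4_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

An empty result means all five flags are present, which is the case for the flags line posted above.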

Updated to kernel 6.11.11-1 - problem still there.

Best Regards,
Dean
 
Hello ddean! What are you trying to achieve, exactly? What drivers or software are you trying to install? Do you want to do PCI / PCIe passthrough? Do you have a reason why you want to use the host CPU type at all costs, and not x86-64-v4?

Please also provide us with the full VM configuration (output of qm config <VMID>).

Last but not least, newer BIOS versions often improve IOMMU support, so I can also recommend installing the latest BIOS update for both motherboards. Don't forget that the BIOS settings will probably get lost after updating.

Also, note that the PVE documentation on IOMMU states the following:
With AMD CPUs IOMMU is enabled by default. With recent kernels (6.8 or newer), this is also true for Intel CPUs.

While the amd_iommu kernel parameter exists, "on" is not a valid value for it as it is for intel_iommu, so amd_iommu=on has no effect. In other words, as long as you enable IOMMU in the BIOS, the kernel will detect and enable it automatically. You can check this by running: dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
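
To see which of these parameters the running kernel was actually booted with, the command line can be inspected. This is only a rough sketch: `check_iommu_cmdline` is a hypothetical helper name, and it does nothing more than look for the two strings discussed here.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line contains the ineffective
# amd_iommu=on parameter and whether iommu=pt is set.
# check_iommu_cmdline is a hypothetical helper name.
check_iommu_cmdline() {
    cmdline=$1
    case " $cmdline " in
        *" amd_iommu=on "*) echo "amd_iommu=on present (has no effect)" ;;
        *) echo "amd_iommu=on absent" ;;
    esac
    case " $cmdline " in
        *" iommu=pt "*) echo "iommu=pt present" ;;
        *) echo "iommu=pt absent" ;;
    esac
}

# Usage on the host (commented out so the sketch has no side effects):
# check_iommu_cmdline "$(cat /proc/cmdline)"
```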
 
Hello Laurentiu!

Thank you very much for the information. I built a PC with 2 x NVIDIA GPUs for a friend who wants to share his workstation with his wife, and I created two gaming VMs on it, each with one GPU passed through. Full performance is only achieved with the cpu type "host"; CPU benchmarks show "host" is a bit better. I can live with "x86-64-v4" until the problem is hopefully fixed some day.

For my tests I set up the VMs completely from scratch, without any PCIe passthrough, and the bluescreen issue still appears. It happens when, for example, you install any new driver (for an external HDD, a USB stick, monitor drivers for the monitor software, or the NVIDIA driver after I set up the PCIe passthrough), basically anything that affects the hardware in the Device Manager. As I said, the issue also appears without any device in passthrough, neither USB nor PCIe, and it only happens after starting software that reads data from the CPU, such as thermal sensors, CPU speed, etc.

I will post the qm config later; it is not very complex. I have updated the BIOS to the latest version. The strange thing is that a few games, when started with "x86-64-v4", minimize at start, and no matter how often you try to maximize them, they always minimize again. With "host" they work correctly.

P.S. I have also noticed that with amd_iommu=on iommu=pt I can "scan for hardware changes" until I start software that reads CPU values. Without those parameters in GRUB, scanning for hardware changes always crashes the VM, even without starting any such software. The problem is that with MSI Afterburner or similar software running in the background from Windows startup, it is impossible to install any driver.

There are no bluescreen dumps or error logs. The only thing I was able to see on the screen once was something with "video dxgkrnl", which I would attribute to the GPU driver, but this also happens on a freshly installed VM without any GPU attached, with only the "standard VGA" virtual display or the VirtIO display.

BR,
Dean
 
I will post the qm config later
Sure, then we can maybe help a bit more :)

p.s. I have also noticed If I use amd_iommu=on iommu=pt I can "scan for hardware changes" until I start a software reading cpu values, if I do not use those values in grub, scanning for hardware changes will always crash the vm even without starting any kind of software.
Just to be precise, iommu=pt is a correct parameter, enabling IOMMU passthrough, as explained by the PVE docs. Only amd_iommu=on is not valid (as previously explained, amd_iommu exists, and the option on exists for intel_iommu, but the combination of amd_iommu=on is not correct, so the parameter is ignored).

Just to be sure, have you checked the wiki article on Windows 11 guest best practices? I highly recommend reading everything carefully. Use the recommended VM settings, install the VirtIO drivers and the QEMU Guest Agent and you should be ready to go. I mention this because you were talking about installing drivers, and unless you do any passthrough, the VirtIO ISO should contain all the drivers you need. Of course, when doing GPU passthrough, you'll also need to install the GPU driver in the guest VM.
 
Thank you for the clarification!

I have installed all VirtIO drivers and the guest tools, and followed all the best practices. Someone mentioned on a forum that you will get a reset bug on X3D CPUs if you do not set your BIOS to CSM instead of UEFI. This helps but does not solve the issue completely; the behaviour is the same as with iommu=pt, so there is currently no real reason to keep it on CSM.

The issue occurs even without attaching the dedicated GPU. I will boot up the system tonight and attach the config file.

The bluescreen happens as soon as Windows scans for hardware changes, even without passthrough of any device. I can try to reproduce the issue in a short video, or gladly offer a 5-minute live session tomorrow if you or anyone else has time to take a short look.

I would gladly order a pizza or more for anyone who can help me with this :))
 
This is my config with the GPU passthrough, but it makes no difference if it is enabled or if I disable it and use a virtual GPU:

acpi: 1
agent: 1
balloon: 0
bios: ovmf
boot: order=virtio0;net0
cores: 32
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:01:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.266.iso,media=cdrom,size=707456K
machine: pc-q35-9.0
memory: 32768
meta: creation-qemu=9.0.2,ctime=1739531596
name: GameVM
net0: virtio=BC:24:11:85:69:D6,bridge=vmbr0
numa: 1
onboot: 1
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=e8c58d5c-8f4c-4149-a91e-08d426876d69,manufacturer=TWljcm8tU3RhciBJbnRlcm5hdGlvbmFsIENvLiwgTHRkLg==,product=TVMtN0Q3MA==,version=MS4w,base64=1
sockets: 1
tpmstate0: local-lvm:vm-100-disk-1,size=4M,version=v2.0
usb0: host=3-3,usb3=1
usb1: host=5-2,usb3=1
usb2: host=3-7,usb3=1
vga: none
virtio0: local-lvm:vm-100-disk-2,backup=0,cache=unsafe,iothread=1,replicate=0,size=150G
vmgenid: 9bdaab4c-be55-436a-9e59-791e73d72a58
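
Since the CPU type is the one setting that changes the behaviour, it can be handy to read it straight out of the `qm config` output when comparing VMs. The following is a minimal sketch; `cpu_type_from_config` is a hypothetical helper name, and the VMID 100 in the comments is taken from the disk names in the config above.

```shell
#!/bin/sh
# Sketch: print the CPU model from "key: value" lines as produced by
# `qm config <VMID>`. Reads the config text on stdin.
# cpu_type_from_config is a hypothetical helper name.
cpu_type_from_config() {
    # Keep only the value of the "cpu:" line, then drop any ",flags=..." suffix.
    sed -n 's/^cpu:[[:space:]]*//p' | cut -d, -f1
}

# Usage on the Proxmox host (commented out so the sketch has no side effects):
# qm config 100 | cpu_type_from_config
# Switching the type for a test run would then be:
# qm set 100 --cpu x86-64-v4
```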
 
Just wondering: if you select host as CPU type when creating the VM and try to install Windows without changing the CPU type afterwards, do you still have issues? I'm just asking because Windows is a bit more fragile when it comes to changing hardware. Although changing the CPU type shouldn't break stuff, I was just wondering whether it maybe could still help, since we have no other hint on what's going wrong.

Otherwise I would highly recommend enabling memory dumps in Windows, if possible, so that we get some more information. Maybe this is a QEMU issue, but at this point we don't know, since we have no further information.

If you cannot find any further info, I simply recommend using x86-64-v4 if it works and the performance is good enough.
 
Hi,

Thanks for the information. Selecting "host" when creating the VM and not changing it later does not help. I was actually wondering why I kept getting the bluescreens until I found the workaround with the other CPU type. It is also the only CPU type that works: if I choose AMD EPYC or any other AMD type, it does not work. I will try enabling memory dumps and get back to you.

Best Regards,
Dean