Hi,
I am having the following issue. It happens with both a Ryzen 7950X3D and a 7800X3D, on a B650E or an X670E motherboard; the behaviour is the same.
As long as I use the CPU type "host", I get bluescreens on my Windows 11 Pro VM when I open Device Manager and run "Scan for hardware changes". Sometimes Windows triggers the scan by itself when installing specific software or drivers, and I get the bluescreen then as well.
If I use the CPU type "x86-64-v4", I can do whatever I want and there is no crash.
I updated GRUB with the line GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt", since those parameters should help with CPU reset bugs. I also tried changing the BIOS boot type from UEFI to CSM. Both changes have the same effect: the VM stays stable when I scan for hardware changes.
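A quick way to confirm that the edited line actually reached the running kernel (after regenerating the GRUB config and rebooting) is to look at /proc/cmdline. The following is only a minimal sketch: the function names are made up for illustration, and the sample command line is invented, not taken from the machine described here.

```python
from __future__ import annotations


def missing_kernel_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not whole tokens in `cmdline`."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().strip()


# Invented sample: only one of the two parameters made it to the kernel.
sample = "BOOT_IMAGE=/boot/vmlinuz-6.8.12-8-pve root=/dev/mapper/pve-root ro quiet iommu=pt"
print(missing_kernel_params(sample, ["amd_iommu=on", "iommu=pt"]))
```

On the Proxmox host itself, passing read_cmdline() instead of the sample string would check the live system. An empty list means both parameters are present.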
But only until I start software such as PassMark for CPU benchmarks or MSI Afterburner, basically anything that accesses the CPU sensors. As soon as one of those starts, it shows a CPU temperature of "0 degrees", and if I then scan for hardware changes again I get the bluescreen, even if I close the software first. I noticed that with the CPU type "x86-64-v4", any of those tools shows "N/A" instead of a temperature, and then there are no crashes on hardware scanning.
I tried a lot of things but could not get a stable VM with the CPU type "host" on those two CPU models. Does anyone have any suggestions? I would very much appreciate any advice!
I have all the latest drivers installed, the latest Proxmox version with the latest kernel 6.8.12-8, and amd-microcode installed.
Maybe this helps; these are the flags of my CPU:
root@prox:/etc/default# cat /proc/cpuinfo | grep flags | head -n 1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
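Since the crashes only appear with the "host" type, one way to narrow things down might be to boot a Linux live ISO in a VM once with each CPU type, run the same command there, and compare the two flag lines. The sketch below assumes the lines have been pasted in as strings; the function names are hypothetical and the two sample lines are shortened, invented examples rather than real output.

```python
from __future__ import annotations


def parse_flags(cpuinfo_line: str) -> set[str]:
    """Turn a 'flags : a b c' line from /proc/cpuinfo into a set of flag names.

    A line without a colon yields an empty set.
    """
    _, _, values = cpuinfo_line.partition(":")
    return set(values.split())


def extra_flags(first_line: str, second_line: str) -> list[str]:
    """Flags present in first_line but absent from second_line, sorted."""
    return sorted(parse_flags(first_line) - parse_flags(second_line))


# Shortened, made-up sample lines for illustration only.
with_host_type = "flags : fpu sse2 avx2 avx512f svm rapl"
with_v4_type = "flags : fpu sse2 avx2 avx512f"
print(extra_flags(with_host_type, with_v4_type))
```

Whatever shows up only under "host" is a candidate list of features to look at more closely; it does not by itself prove which one is responsible.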
Updated to kernel 6.11.11-1 - problem still there.
Best Regards,
Dean
Hey Dean
Did you have more success?
I went from a
Ryzen 5 3600 + B450 + 64GB DDR4
to
Ryzen 9 7900 + X670E + 96GB DDR5
plus Nvidia 4060 Ti PCIe passthrough.
That is where my issues started, although they are a bit different.
If I pass the same resources to a Debian/FreeBSD VM, all is fine, even when I stress-test it to the ground.
The Windows 10 VM used to work flawlessly with any game.
It is a normal setup, nothing fancy except for the CPU type set to "host".
If I run a stress on it with the usual suspects (prime + furmark) it can burn for hours no issue.
Running the CPU and GPU in my ML VM also just works.
Now when I game on Windows for a bit it resets my whole machine.
I found an interesting post about watchdog timers, sadly my motherboard (MSI X670E-GAMING-PLUS-WIFI) does not have the option to disable it.
Here are some breadcrumbs for anyone debugging random reboot issues on Proxmox 8.3.1 or later.
tl;dr: If you're experiencing random unpredictable reboots on a Proxmox rig, try DISABLING (not leaving at Auto) your Core Watchdog Timer in the BIOS.
I have built a Proxmox 8.3 rig with the following specs:
- CPU: AMD Ryzen 9 7950X3D 4.2 GHz 16-Core Processor
- CPU Cooler: Noctua NH-D15 82.5 CFM CPU Cooler
- Motherboard: ASRock X670E Taichi Carrara EATX AM5 Motherboard
- Memory: 2 x G.Skill Trident Z5 Neo 64 GB (2 x 32 GB) DDR5-6000 CL30 Memory
- Storage: 4 x Samsung 990 Pro 4 TB...
To diagnose the problem I tried it all.
Memtest86 with and without XMP.
BIOS versions from the latest all the way back to the one my board was released with.
Windows repair + updates (thanks snapshots)
Previous kernel pinning etc.
I don't game much, thus the reason it exists.
Playing Satisfactory it all works great except the resets.
Then I started to wonder what in the VM could be causing it, since it has direct access to the host CPU and GPU.
As one final test, I stopped Steam and got the game in another manner.
Issue gone. So it could have been something in Steam's anti-cheat etc. that was causing the lockup and reset.
Hope some of this helps.
Cheers
Carl
Software/Server details:
pveperf
CPU BOGOMIPS: 177602.16
REGEX/SECOND: 2423057
HD SIZE: 642.33 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 391.94
DNS EXT: 173.69 ms
DNS INT: 33.43 ms
cat /proc/cpuinfo | grep flags | head -n 1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.5 (running version: 8.3.5/dac3aa88bac3f300)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.1
libpve-rs-perl: 0.9.2
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.3-1
proxmox-backup-file-restore: 3.3.3-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.6
pve-cluster: 8.0.10
pve-container: 5.2.4
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.0
pve-qemu-kvm: 9.2.0-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1