Persistent VM instability with Ryzen 9 9950X3D and Proxmox 8/9

EventHorizon

New Member
Aug 19, 2025
Hi,

I’m running an ASUS ProArt X870E-Creator WiFi (BIOS 1605) with a Ryzen 9 9950X3D and 256 GB of RAM. My workflow requires spawning several VMs, but I’m seeing recurrent instability in guest VMs (both Windows and Linux): after a few hours they typically reboot or hang with what appear to be memory-related errors.

Hardware / memory tried

  • Crucial CP64G56C46U5 (64 GB modules), 256 GB total, currently running at 3600 MT/s.
  • Corsair CMK192GX5M4B5200C38 (total 192 GB) — same behavior.
  • CPU swapped to a Ryzen 9 9950X; same behavior.
Firmware & settings

  • All firmware updated; motherboard BIOS is 1605.
  • 24 hours of memory testing revealed no errors.
  • Tried disabling Memory Context Restore and C-States; also tried leaving everything on Auto.
  • Issue reproduces on Proxmox VE 9 (and previously 8.4).
Despite these changes, the guest VMs remain unstable. The strange thing is that it's much worse with kernel 6.14 than it was with 6.8. With 6.8 these reboots happened after a few days; now with 6.14 they happen after a few hours.
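When a guest dies while idle like this, it's worth checking whether the host kernel logged a machine-check or ECC/EDAC event around the same timestamp. A minimal sketch, assuming GNU grep and systemd's journalctl on the Proxmox host (`mem_err_filter` is just an illustrative helper name, and the pattern is an assumption, not exhaustive):

```shell
# Hypothetical helper: filter kernel log lines for hardware memory-error
# signatures (MCE, machine check, EDAC/ECC).
mem_err_filter() { grep -iE 'mce|machine check|edac|ecc error'; }

# On the Proxmox host, around the time a guest rebooted:
#   journalctl -k --since "-24 hours" | mem_err_filter
#   dmesg | mem_err_filter
# Demo on a sample line:
printf 'mce: [Hardware Error]: CPU 3: Machine Check Exception\n' | mem_err_filter
```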

Any ideas?
 
I have the same hardware as far as CPU, chipset, and memory brand go. What kept freezing mine up was any profile other than the default on my DIMMs: XMP or EXPO caused my issues, even while staying well within the published spec speed. Not sure if this is your problem as well.
 
I have left everything at the default settings in the BIOS. Do you have any settings outside the defaults (many are Auto) in the BIOS? Do you use ZFS on any of your drives?
 
I have left everything at the default settings in the BIOS. Do you have any settings outside the defaults (many are Auto) in the BIOS? Do you use ZFS on any of your drives?
Mostly everything is at Auto, with SVM enabled, and I don't have any ZFS drives. I also opted for the x86-64-v4 CPU type for my VMs, for what it's worth.
 
I now have a lead: if I run a stress test like AIDA64 in the VM, it doesn't reboot. If I stop it, the VM reboots after a few hours. In the past I already switched off global C-states; could it be something related?
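Besides the BIOS Global C-state Control toggle, deep idle can also be limited from the kernel side, which makes for a quick A/B test of the "crashes only when idle" theory. A sketch, assuming a standard Debian/Proxmox /etc/default/grub (run update-grub and reboot afterwards); both parameters are documented kernel cmdline options, but whether they help on this exact board is an open question:

```
# /etc/default/grub (sketch): cap ACPI idle at C1 and avoid MWAIT-based idle.
GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1 idle=nomwait"
```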
 
Some boards actually have a dummy load to keep "clever" power supplies from trying to go into auto power-saving mode when the load is low.

Look for "Power Loading" in your BIOS: it enables or disables a dummy load. When the power supply is at low load, a self-protection can activate, causing it to shut down or "fail"; when the BIOS is set to always power on after power loss, this can look like a reboot. If this occurs, please set this option to Enabled.
 
Also, any chance you could run:
cat /proc/iomem | grep -i 'PCI Bus'
on your system and share the output for me?

I’m trying to estimate how much MMIO space the ProArt X870E firmware reserves for PCIe devices. I’m looking for boards that allocate at least 1GB of MMIO address space, ideally more, to support dual GPU passthrough without hitting BAR assignment issues. Thanks!
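If it helps, the per-bus windows in /proc/iomem can be summed mechanically. A sketch in bash (assumes root, since unprivileged reads of /proc/iomem show zeroed addresses; `mmio_total_mib` is an illustrative name):

```shell
# Sketch: sum the sizes of the MMIO windows assigned to the PCI root bus
# (lines naming "PCI Bus 0000:00"), as they appear in /proc/iomem.
mmio_total_mib() {
  local total=0 start end rest
  while IFS='- ' read -r start end rest; do
    case "$rest" in
      *'PCI Bus 0000:00'*) total=$(( total + 0x$end - 0x$start + 1 )) ;;
    esac
  done
  echo $(( total / 1024 / 1024 ))
}

# Live usage (root): mmio_total_mib < /proc/iomem
# Demo on two sample root-bus windows:
mmio_total_mib <<'EOF'
70000000-dfffffff : PCI Bus 0000:00
1090000000-ffffffffff : PCI Bus 0000:00
EOF
```

This only counts the top-level root-bus windows, which is what matters for the firmware's total MMIO budget; child-bus lines are subsets of those windows.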
 
Here you go, I have this motherboard with a 9950X3D

root@proxmox:~# cat /proc/iomem | grep -i 'PCI Bus'
000a0000-000dffff : PCI Bus 0000:00
70000000-dfffffff : PCI Bus 0000:00
70000000-710fffff : PCI Bus 0000:04
70000000-710fffff : PCI Bus 0000:05
70000000-700fffff : PCI Bus 0000:06
70100000-70efffff : PCI Bus 0000:07
70100000-70efffff : PCI Bus 0000:08
70100000-701fffff : PCI Bus 0000:09
70200000-707fffff : PCI Bus 0000:0c
70800000-70afffff : PCI Bus 0000:0a
70b00000-70cfffff : PCI Bus 0000:0b
70d00000-70dfffff : PCI Bus 0000:0e
70e00000-70efffff : PCI Bus 0000:0f
70f00000-70ffffff : PCI Bus 0000:10
71000000-710fffff : PCI Bus 0000:11
a4000000-d47fffff : PCI Bus 0000:12
a4000000-d47fffff : PCI Bus 0000:13
a4000000-bbffffff : PCI Bus 0000:18
bc000000-d3ffffff : PCI Bus 0000:48
d4000000-d43fffff : PCI Bus 0000:78
d4400000-d47fffff : PCI Bus 0000:79
d8000000-dc0fffff : PCI Bus 0000:01
dd800000-dddfffff : PCI Bus 0000:7a
dde00000-ddefffff : PCI Bus 0000:7b
ddf00000-ddffffff : PCI Bus 0000:03
de000000-de0fffff : PCI Bus 0000:02
1090000000-ffffffffff : PCI Bus 0000:00
1200000000-17ffffffff : PCI Bus 0000:01
b800000000-f7ffffffff : PCI Bus 0000:12
b800000000-f7ffffffff : PCI Bus 0000:13
b800000000-d7ffffffff : PCI Bus 0000:18
d800000000-f7ffffffff : PCI Bus 0000:48
fc10000000-fc1fffffff : PCI Bus 0000:7a
 
Wow. Fantastic! Thank you.

Not quite sure how to express how much that blows me away. The total MMIO space on the predecessor, the ProArt X670, wasn't even 1GB. Your ProArt X870 is offering up 100 times that amount! Promising indeed!

How has your experience been with Proxmox using this motherboard?
 
Not really; the latest test I did was with two 64 GB DIMMs instead of four. Exactly the same issues: VMs, be it Linux or Windows, reboot after a few hours with memory errors when not in use. VLS=0, global C-states off, Typical Idle Power, no DRAM power down, all the settings that could explain this. I'm running out of ideas; next steps will be a system reinstall and, after that, the PSU.

@billeman: can you share your setup information? Memory, settings, etc?

Thanks
 
Wow. Fantastic! Thank you.

Not quite sure how to express how much that blows me away. The total MMIO space on the predecessor, the ProArt X670, wasn't even 1GB. Your ProArt X870 is offering up 100 times that amount! Promising indeed!

How has your experience been with Proxmox using this motherboard?
Overall, I am very happy with this motherboard, but I have mostly been running it on Win11. I really, really like the flexibility of the PCIe Gen5 lanes (GPU at Gen5 x8, plus an additional NVMe Gen5 x4 and a free PCIe Gen5 x8 slot, running at x4).

Now typing this on a Proxmox Win11 VM with an RTX 5080 passed through. It works great, but for a Proxmox n00b it's quite an 'interesting' learning curve (though I have 30+ years of work experience).

Can't say I'm running it 24/7 though; too early. I've noticed that my VM locks up with "cpu: host" when running the Shadow of the Tomb Raider benchmark, so I toned it down to x86-64-v3 and that works fine. It's pretty reproducible. Works for me.
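For anyone wanting to try the same mitigation, the vCPU model is a one-line change per VM. It can be set in the GUI, via `qm set 101 --cpu x86-64-v3` (101 being a placeholder VMID), or directly in the VM's config file; a sketch of the latter:

```
# /etc/pve/qemu-server/101.conf fragment (101 is a placeholder VMID).
# "host" passes through every CPU flag; x86-64-v3 is a generic baseline
# that trades some features for stability and migratability.
cpu: x86-64-v3
```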

Cheers.
 
@billeman: can you share your setup information? Memory, settings, etc?

Thanks

Hi, I'm running 2 sticks of 32 GB G.Skill 6000 MHz RAM with EXPO plus the 'Hynix 6000' motherboard custom settings; it's CL32 but I run it at CL30, so it's quite tight. I've been testing this on Windows for several months and it's actually stable at 6200 with these settings, but I usually go down a little to be sure.

I will never ever run 4 sticks again after the instability with my 5950X on AM4......
 
@EventHorizon
PSU problems are the worst. I had that once on a Windows 11 PC. I got bluescreens followed by a reboot at random intervals. Sometimes worked for days. Sometimes only hours. RAM was also my first suspicion, as there was nothing in the logs that was useful for diagnostic purposes. A pain in the rear. PSU is certainly a possible culprit.

I hope you get it sorted.
 
Hi, I'm running 2 sticks of 32 GB G.Skill 6000 MHz RAM with EXPO plus the 'Hynix 6000' motherboard custom settings; it's CL32 but I run it at CL30, so it's quite tight. I've been testing this on Windows for several months and it's actually stable at 6200 with these settings, but I usually go down a little to be sure.

I will never ever run 4 sticks again after the instability with my 5950X on AM4......

Thanks. I'm running with 192 GB and 256 GB, and I can't seem to find clear reports of anyone running this much RAM stably....
 
Thanks. I'm running with 192 GB and 256 GB, and I can't seem to find clear reports of anyone running this much RAM stably....

Have you tried running with two sticks?
What are your voltages for memory, VSOC, VDDP, etc.?
If your DIMMs support EXPO, try enabling it, but only while running two DIMMs.
 
Have you tried running with two sticks?
What are your voltages for memory, VSOC, VDDP, etc.?
If your DIMMs support EXPO, try enabling it, but only while running two DIMMs.
Yep, 2 DIMMs. I also tried underclocking to 3200, memory voltage between 1.1 and 1.2 V, VSOC 1.25 V, etc.... There's something fundamentally wrong here, and probably low level.
 
The question is whether this is caused by Proxmox (configuration) or not. I suppose you tested with a native OS on the same system?
 
Try to run your RAM at 3600 with the voltage the DOCP profile intends (1.25 V?). A sign of the memory controller being right at the edge is when the initial memory training (after a CMOS reset) takes over 15 minutes.
If that does not help, try 2 modules in productive use for 2-3 days... maybe to be safe initially at 3600, and later you can also try the DOCP profile. If that fails too, you can be 99% sure it's not the RAM.
Then try CPU ASPM off for 2-3 days; some PCIe cards don't like it when ASPM is too aggressive. If that works, you can try L0-only.
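To see what the OS actually negotiated before and after flipping that BIOS switch, the link-control lines from lspci can be tallied. A sketch assuming pciutils' lspci plus GNU grep/sort/uniq (`aspm_summary` is an illustrative name; the kernel-side equivalents of the BIOS toggle are the documented `pcie_aspm=off` and `pcie_aspm.policy=` cmdline options):

```shell
# Sketch: count negotiated ASPM states across all PCIe links, taken from the
# "LnkCtl: ASPM ..." lines that `lspci -vv` prints (root needed for full info).
aspm_summary() { grep -oE 'ASPM (Disabled|L0s L1|L0s|L1)' | sort | uniq -c; }

# Live usage:  lspci -vv 2>/dev/null | aspm_summary
# Demo on sample LnkCtl lines:
printf 'LnkCtl: ASPM Disabled; RCB 64 bytes\nLnkCtl: ASPM L1 Enabled; RCB 64 bytes\n' | aspm_summary
```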