Spontaneous reboots on Minisforum MS-A2 with 6.17 (and later 6.14)

VivienM · Feb 23, 2026

Hi,

This is a weird one. I have a Minisforum MS-A2, Ryzen 9955HX, 128GB of RAM, a Samsung SSD. Running up to kernels 6.14.8-2, it is rock solid. So I don't think it's a hardware issue...

Newer kernels cause spontaneous reboots within 24 hours: certainly every 6.17 I've tried (now including 6.17.9-1), and I believe some newer 6.14s as well.

I had "solved" this before Christmas by just going back to 6.14.8-2, but had a little power mishap yesterday, it booted back up to 6.17.9-1, and... less than 24 hours later, spontaneous reboot.

In the dmesg output, I note the following:

[ 0.892726] x86/amd: Previous system reset reason [0x00300800]: software wrote 0xE to reset control register 0xCF9
[ 0.892728] x86/amd: Previous system reset reason [0x00300800]: ACPI power state transition occurred

I poked around journalctl, I'm not seeing any log entries that are particularly pertinent...

Happy to provide any further logs, etc.

BobhWasatch · Feb 24, 2026

Googling "software wrote 0xE to reset control register 0xC" leads to some interesting info.

VivienM · Feb 24, 2026

BobhWasatch said:
Googling "software wrote 0xE to reset control register 0xC" leads to some interesting info.

I didn't find that much, but I did discover that that message is cut off. Should be "software wrote 0xE to reset control register 0xCF9"

When you google that, yes, it starts to get more interesting, but most of what I'm finding so far is about instability issues with older Zen chips back in 2017-18...

BobhWasatch · Feb 24, 2026

The stuff about setting a slightly higher voltage and/or lower frequency in BIOS seems relevant though. It might also pay to look at what c-states are enabled and whether you have the AMD microcode installed.

Other than that I got nuthin'.

VivienM · Feb 24, 2026

BobhWasatch said:
The stuff about setting a slightly higher voltage and/or lower frequency in BIOS seems relevant though. It might also pay to look at what c-states are enabled and whether you have the AMD microcode installed.

Other than that I got nuthin'.

AMD microcode is installed.

I guess I can find a keyboard/monitor to go poke at the BIOS, but if those things are set wrong, why doesn't 6.14.8-2 have a problem with it?

Found something else while googling: someone having similar issues on Arch Linux that seemed to be related to kernels compiled with GCC 15.2. I wonder which GCC version is used to compile which Proxmox kernels...

VivienM · Feb 26, 2026

I got cautiously excited when I discovered my Samsung SSD firmware was behind, but... updated that, same issue.

For now, I've just pinned 6.14.8-2. Unless someone has some ideas, I think I'll revisit it when proxmox releases 7.0 kernels...

VivienM · Apr 21, 2026

Well, I tried 7.0.0-2... and... same issue. Same dmesg entry:

Code:

[    0.828655] x86/amd: Previous system reset reason [0x00300800]: software wrote 0xE to reset control register 0xCF9
[    0.828656] x86/amd: Previous system reset reason [0x00300800]: ACPI power state transition occurred

I need to get to the bottom of this, I can't be stuck at 6.14.8-2 forever...

prinskarnatie · Apr 23, 2026

I've got 3 of the same units running on 6.17.13-2 on BIOS 1.02 with no issues. On a controlled restart I get the following log entry, note the hex is different:

[    0.790135] x86/amd: Previous system reset reason [0x00080800]: software wrote 0x6 to reset control register 0xCF9
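To make the difference between the two values concrete, here is a minimal Python sketch that lists which bits are set in each reset-reason value quoted in this thread. The helper name `set_bits` is made up for illustration, and the sketch deliberately does not map bit numbers to messages, since the thread only shows the kernel's decoded text and not the register layout.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of the bits that are set in a 32-bit value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


# Values copied from the dmesg lines quoted in this thread.
spontaneous = 0x00300800  # logged after the unexplained reboots
controlled = 0x00080800   # logged after a normal, requested restart

spontaneous_bits = set(set_bits(spontaneous))
controlled_bits = set(set_bits(controlled))

common = sorted(spontaneous_bits & controlled_bits)
only_spontaneous = sorted(spontaneous_bits - controlled_bits)
only_controlled = sorted(controlled_bits - spontaneous_bits)

print("bits set in both:", common)
print("only after spontaneous reboot:", only_spontaneous)
print("only after controlled restart:", only_controlled)
```

The two values share one bit and differ in the others, which at least fits the kernel printing two reasons for the first value and a single, different reason for the second.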

I've got the following customization

Grub - to suppress PCIE bus warning spam
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer"

For the X710 i40e nic, I changed the VLAN default range of 2-4094 to 2-50 otherwise I got these errors spamming the log (although it did not affect nic behaviour). Forum post https://forum.proxmox.com/threads/e...rcing-overflow-promiscuous-on-pf.62875/page-3

Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
Error LIBIE_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on

I made these BIOS changes as per https://etcwiki.org/wiki/Minisforum_MS-A2_9955HX_temperature_fix

Advanced->AMD Overclocking-> Accept->Precision Boost Overdrive
CPU Boost Clock Override: Enabled(Negative)
Max CPU Boost Clock Override(-): 500
TJMAX 78

All my NVME slots are set to PCI3.0 x4.

VivienM · Apr 24, 2026

This is getting weirder. Before your reply, I figured there was a chance there was a kernel panic-type situation that wasn't being logged, so I figured I would hook up a monitor and set the kernel to panic=0. And... 23 hours later, no reboot so far. Which is the longest I've ever had 6.17/7.0 running for...

A watched server never crashes, I guess. If/when it does crash I will try some of your BIOS settings...
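To double-check that panic=0 actually made it onto the running kernel's command line, something like the following rough Python sketch should do. The function name `kernel_params` is just for illustration, and it assumes simple space-separated parameters (quoted values containing spaces are not handled).

```python
from pathlib import Path


def kernel_params(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    # Only read the file where it exists (i.e. on a Linux host).
    if proc_cmdline.exists():
        params = kernel_params(proc_cmdline.read_text())
        print("panic =", params.get("panic", "(not set on the command line)"))
```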

David Herselman · Apr 25, 2026

I have similar systems, some of which were unstable after upgrading PVE 8 to 9. The key difference was that the AMD microcode and other firmware on the unstable ones was considerably older. Downgrading to proxmox-kernel-6.14.8-2-pve-signed yielded fewer crashes (3-4 a day, as opposed to every 10-40 minutes on proxmox-kernel-6.17.13-3-pve-signed). We fixed the issue by updating the BIOS.

The systems are Lenovo ThinkCentre M715q units, with `cat /proc/cpuinfo` reporting:
AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics

My systems were simply locking up and not restarting with the default softdog kernel module. However, I got them to use the hardware TCO watchdog by updating /etc/default/pve-ha-manager to contain:

Code:

WATCHDOG_MODULE=sp5100_tco

Validation:

Code:

[admin@kvm2c ~]# wdctl
Device:        /dev/watchdog0
Identity:      SP5100 TCO timer [version 0]
Timeout:       10 seconds
Timeleft:      10 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0
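For anyone unsure which watchdog module their node is configured for, here is a small Python sketch that reads the same file. It assumes the file uses plain KEY=value lines and that softdog is the fallback when nothing is set (as described above); the function name `configured_watchdog` is made up for this example.

```python
from pathlib import Path


def configured_watchdog(config_text: str, default: str = "softdog") -> str:
    """Return the WATCHDOG_MODULE value from the file's text, or the default."""
    module = default
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "WATCHDOG_MODULE":
            # Tolerate optional quotes around the value.
            module = value.strip().strip("\"'") or default
    return module


if __name__ == "__main__":
    path = Path("/etc/default/pve-ha-manager")
    text = path.read_text() if path.exists() else ""
    print("configured watchdog module:", configured_watchdog(text))
```

This only reports what is configured; wdctl as shown above is still the way to confirm which device is actually active.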

Before upgrade:

Code:

  lshw | less
    version: M1XKT34A
    date: 09/04/2018
  AMD microcode in BIOS:
    [root@kvm2d ~]# journalctl -n 10000 | grep microcode
    Apr 25 11:22:49 kvm2d kernel: microcode: Current revision: 0x08101007

After upgrade:

Code:

  lshw | less
    version: M1XKT63A
    date: 04/11/2024
  AMD microcode in BIOS:
    [root@kvm2c ~]# journalctl -n 10000 | grep microcode
    Apr 25 12:42:15 kvm1 kernel: microcode: Current revision: 0x0810100b
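If you want to compare revisions across several nodes without eyeballing hex, a minimal Python sketch along these lines could pull them out of the journal text. The regex simply matches the "microcode: Current revision" wording shown above, and the function name `microcode_revisions` is hypothetical.

```python
import re

# Matches the kernel message format shown in the journal output above.
REVISION_RE = re.compile(r"microcode: Current revision: (0x[0-9a-fA-F]+)")


def microcode_revisions(log_text: str) -> list[int]:
    """Return every microcode revision found in the text, in order."""
    return [int(m.group(1), 16) for m in REVISION_RE.finditer(log_text)]


# The two journal lines quoted above, before and after the BIOS update.
before = "Apr 25 11:22:49 kvm2d kernel: microcode: Current revision: 0x08101007"
after = "Apr 25 12:42:15 kvm1 kernel: microcode: Current revision: 0x0810100b"

old_rev = microcode_revisions(before)[0]
new_rev = microcode_revisions(after)[0]
print(f"old {old_rev:#010x}, new {new_rev:#010x}, newer: {new_rev > old_rev}")
```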

PS: Installing the amd64-microcode package didn't help; it apparently can't help with certain parts that initialise before the kernel boots.

Google Gemini summarised the difference in the BIOS updates to be:

M1XKT63A replaces the experimental early-Zen power management logic (M1XKT34A) with the industry-standard stable AGESA 1.2.x, resolving critical CPU-idle hangs and providing hardware-level mitigations for Zen-architecture vulnerabilities.

1. AGESA & CPU Stability

The most critical change is the AGESA (AMD Generic Encapsulated Software Architecture).

The 2018 version was written when the Raven Ridge architecture was brand new. It had aggressive power-saving bugs that caused the CPU to drop voltage too low during idle transitions, which is what caused your Proxmox "Hard Freezes."
The 2024 version is the "refined" logic. It ensures that even during deep idle (C-states), the CPU maintains a stable floor voltage.

2. Microcode & "Zenbleed"

The microcode revision 0x0810100b in the new BIOS is the official fix for several silicon-level bugs.

Speculative Execution: The 2018 version was vulnerable to several side-channel attacks that could crash a kernel under specific branch-prediction loads.
The fix: The 2024 microcode fundamentally changes how the CPU handles certain "Move" instructions, making it significantly more robust for virtualization (KVM/Proxmox) environments.

3. ACPI Tables (The Kernel Handshake)

When Linux boots, it reads the ACPI Tables from the BIOS to learn how to manage the hardware.

Old BIOS: Included messy tables that didn't strictly follow UEFI standards, often leading to "Spurious Interrupt" or "IOMMU" errors in the Linux dmesg.
New BIOS: Features cleaned-up tables that match modern Linux kernel expectations. This is why you no longer need complex "boot parameters" (like idle=nomwait) to keep the system stable.

4. Security (LogoFAIL)

The 2024 update specifically addresses LogoFAIL.

The 2018 BIOS had a vulnerability where a malicious image file used as a boot logo could execute code at the firmware level.
The 2024 version (the one you are using to flash your custom logos) has a hardened image parser to prevent this.

VivienM · Apr 26, 2026

prinskarnatie said:
I made these BIOS changes as per https://etcwiki.org/wiki/Minisforum_MS-A2_9955HX_temperature_fix

Advanced->AMD Overclocking-> Accept->Precision Boost Overdrive
CPU Boost Clock Override: Enabled(Negative)
Max CPU Boost Clock Override(-): 500
TJMAX 78

All my NVME slots are set to PCI3.0 x4.

Tried the BIOS changes, it didn't help, spontaneous reboot after 24ish hours...

Now trying with my NVMe SSD set to PCI 3.0 instead of 4.0.

prinskarnatie · Apr 26, 2026

Does journalctl -b -1 -e after the spontaneous reboot give anything of interest? Google says the 0xE code indicates a power cycle triggered by software. Which watchdog are you using in Proxmox? I'm using softdog. I started looking at sp5100_tco but haven't implemented it yet.

My main challenge has always been system temps, which was the reason for the boost down clock and running NVME's as PCI 3.0 x4. I added an Intel E800 nic card which required me to mount external fans on the case to keep temps in check.

VivienM · Apr 26, 2026

prinskarnatie said:
Does journalctl -b -1 -e after the spontaneous reboot give anything of interest? Google says the 0xE code indicates a power cycle triggered by software. Which watchdog are you using in Proxmox? I'm using softdog. I started looking at sp5100_tco but haven't implemented it yet.

My main challenge has always been system temps, which was the reason for the boost down clock and running NVME's as PCI 3.0 x4. I added an Intel E800 nic card which required me to mount external fans on the case to keep temps in check.

Nothing of interest.

I don't know what watchdog I'm using, presumably.. whatever the default is?

I did also set up kdump, let's see if that turns up anything useful... as soon as I can figure out how to access it...

VivienM · Apr 28, 2026

So, it seems to be taking longer to reboot, which is... in a way, the opposite of what you want for troubleshooting. But after like 36 hours it rebooted. kdump + panic=0 turned up nothing so I don't think it's a kernel panic.

And yet, if you go back to the software wrote 0xE to reset control register 0xCF9 message, doesn't that suggest some kind of software is triggering a reboot?
If, say, the SSD fell off the motherboard, wouldn't that trigger a kernel panic that would have stayed on the screen with panic=0?

Time to do more googling...

VivienM · Monday at 01:18

I ended up going back to 6.14.8, spontaneous reboots are gone, all back happy again.

If, as I believe, this issue started between 6.14.8 and 6.14.11 (which I'm not entirely sure about, I think I only tried 6.14.11 once), shouldn't it be possible to get to the bottom of it? There can't be that many applicable changes between those two kernel versions?
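In principle this is a bisection problem. Here is a rough Python sketch of the idea, assuming an ordered list of builds where the first is known good and the last is known bad. Only the two endpoints come from this thread; the entries in between are placeholders, and `first_bad` is a made-up helper name.

```python
import math
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list whose first entry is good and last is bad."""
    lo, hi = 0, len(builds) - 1  # builds[lo] known good, builds[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Hypothetical ordering: only the endpoints are real version numbers from
# this thread; the middle entries stand in for whatever builds exist.
builds = ["6.14.8-2", "placeholder-a", "placeholder-b", "placeholder-c", "6.14.11"]

# Stand-in for "boot it and wait a couple of days". Here we pretend the
# regression arrived with placeholder-b.
pretend_bad = {"placeholder-b", "placeholder-c", "6.14.11"}

print(first_bad(builds, lambda b: b in pretend_bad))
print("test boots needed at most:", math.ceil(math.log2(len(builds) - 1)))
```

The catch is that each "is it bad?" check means running a kernel for a day or two, and with an intermittent failure a "good" verdict is never fully certain, so each step probably needs a longer soak than the longest uptime seen so far.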

uzumo · Monday at 01:41

VivienM · Monday at 01:55

Thanks - you've pointed me to exactly the thing I was missing, i.e. where to find the actual changes between the ubuntu kernels that Proxmox kernels are based on.

The thing is - I'm not particularly looking forward to spending hours poring through changelogs, etc, but do I have any other choice? The fact that whatever is going on is still going on in 7.0.0 suggests that it hasn't really affected enough other people for someone else to have found a fix...
