Ryzen 5950X: "Uncorrected, software containable error" events after upgrade to kernel 7.0.x

harrydus

May 7, 2025
Hello,

we are seeing a number of MCE/RAS events on several Proxmox VE 9.2 hosts after upgrading from the 6.17 kernel series to the new 7.0 kernel series and would like to know whether others are observing something similar.
  • Hetzner dedicated servers
  • AMD Ryzen 9 5950X
  • ZFS
  • Multiple independent hosts affected
  • Proxmox VE 9.2.x
  • Kernel: 7.0.6-2-pve

CPU: Model name: AMD Ryzen 9 5950X 16-Core Processor cpuid=0x00a20f10 microcode=0x0a20102e

Since upgrading to kernel 7.0.x we have seen MCE events on multiple hosts.

Example event:
2026-06-01 09:53:50 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=7), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffff8dad806f7b80, misc=0x10000000000000, walltime=0x6a1d56ae, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000007, microcode=0x0a20102e

Log:
Jun 01 09:53:50 proxmox02 kernel: mce: [Hardware Error]: Machine check events logged
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: System Fatal error.
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: CPU:1 (19:21:0) MC7_STATUS[Over|UE|MiscV|AddrV|PCC|SyndV|-|Poison|Scrub]: 0xffff8dad806f7b80
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: Error Addr: 0x0000000000000000
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: Bank 7 is reserved.
Jun 01 09:53:50 proxmox02 kernel: [Hardware Error]: cache level: RESV, tx: INSN
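For anyone who wants to compare such records bit by bit, here is a minimal Python sketch that decodes the status value from the log above. It assumes the bit positions of the MCI_STATUS_* definitions in the Linux kernel's arch/x86/include/asm/mce.h; the names `STATUS_BITS`, `decode_status` and `walltime_to_utc` are made up for this example, and treating `walltime` as a Unix timestamp is an assumption that happens to match the logged time.

```python
from datetime import datetime, timezone

# Assumed bit positions, following the MCI_STATUS_* definitions in the
# Linux kernel's arch/x86/include/asm/mce.h.
STATUS_BITS = {
    63: "Val",
    62: "Over",
    61: "UC",
    60: "En",
    59: "MiscV",
    58: "AddrV",
    57: "PCC",
    55: "TCC",
    53: "SyndV",
    46: "CECC",
    45: "UECC",
    44: "Deferred",
    43: "Poison",
    40: "Scrub",
}


def decode_status(status):
    """Return the names of the known flag bits set in a status value."""
    return [
        name
        for bit, name in sorted(STATUS_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]


def walltime_to_utc(walltime):
    """Interpret the walltime field as seconds since the Unix epoch (UTC)."""
    stamp = datetime.fromtimestamp(walltime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


# Values copied from the example event in this post.
print(decode_status(0xFFFF8DAD806F7B80))
print(walltime_to_utc(0x6A1D56AE))
```

The decoded flags can be checked against the `MC7_STATUS[...]` line the kernel printed; the sketch only names bits and does not judge whether the record itself is plausible.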

All affected systems (5 so far) have the same CPU model (Ryzen 5950X) and use the same microcode version. We have been running these machines for quite a while and did not see this kind of error with kernel 6.x.

Did anyone experience similar issues?
 
Hi @harrydus

thanks for posting in the forum!

Can you please confirm whether this behavior started after changing only the kernel version, or after the upgrade from PVE 8 to 9?
If the latter, please try rolling back the kernel to 6.17 with no other software changes and check that the issue no longer occurs.

Are you using a custom performance governor or CPU frequency scaler?
Please also provide the output of the following commands:
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
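
The two commands above only look at cpu0. As a rough sketch, assuming the usual cpufreq sysfs layout under /sys/devices/system/cpu, the same values can be collected for every CPU with a few lines of Python (the function name `cpufreq_settings` is made up for this example):

```python
from collections import Counter
from pathlib import Path


def cpufreq_settings(sysfs_root="/sys/devices/system/cpu"):
    """Count (scaling_driver, scaling_governor) pairs across all CPUs."""
    counts = Counter()
    for policy in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        try:
            driver = (policy / "scaling_driver").read_text().strip()
            governor = (policy / "scaling_governor").read_text().strip()
        except OSError:
            continue  # CPU offline or cpufreq files not readable
        counts[(driver, governor)] += 1
    return counts


if __name__ == "__main__":
    for (driver, governor), count in cpufreq_settings().items():
        print(f"{count} CPUs: driver={driver} governor={governor}")
```

More than one line of output would mean that not all CPUs share the same driver and governor.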

Are these log events accompanied by any other symptoms / outages or is it currently mainly a cosmetic issue?

EDIT:
Also if available, please provide logs from additional events to compare the error messages / addresses.

Yours sincerely
Jonas
 
Hello,

Yes, I can confirm that these problems started after switching to kernel 7.0. We already switched back to 6.17 to test this and confirmed that the problem does not occur with 6.17. However, there were MCE errors with kernel 6.17 as well, but they were always corrected or deferred, so nothing as serious as an "Uncorrected, software containable error".

We don't use a custom performance governor or CPU frequency scaler.

The log events weren't accompanied by other symptoms. However, we have had some random reboots in the past on this and other machines at Hetzner, which were not related to these errors.

Here are the MCE messages and contents of scaling_governor and scaling_driver of all affected hosts.
Note: Host 4 is the one on which we switched back to kernel 6.17. You can see that after event 3 (2026-05-07 00:57:32 +0000 error: Uncorrected, software containable error.), only two further noncritical errors of the type "Corrected error, no action required" occurred.


Code:
# Host 1:
scaling_governor: performance
scaling_driver: amd-pstate-epp
MCE events:
1 2026-04-27 21:58:07 +0000 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=14), mcg mcgstatus=0, mca DCQ SRAM ECC error. Ext Err Code: 6, mcgcap=0x0000011c, status=0x80000002d386d163, walltime=0x69efdbef, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x0000000e, microcode=0x0a20102e
2 2026-06-01 09:53:50 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=7), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffff8dad806f7b80, misc=0x10000000000000, walltime=0x6a1d56ae, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000007, microcode=0x0a20102e


# Host 2:
scaling_governor: performance
scaling_driver: amd-pstate-epp
MCE events:
1 2026-03-13 02:46:41 +0000 error: Deferred error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=25), mcg mcgstatus=0, mci Task_context_corrupt, mcgcap=0x0000011c, status=0x9090909090909090, walltime=0x69b37a91, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000019, microcode=0x0a20102e
2 2026-05-30 05:09:12 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=13), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffff8ccac1108b20, misc=0x10000000000000, walltime=0x6a1a70f8, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x0000000d, microcode=0x0a20102e


# Host 3:
scaling_governor: performance
scaling_driver: amd-pstate-epp
MCE events:
1 2026-04-24 06:41:23 +0000 error: Deferred error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=20), mcg mcgstatus=0, mci UECC Poison consumed, mcgcap=0x0000011c, status=0x9066ffd8cf9de9ff, walltime=0x69eb1093, cpu=0x00000001, cpuid=0x00a20f12, apicid=0x00000002, bank=0x00000014, microcode=0x0a201211
2 2026-05-06 09:07:52 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=15), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt CECC Task_context_corrupt, mcgcap=0x0000011c, status=0xffffd3dcc06c8b10, misc=0x10000000000000, walltime=0x69fb04e8, cpu=0x00000001, cpuid=0x00a20f12, apicid=0x00000002, bank=0x0000000f, microcode=0x0a201211

# Host 4:
scaling_governor: performance
scaling_driver: amd-pstate-epp
MCE events:
1 2026-03-29 22:49:01 +0000 error: Deferred error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=21), mcg mcgstatus=0, mci Task_context_corrupt, mcgcap=0x0000011c, status=0x9090909090909090, walltime=0x69c9ac5d, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000015, microcode=0x0a20102e
2 2026-04-30 06:07:38 +0000 error: Deferred error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=16), mcg mcgstatus=0, mci Error_overflow CECC, mcgcap=0x0000011c, status=0xc931d231c0315d5c, misc=0x10000000000000, walltime=0x69f2f1aa, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000010, microcode=0x0a20102e
3 2026-05-07 00:57:32 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=25), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt UECC Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffffffffa514b640, misc=0x10000000000000, walltime=0x69fbe37c, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000019, microcode=0x0a20102e
4 2026-05-26 13:53:19 +0000 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=26), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Task_context_corrupt, mca DRAM ECC error. Ext Err Code: 0 Memory Error 'mem-tx: generic write, tx: instruction, level: L2', memory_channel=0,csrow=0, mcgcap=0x0000011c, status=0xdead000000000122, misc=0x10000000000000, walltime=0x6a15a5cf, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x0000001a, microcode=0x0a20102e
5 2026-05-29 14:36:55 +0000 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller V2 (bank=21), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Task_context_corrupt, mca DRAM ECC error. Ext Err Code: 0, memory_channel=0,csrow=0, mcgcap=0x0000011c, status=0xdead000000000300, misc=0x10000000000000, walltime=0x6a19a487, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000015, microcode=0x0a20102e

# Host 5:
scaling_governor: performance
scaling_driver: amd-pstate-epp
MCE events:
1 2026-05-11 19:40:54 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=13), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt UECC Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffffffff8620b000, misc=0x10000000000000, walltime=0x6a0230c6, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x0000000d, microcode=0x0a20102e
2 2026-05-13 09:38:16 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=17), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt UECC Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffffffff8620b000, misc=0x10000000000000, walltime=0x6a044688, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000011, microcode=0x0a20102e
3 2026-05-29 14:02:40 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=9), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffff8eba6eb1ca28, misc=0x10000000000000, walltime=0x6a199c80, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x00000009, microcode=0x0a20102e
4 2026-05-31 14:11:24 +0000 error: Uncorrected, software containable error., CPU 2, bank Unified Memory Controller V2 (bank=10), mcg mcgstatus=0, mci Error_overflow Processor_context_corrupt UECC Poison consumed Task_context_corrupt, mcgcap=0x0000011c, status=0xffffffff9f60b000, misc=0x10000000000000, walltime=0x6a1c418c, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, bank=0x0000000a, microcode=0x0a20102e
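
To make event lists like the ones above easier to compare across hosts, a small parser can help. This is only a sketch: it assumes each event sits on a single line in the format shown in this post, and the names `EVENT_RE`, `FIELD_RE`, `parse_event` and `severity_counts` are made up for the example.

```python
import re
from collections import Counter

# Leading "<id> <date> <time> <tz> error: <severity>., CPU ..." part of a line.
EVENT_RE = re.compile(
    r"^(?P<id>\d+) (?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"error: (?P<severity>.+?)\., CPU"
)
# Trailing "name=0x..." fields such as status, bank, cpuid, microcode.
FIELD_RE = re.compile(r"\b(\w+)=(0x[0-9a-fA-F]+)")


def parse_event(line):
    """Parse one event line into a dict, or return None if it is not an event."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event.update({key: int(value, 16) for key, value in FIELD_RE.findall(line)})
    return event


def severity_counts(lines):
    """Count events per severity string, skipping non-event lines."""
    events = (parse_event(line) for line in lines)
    return Counter(event["severity"] for event in events if event)


# Event 2 of Host 1, copied from the listing in this post.
SAMPLE = (
    "2 2026-06-01 09:53:50 +0000 error: Uncorrected, software containable error., "
    "CPU 2, bank Unified Memory Controller V2 (bank=7), mcg mcgstatus=0, "
    "mci Error_overflow Processor_context_corrupt Poison consumed Task_context_corrupt, "
    "mcgcap=0x0000011c, status=0xffff8dad806f7b80, misc=0x10000000000000, "
    "walltime=0x6a1d56ae, cpu=0x00000001, cpuid=0x00a20f10, apicid=0x00000002, "
    "bank=0x00000007, microcode=0x0a20102e"
)

event = parse_event(SAMPLE)
print(event["time"], event["severity"], hex(event["status"]), event["bank"])
```

Feeding each host's lines to `severity_counts` gives a quick per-host tally of corrected, deferred and uncorrected events.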

As a side note: We already communicated with Hetzner support, but they won't do anything about this because
"Proxmox is not something we support or guarantee on our server models"
 
My 5950X with microcode 0x0a201213 works fine with all the recent 7.0 Proxmox kernels. That in itself is not useful to you but maybe you can do (or request) a BIOS update?

EDIT: I also have the 3.20251202.1~bpo13+1 version of the amd64-microcode package installed. Do your systems have it installed?
 
Yes, we have 3.20251202.1~bpo13+1 of amd64-microcode installed. I'm very skeptical that Hetzner support will do a BIOS update just for us, since their policy is "We don't officially support Proxmox".
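
The package version alone does not show which revision is actually loaded. As a minimal sketch, assuming the `microcode` field that x86 kernels print in /proc/cpuinfo, the running revision can be read like this (the function name `microcode_revisions` is made up for the example):

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            found = microcode_revisions(handle.read())
    except OSError:
        found = set()  # not on Linux, or /proc not mounted
    print(sorted(hex(revision) for revision in found))
```

On a healthy system the set should contain exactly one value; that value can then be compared with the `microcode=` field in the MCE records.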
 
No errors here on a 5950x.

Code:
amd64-microcode is already the newest version (3.20251202.1~bpo13+1)

Motherboard is an Asrock x570m pro4 with bios P5.65
Memory is M391A4G4AB1-CWE
 
I'm very skeptical that Hetzner support will do a BIOS update just for us, since their policy is "We don't officially support Proxmox".
But it sounds like a compatibility issue between the Linux kernel (which is mostly the Ubuntu 26.04(?) kernel) and the BIOS/firmware. Do they support Ubuntu? Would it maybe be worth your trouble to install Ubuntu on one server instead of Proxmox? I think a request for an up-to-date BIOS is not unreasonable (regardless of operating system) with the recent firmware security fixes in AMD's AGESA.

Motherboard is an Asrock x570m pro4 with bios P5.65
This is AGESA Combo V2 PI 1.2.0.E and I have AGESA ComboV2 1.2.0.F. @harrydus, what AGESA version does the Hetzner BIOS currently have? I think the amd64-microcode package from Debian is typically older.
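
The AGESA version may not be exposed directly by the running system, but the board and BIOS identification usually is. A minimal sketch, assuming the standard DMI sysfs files under /sys/class/dmi/id (the names `DMI_FIELDS` and `read_dmi` are made up for the example):

```python
from pathlib import Path

# DMI attributes assumed to be present under /sys/class/dmi/id.
DMI_FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(dmi_root="/sys/class/dmi/id"):
    """Read board and BIOS identification strings; missing fields become None."""
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(dmi_root) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info


if __name__ == "__main__":
    for field, value in read_dmi().items():
        print(f"{field}: {value}")
```

With the board name and BIOS version in hand, the matching AGESA release can be looked up in the board vendor's BIOS changelog.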
 