Ryzen 5800X with Gigabyte X570 UD MCE Errors with 2 different CPUs?

coolspot

New Member
Dec 31, 2021
Hi all,

I have tried two different Ryzen 5800X CPUs in this board, and with each one Proxmox logs a very similar MCE error after about 20 hours of uptime:

Original CPU

Jan 1 21:51:58 pve kernel: [88101.916026] mce: [Hardware Error]: Machine check events logged
Jan 1 21:51:58 pve kernel: [88101.916030] [Hardware Error]: Corrected error, no action required.
Jan 1 21:51:58 pve kernel: [88101.916053] [Hardware Error]: CPU:1 (19:21:0) MC15_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000100d9f163
Jan 1 21:51:58 pve kernel: [88101.916076] [Hardware Error]: IPID: 0x0000000000000000
Jan 1 21:51:58 pve kernel: [88101.916088] [Hardware Error]: Microprocessor 5 Unit Ext. Error Code: 25
Jan 1 21:51:58 pve kernel: [88101.916089] [Hardware Error]: cache level: L3/GEN, tx: INSN


Replacement CPU 2
[74403.734053] mce: [Hardware Error]: Machine check events logged
[74403.734057] [Hardware Error]: Corrected error, no action required.
[74403.734084] [Hardware Error]: CPU:1 (19:21:0) MC9_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000105f6a163
[74403.734115] [Hardware Error]: IPID: 0x0000000000000000
[74403.734132] [Hardware Error]: L3 Cache Ext. Error Code: 54
[74403.734133] [Hardware Error]: cache level: L3/GEN, tx: INSN

Has anyone seen an error like this before?

The system is still up and running; it's just disturbing to see errors like this.
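As a quick sanity check on the "about 20 hours" figure: the bracketed number in each kernel log line is seconds since boot, so it can be converted to hours directly. This is only a small arithmetic sketch; the function name `uptime_hours` is made up for illustration, and the two timestamps are copied from the logs above.

```python
def uptime_hours(kernel_timestamp_s: float) -> float:
    """Convert a kernel log timestamp (seconds since boot) to hours."""
    return kernel_timestamp_s / 3600.0


# Timestamps taken from the first log line of each paste above.
samples = (
    ("Original CPU", 88101.916026),
    ("Replacement CPU", 74403.734053),
)

for label, ts in samples:
    print(f"{label}: {uptime_hours(ts):.1f} h since boot")
```

So the two events landed at roughly 24.5 and 20.7 hours after boot, which is in the same ballpark but not identical.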
 
Do you have overclocking and/or XMP enabled on these machines?
 
Running stock: no overclocking and no XMP. I find it odd that two different Ryzen 7 5800X CPUs would return the same error.

I wonder if it is a Proxmox issue more so than the CPU?

I've ordered a third CPU to try ...
 
Two different CPUs on two different boards, or did you swap the CPUs on the same board? If it's not a critical production system, you could try kernel 5.15:

apt update && apt install pve-kernel-5.15

But I doubt this is purely a Linux fault.
 
I only swapped the CPUs, not the motherboard.

I loaded the kernel as well, I'll monitor.
 
If you're using the same board, chances are high that it's not the CPU but the board itself.

Other things to try (especially for newer AMD systems):

- disabling C-States / ACPI
- ensure that the PSU is working properly
- cooling/temp monitoring
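For the temperature monitoring point, a rough sketch of reading whatever sensors the kernel already exposes through the standard hwmon sysfs tree. The function name `read_hwmon_temps` is made up for illustration. Which chips show up depends on the loaded drivers (on Ryzen systems the CPU sensor is usually provided by `k10temp`), and board-level sensors such as VRM may not appear at all.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): degrees_celsius} for readable sensors."""
    temps = {}
    for chip in sorted(Path(base).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[(name, sensor.name)] = int(sensor.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor not readable right now; skip it
    return temps


if __name__ == "__main__":
    for (chip, sensor), celsius in read_hwmon_temps().items():
        print(f"{chip:12s} {sensor:14s} {celsius:5.1f} C")
```

Running this periodically and logging the output next to the kernel log would at least show whether the MCE lines coincide with temperature spikes.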
 
@coolspot did you ever get this sorted out? I just now spotted almost the exact same error:

Code:
Jul 07 16:27:17 hostname kernel: mce: [Hardware Error]: Machine check events logged
Jul 07 16:27:17 hostname kernel: [Hardware Error]: Corrected error, no action required.
Jul 07 16:27:17 hostname kernel: [Hardware Error]: CPU:1 (19:21:0) MC10_STATUS[-|CE|-|-|-|-|-|Poison|Scrub]: 0x80000d00cf53d8c1
Jul 07 16:27:17 hostname kernel: [Hardware Error]: IPID: 0x0000000000000000
Jul 07 16:27:17 hostname kernel: [Hardware Error]: L3 Cache Ext. Error Code: 19
Jul 07 16:27:17 hostname kernel: [Hardware Error]: cache level: L1, tx: INSN
 
I updated the kernel and swapped the CPU, and I have not seen the error recently. To be frank, I'm not sure which change resolved the error, or whether it was a "real" one...
 
Ok, thanks for the update, I'm gonna do some stress testing and see if I can reproduce it...
 
Same here, precursor is always MCE ... for which AMD is reportedly making custom changes.

Although an error is reported, these seem to be non-critical.
I wonder how to interpret this. It is hard to find a good resource explaining all the values; most explanations seem to be derived from the kernel's MCE code.
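On reading the values: here is a rough Python sketch that reproduces the bracketed flags, the "Ext. Error Code", and the "cache level / tx" fields from the raw 64-bit status value. The bit positions are my reading of the `MCI_STATUS_*` definitions and the AMD decoder in the Linux kernel sources, so treat them as assumptions and check them against your kernel version. The function name `decode_status` is made up, and the optional TCC field is left out because none of the logs in this thread print it.

```python
# Assumed bit positions (from the kernel's MCI_STATUS_* definitions).
OVER, UC, MISCV, ADDRV, PCC = 62, 61, 59, 58, 57
SYNDV, DEFERRED, POISON, SCRUB = 53, 44, 43, 40

LL_MSGS = ("RESV", "L1", "L2", "L3/GEN")   # cache level, status bits 1:0
TT_MSGS = ("INSN", "DATA", "GEN", "RESV")  # transaction type, status bits 3:2


def bit(status: int, n: int) -> bool:
    return bool((status >> n) & 1)


def flag(status: int, n: int, label: str) -> str:
    return label if bit(status, n) else "-"


def decode_status(status: int) -> dict:
    """Decode an MCx_STATUS value the way the log lines in this thread show it."""
    if bit(status, UC):
        severity = "UE"
    elif bit(status, DEFERRED):
        severity = "-"
    else:
        severity = "CE"

    flags = [
        flag(status, OVER, "Over"),
        severity,
        flag(status, MISCV, "MiscV"),
        flag(status, ADDRV, "AddrV"),
        flag(status, PCC, "PCC"),
        flag(status, SYNDV, "SyndV"),
    ]
    ecc = (status >> 45) & 0x3  # bits 46:45; the field only appears when non-zero
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    flags += [
        flag(status, DEFERRED, "Deferred"),
        flag(status, POISON, "Poison"),
        flag(status, SCRUB, "Scrub"),
    ]
    return {
        "flags": "|".join(flags),
        "ext_error_code": (status >> 16) & 0x3F,
        "cache_level": LL_MSGS[status & 0x3],
        "tx": TT_MSGS[(status >> 2) & 0x3],
    }


if __name__ == "__main__":
    # Status value from the very first log in this thread.
    print(decode_status(0x8000000100D9F163))
```

For the first log in the thread this yields the flags `-|CE|-|-|-|-|-|-|-`, extended error code 25, cache level `L3/GEN`, and tx `INSN`, which matches what the kernel printed. The human-readable unit name ("L3 Cache", "Microprocessor 5 Unit", and so on) comes from a separate per-bank lookup in the kernel that this sketch does not attempt.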
 
Hello, my dudes. I have the same problem (not within Proxmox, but on another Linux distro).

Code:
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[-|CE|-|AddrV|-|-|-|-|-]: 0x850f002d0006fe81
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: Bank 21 is reserved.
2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: cache level: L1, tx: INSN
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC25_STATUS[-|CE|MiscV|AddrV|-|-|-|Poison|-]: 0x8d48085089480474
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: Platform Security Processor Ext. Error Code: 8, Data Cache Bank 2 ECC or parity error.
2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: cache level: RESV, tx: DATA
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC25_STATUS[Over|CE|-|AddrV|-|-|CECC|-|-|-]: 0xc589c0bc0f48f364
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: Platform Security Processor Ext. Error Code: 8, Data Cache Bank 2 ECC or parity error.
2022-04-10T05:04:52+0500 gazoline kernel: [Hardware Error]: cache level: RESV, tx: DATA
-- Boot 439c2141b5524e64acde8580b596535e --
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC8_STATUS[Over|CE|MiscV|AddrV|-|-|CECC|-|Poison|-]: 0xcccccccccce7eba8
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: L3 Cache Ext. Error Code: 39
2022-04-25T15:17:59+0500 gazoline kernel: [Hardware Error]: cache level: RESV, tx: GEN
-- Boot 8068c77855f347b99c36d270c775df79 --
2022-08-22T04:43:11+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-08-22T04:43:11+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC17_STATUS[-|CE|-|-|-|-|-|-|-]: 0x80000003e4eb5163
2022-08-22T04:43:11+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-08-22T04:43:11+0500 gazoline kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 43
2022-08-22T04:43:11+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
2022-08-27T09:57:58+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2022-08-27T09:57:58+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC7_STATUS[-|CE|-|-|-|-|-|-|-]: 0x800000095c1dd163
2022-08-27T09:57:58+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2022-08-27T09:57:58+0500 gazoline kernel: [Hardware Error]: L3 Cache Ext. Error Code: 29
2022-08-27T09:57:58+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
-- Boot ffa9ad872e874a288508affa860d784f --
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: Deferred error, no action required.
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC24_STATUS[Over|-|-|AddrV|-|-|CECC|Deferred|Poison|-]: 0xc5455cef619f8d8c
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: System Management Unit Ext. Error Code: 31
2023-02-15T01:24:49+0500 gazoline kernel: [Hardware Error]: cache level: RESV, tx: RESV
-- Boot 31656dd521d441fb8d223001673688d8 --
2023-03-30T09:06:43+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2023-03-30T09:06:43+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC9_STATUS[-|CE|-|-|-|-|-|-|-]: 0x80000016f0c4d163
2023-03-30T09:06:43+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-03-30T09:06:43+0500 gazoline kernel: [Hardware Error]: L3 Cache Ext. Error Code: 4, L3M Data ECC Error.
2023-03-30T09:06:43+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
2023-04-06T04:02:30+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2023-04-06T04:02:30+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC20_STATUS[-|CE|-|-|-|-|-|-|-]: 0x80000002aec05163
2023-04-06T04:02:30+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-04-06T04:02:30+0500 gazoline kernel: [Hardware Error]: Coherent Slave Ext. Error Code: 0, Illegal Request.
2023-04-06T04:02:30+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
2023-04-24T02:23:41+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2023-04-24T02:23:41+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC15_STATUS[-|CE|MiscV|-|PCC|-|-|-|-]: 0x8b4200015420c6c7
2023-04-24T02:23:41+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-04-24T02:23:41+0500 gazoline kernel: [Hardware Error]: Microprocessor 5 Unit Ext. Error Code: 32
2023-04-24T02:23:41+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: DATA
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC25_STATUS[-|CE|-|AddrV|PCC|-|CECC|-|-|Scrub]: 0x87c6c748d231c931
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: Error Addr: 0x0000000000000000
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: Platform Security Processor Ext. Error Code: 49
2023-04-27T03:50:59+0500 gazoline kernel: [Hardware Error]: cache level: L1, tx: INSN
2023-04-30T19:46:38+0500 gazoline kernel: [Hardware Error]: Corrected error, no action required.
2023-04-30T19:46:38+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) MC27_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000109fa1163
2023-04-30T19:46:38+0500 gazoline kernel: [Hardware Error]: IPID: 0x0000000000000000
2023-04-30T19:46:38+0500 gazoline kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 58
2023-04-30T19:46:38+0500 gazoline kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN

The bug is flaky and does not recur often; so far it has mostly shown up in the first four months of the year. I have no theory about what causes this error message or how to reproduce it.
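To put numbers on "which bank" and "which month", here is a rough sketch that tallies the status lines from a saved journal dump. The regular expressions are written against the line format pasted above, and the function name `summarize` is made up for illustration; other log formats would need different patterns.

```python
import re
from collections import Counter

# Matches e.g. "MC21_STATUS[-|CE|-|AddrV|-|-|-|-|-]: 0x850f002d0006fe81"
STATUS_RE = re.compile(r"MC(\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]{16})")
# ISO-style timestamp at the start of a line, e.g. "2022-02-09T03:33:44+0500"
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-\d{2}T")


def summarize(lines):
    """Count MCE status lines per bank number and per calendar month."""
    by_bank, by_month = Counter(), Counter()
    for line in lines:
        status = STATUS_RE.search(line)
        if not status:
            continue  # not a status line (IPID, Error Addr, ...)
        by_bank[int(status.group(1))] += 1
        date = DATE_RE.match(line)
        if date:
            by_month[int(date.group(2))] += 1
    return by_bank, by_month


if __name__ == "__main__":
    sample = [
        "2022-02-09T03:33:44+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) "
        "MC21_STATUS[-|CE|-|AddrV|-|-|-|-|-]: 0x850f002d0006fe81",
        "2022-03-04T12:42:51+0500 gazoline kernel: [Hardware Error]: CPU:1 (19:21:0) "
        "MC25_STATUS[-|CE|MiscV|AddrV|-|-|-|Poison|-]: 0x8d48085089480474",
    ]
    banks, months = summarize(sample)
    print("per bank:", dict(banks))
    print("per month:", dict(months))
```

Feeding it the full log above would show the reported bank changing from event to event while the CPU number stays at 1.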
 
Same issue here. It has been there since PVE 6 and has persisted through the upgrades to 7 and 8, so I don't think it's kernel-related.
I tried a new CPU and got the same result.

It doesn't cause any problems, but it does tick my OCD off :)


2023-11-22T14:25:46.627932+00:00 berlin kernel: [754586.753429] mce: [Hardware Error]: Machine check events logged
2023-11-22T14:25:46.627940+00:00 berlin kernel: [754586.753545] [Hardware Error]: Corrected error, no action required.
2023-11-22T14:25:46.627941+00:00 berlin kernel: [754586.753657] [Hardware Error]: CPU:1 (19:21:0) MC25_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000112cf9163
2023-11-22T14:25:46.627942+00:00 berlin kernel: [754586.753812] [Hardware Error]: IPID: 0x0000000000000000
2023-11-22T14:25:46.627942+00:00 berlin kernel: [754586.753921] [Hardware Error]: Bank 25 is reserved.
2023-11-22T14:25:46.627942+00:00 berlin kernel: [754586.754027] [Hardware Error]: cache level: L3/GEN, tx: INSN
 
Hello

We are seeing the exact same error on one of our hosts:
root@bru01-cloud-node01:/tankhddnew/template/iso#
Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.933517] [Hardware Error]: Deferred error, no action required.

Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.933752] [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[Over|-|-|AddrV|PCC|-|CECC|Deferred|-|-]: 0xc748d08949000002

Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.933988] [Hardware Error]: Error Addr: 0x0000000000000000

Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.934213] [Hardware Error]: IPID: 0x0000000000000000

Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.934405] [Hardware Error]: Bank 21 is reserved.

Message from syslogd@bru01-cloud-node01 at Apr 25 04:00:24 ...
kernel:[142267.934571] [Hardware Error]: cache level: L2, tx: INSN

Is it something we have to worry about?
We have the same hardware configuration in multiple other hosts and have never encountered this.
This server seems to reboot itself every 10 to 12 days, and we cannot find any reason for it.
 
Hello,
I have an ASRock Rack X570D4U, and this error occurs only when I enable PBS. The CPU is a 5800X.
Temperatures according to IPMI (logged with Telegraf) are below 65, but the VRM temperature is not reported, and I suspect VRM heat is the root cause.
The X570 chipset is cooled with a 40 mm fan.
 