Hello, I have a server (at Hetzner) with an AMD EPYC 7502P CPU.
I have installed the latest versions of all packages via apt, and amd64-microcode is installed. Did I miss something, or does the server have a hardware problem?
Motherboard:
Code:
Product Name: KRPA-U16 Series
Version: Rev 1.xx
Code:
# pveversion
pve-manager/8.2.7/3e0176e6bb2ade3b (running kernel: 6.8.12-4-pve)
I am currently running 3 LXC containers (Ubuntu 20.04), and I am starting to get errors in dmesg:
Code:
[153996.660148] [Hardware Error]: CPU:0 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
[153996.660683] [Hardware Error]: Error Addr: 0x00000007b9493840
[153996.661203] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x0b000b000a801202
[153996.661730] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
[153996.661835] EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x1f3524e offset:0x40 grain:64 syndrome:0xb00)
[153996.662875] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
[154324.297911] mce: [Hardware Error]: Machine check events logged
[154324.298839] [Hardware Error]: Corrected error, no action required.
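For reference, the flags in the MC17_STATUS line can be reproduced from the raw value 0xdc2040000000011b. This is only a minimal sketch in Python, assuming the bit positions from the Linux kernel's MCI_STATUS_* definitions and the 6-bit extended error code field used on SMCA systems. `STATUS_BITS` and `decode_status` are names made up for illustration and are not part of any existing tool.

```python
# Bit positions follow the Linux kernel's MCI_STATUS_* definitions
# (arch/x86/include/asm/mce.h). The names used here are illustrative only.
STATUS_BITS = [
    ("Val", 63),       # register contains a valid error
    ("Over", 62),      # an earlier error was overwritten (overflow)
    ("UC", 61),        # uncorrected error
    ("En", 60),        # error reporting enabled
    ("MiscV", 59),     # MISC register valid
    ("AddrV", 58),     # ADDR register valid
    ("PCC", 57),       # processor context corrupt
    ("TCC", 55),       # task context corrupt (AMD)
    ("SyndV", 53),     # syndrome register valid (AMD)
    ("CECC", 46),      # corrected by ECC
    ("UECC", 45),      # uncorrectable ECC error
    ("Deferred", 44),  # deferred error (AMD)
    ("Poison", 43),    # poisoned data consumed (AMD)
    ("Scrub", 40),     # error found by the scrubber (AMD)
]


def decode_status(status: int) -> dict:
    """Split a raw MCA status value into its set flags and error codes."""
    flags = [name for name, bit in STATUS_BITS if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,            # bits 15:0
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16 (assumed SMCA layout)
    }


if __name__ == "__main__":
    result = decode_status(0xDC2040000000011B)
    print(result["flags"])
    print(hex(result["error_code"]), result["ext_error_code"])
```

For the value from the log this reports Over, MiscV, AddrV, SyndV and CECC as set, while UC, PCC, Deferred and Poison are clear, and the extended error code is 0. That agrees with the flags and the "Ext. Error Code: 0" line printed by the kernel for a corrected ECC error. Val and En are also set, but the kernel does not print them.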