strange ECC messages on new setup.

jim.bond.9862

Renowned Member
Apr 17, 2015
Hi. I have a new Proxmox 4 setup and I am getting these strange messages now and again.
What bugs me is that the messages appear to come from MC4, yet I don't have an MC4.
Can anyone shed some light on this issue, please?

I installed edac-utils specifically for this.

root@deb8Prox4:~# edac-util -vs
edac-util: EDAC drivers are loaded. 2 MCs detected:
mc0:F10h
mc1:F10h

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.339921] [Hardware Error]: Corrected error, no action required.


Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.342935] [Hardware Error]: CPU:0 (10:8:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc20c000ea080a13


Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.346012] [Hardware Error]: MC4 Error Address: 0x00000003d8e11a60


Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.349063] [Hardware Error]: MC4 Error (node 0): DRAM ECC error detected on the NB.


Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.352113] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
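
For anyone wondering where the fields in those log lines come from, here is a minimal Python sketch that decodes the MC4_STATUS value shown above. It assumes the usual x86 machine-check status layout (Over = bit 62, UC = bit 61, MiscV = bit 59, AddrV = bit 58, PCC = bit 57), the AMD ECC bits 46:45, and the standard "bus error" encoding of the low 16 bits. The function and table names are my own, not anything from the kernel or edac-utils. As far as I can tell, the "MC4" in the log is the machine-check bank number (bank 4 reports northbridge errors on this CPU family, which fits the "detected on the NB" line), not an EDAC memory controller index like mc0/mc1.

```python
# Lookup tables for the "bus error" format of the low 16 status bits
# (layout 0000 1PPT RRRR IILL). Names mirror the strings in the log.
PP = ["SRC", "RES", "OBS", "GEN"]          # participating processor
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
II = ["MEM", "RES", "IO", "GEN"]           # memory or I/O
LL = ["L0", "L1", "L2", "L3/GEN"]          # cache level


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_flags(status):
    """Rebuild the [Over|CE|MiscV|-|AddrV|CECC] style flag list."""
    flags = [
        "Over" if bit(status, 62) else "-",
        "UE" if bit(status, 61) else "CE",
        "MiscV" if bit(status, 59) else "-",
        "PCC" if bit(status, 57) else "-",
        "AddrV" if bit(status, 58) else "-",
    ]
    ecc = (status >> 45) & 0x3             # AMD-specific ECC bits 46:45
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    return flags


def decode_bus_error(status):
    """Decode the low 16 bits if they use the bus error format, else None."""
    code = status & 0xFFFF
    if (code & 0xF800) != 0x0800:
        return None
    rrrr = (code >> 4) & 0xF
    return {
        "cache level": LL[code & 0x3],
        "mem/io": II[(code >> 2) & 0x3],
        "mem-tx": RRRR[rrrr] if rrrr < len(RRRR) else "unknown",
        "part-proc": PP[(code >> 9) & 0x3],
        "timeout": bit(code, 8),
    }


status = 0xDC20C000EA080A13                # value taken from the log above
print("|".join(decode_flags(status)))
print(decode_bus_error(status))
```

Run against the value from the log, this should reproduce the same flags and the same "L3/GEN, MEM, RD, RES (no timeout)" breakdown the kernel printed, so the message itself is consistent: a corrected (single-bit) DRAM ECC error on a read.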


PS: I just want to add that I am perfectly aware that I might have a genuine RAM failure.
I am waiting for a new set of chips to arrive this week, so I should be able to replace all the RAM I have now with the new set and see if the errors go away. However, I would still like to identify the faulty chip, as I would like to keep using the rest of the RAM if possible. I got a good deal on a 32GB set, but keeping most of the old set would almost double the RAM on the server.
 
Doesn't the dmidecode Linux command provide some information about the RAM banks?
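
If it helps, here is a rough Python sketch for pulling the slot information out of `dmidecode -t memory` (needs root). It assumes the usual dmidecode text layout, where each module is a "Memory Device" record of tab-indented "Key: Value" lines; `parse_memory_devices` and `read_dmidecode` are names I made up for this example. Note that dmidecode only lists the slots and the modules in them; it does not say which module produced the ECC error.

```python
import subprocess


def parse_memory_devices(dmidecode_text):
    """Return one dict per 'Memory Device' record in dmidecode output."""
    devices = []
    current = None
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            # Start of a new DIMM slot record.
            current = {}
            devices.append(current)
        elif not stripped or stripped.startswith("Handle "):
            # A blank line or a new handle header ends the current record.
            current = None
        elif current is not None and ": " in stripped:
            key, value = stripped.split(": ", 1)
            current[key] = value
    return devices


def read_dmidecode():
    """Run dmidecode for the memory tables; must be run as root."""
    result = subprocess.run(
        ["dmidecode", "-t", "memory"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def print_slots(devices):
    """Print slot locator, bank locator and size for each record."""
    for dev in devices:
        print(dev.get("Locator"), dev.get("Bank Locator"), dev.get("Size"))
```

On the server you would call `print_slots(parse_memory_devices(read_dmidecode()))` and then compare the populated slots with whatever per-row error counts `edac-util -v` reports.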