Strange ECC messages on new setup

jim.bond.9862 · Aug 24, 2015

Hi. I have a new Proxmox 4 setup and I am getting these strange messages now and again.
What bugs me is that they look like they come from MC4, and I don't have an MC4.
Can anyone shed some light on this issue, please?

I installed edac-utils specifically for this:

root@deb8Prox4:~# edac-util -vs
edac-util: EDAC drivers are loaded. 2 MCs detected:
mc0:F10h
mc1:F10h

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.339921] [Hardware Error]: Corrected error, no action required.

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.342935] [Hardware Error]: CPU:0 (10:8:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc20c000ea080a13

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.346012] [Hardware Error]: MC4 Error Address: 0x00000003d8e11a60

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.349063] [Hardware Error]: MC4 Error (node 0): DRAM ECC error detected on the NB.

Message from syslogd@deb8Prox4 at Aug 24 10:58:41 ...
kernel:[ 7042.352113] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
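The "MC4" in these lines is most likely machine-check bank 4, not an EDAC memory controller. On AMD family 10h processors (the "(10:8:0)" after CPU:0 is family 0x10, model 8, stepping 0), bank 4 reports northbridge errors, which include DRAM ECC errors. That is why it can appear even though edac-util lists only mc0 and mc1; "node 0" in the message would correspond to the memory controller of node 0, which the EDAC driver exposes as mc0. The sketch below decodes the status word the way the kernel summarizes it. It assumes the standard x86 MCi_STATUS flag positions (bits 57 to 63) and, for the ECC field, bits 46:45 as used by the Linux AMD decoder for family 10h; the function names are hypothetical. The error address is a physical address, and because of node and channel interleaving it does not map directly to one DIMM.

```python
# Hypothetical helper: decode an AMD MC4_STATUS value into the flag summary
# printed in the "[Hardware Error]" lines. Flag bits are the standard x86
# MCi_STATUS positions; the ECC field (bits 46:45) follows the Linux AMD
# decoder for family 10h and is an assumption for other CPU families.

VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57


def bit(value: int, pos: int) -> bool:
    """Return True if bit `pos` of `value` is set."""
    return (value >> pos) & 1 == 1


def decode_status(status: int) -> str:
    """Build the bracketed summary, e.g. '[Over|CE|MiscV|-|AddrV|CECC]'."""
    flags = [
        "Over" if bit(status, OVER) else "-",   # an earlier error was overwritten
        "UE" if bit(status, UC) else "CE",      # uncorrected vs corrected
        "MiscV" if bit(status, MISCV) else "-", # MCi_MISC holds extra info
        "PCC" if bit(status, PCC) else "-",     # processor context corrupt
        "AddrV" if bit(status, ADDRV) else "-", # MCi_ADDR holds the address
    ]
    ecc = (status >> 45) & 0x3  # bits 46:45 on family 10h
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    return "[" + "|".join(flags) + "]"


def address_gib(addr: int) -> float:
    """Express a physical error address in GiB, for a rough sense of location."""
    return addr / 2**30


if __name__ == "__main__":
    status = 0xDC20C000EA080A13  # value from the log above
    print("valid:", bit(status, VAL))
    print(decode_status(status))
    print(f"error address ~{address_gib(0x3D8E11A60):.2f} GiB")
```

Run against the logged value, this reproduces the kernel's "[Over|CE|MiscV|-|AddrV|CECC]": a corrected ECC error with a valid address, where "Over" means at least one earlier error was not logged separately.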

PS: I just want to add that I am perfectly aware that I might have a genuine RAM failure.
I am expecting a new set of modules to arrive this week, so I should be able to replace all the RAM I have now with the new set and see if the errors go away. But I would still like to identify the faulty module, because I would like to keep using the rest of the RAM if possible. I got a good deal on a 32GB set, but keeping most of the old set would almost double the RAM on the server.
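To narrow down a faulty module, the EDAC driver also keeps per chip-select row and per channel corrected-error counters in sysfs, typically as files such as /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count. The sketch below collects those counters so that a row/channel whose count keeps climbing stands out. It assumes the legacy csrow layout (newer kernels may also expose per-DIMM directories), and the function names are hypothetical. Mapping a csrow/channel pair to a physical slot is board specific; the ch*_dimm_label files are often empty unless labels have been configured.

```python
import glob
import os


def read_int(path: str) -> int:
    """Read a sysfs counter file containing a single integer."""
    with open(path) as f:
        return int(f.read().strip())


def ce_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Map 'mcN/csrowM/chK' to its corrected-error count.

    Assumes the legacy EDAC csrow layout: <root>/mc*/csrow*/ch*_ce_count.
    """
    counts = {}
    pattern = os.path.join(root, "mc*", "csrow*", "ch*_ce_count")
    for path in sorted(glob.glob(pattern)):
        csrow_dir, name = os.path.split(path)
        mc_dir, csrow = os.path.split(csrow_dir)
        mc = os.path.basename(mc_dir)
        channel = name[: -len("_ce_count")]  # e.g. "ch0"
        counts[f"{mc}/{csrow}/{channel}"] = read_int(path)
    return counts


if __name__ == "__main__":
    # Print only rows/channels that have logged corrected errors.
    for key, count in ce_counts().items():
        if count:
            print(f"{key}: {count} corrected errors")
```

Running it periodically (or after a memory stress test) and comparing the output shows whether the errors concentrate on one row/channel, which is a hint rather than proof of which module is bad.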

manu · Aug 25, 2015

Doesn't the dmidecode Linux command provide some info about the RAM banks?
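As an illustration of that suggestion, dmidecode reads the SMBIOS tables, where each installed slot appears as a "Memory Device" section with fields such as Size, Locator and Bank Locator. The sketch below runs `dmidecode -t memory` (root required) and lists those fields per slot. The parsing is a hypothetical helper written against dmidecode's usual text layout, and the slot names it prints come from the BIOS, so they may or may not match the labels on the board. dmidecode knows nothing about ECC errors; it only helps put names on the slots once the errors have been narrowed down some other way.

```python
import subprocess

# Fields of interest inside each "Memory Device" (DMI type 17) section.
WANTED = {"Size", "Locator", "Bank Locator", "Part Number", "Serial Number"}


def parse_memory_devices(text: str) -> list:
    """Return one dict per 'Memory Device' section of dmidecode output."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}
            devices.append(current)
        elif not line:
            current = None  # a blank line ends the section
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key in WANTED:
                current[key] = value
    return devices


if __name__ == "__main__":
    # "-t memory" limits dmidecode output to memory-related DMI types.
    out = subprocess.run(
        ["dmidecode", "-t", "memory"],
        capture_output=True, text=True, check=True,
    ).stdout
    for dev in parse_memory_devices(out):
        print(dev.get("Locator", "?"), dev.get("Bank Locator", "?"), dev.get("Size", "?"))
```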

jim.bond.9862 · Aug 25, 2015

Will try that. Didn't know about it.
