EDAC-Utils shows incomplete information in Proxmox 6.1

kavejo

Member
Jan 28, 2020
Good morning all,

I have recently installed Proxmox on top of a vanilla Debian Buster installation, following the article https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster.

I have noticed that, before installing Proxmox, the edac-util -v command showed information about all the memory modules. Since installing Proxmox, I can no longer see information for each memory module; all the csrow* entries are missing.

Basically the output looks like this:
Code:
# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info

Instead of looking like this:
Code:
# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: ch0: 0 Corrected Errors
mc0: csrow0: ch1: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: ch0: 0 Corrected Errors
mc0: csrow1: ch1: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: ch0: 0 Corrected Errors
mc0: csrow2: ch1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: ch0: 0 Corrected Errors
mc0: csrow3: ch1: 0 Corrected Errors
mc0: csrow4: 0 Uncorrected Errors
mc0: csrow4: ch0: 0 Corrected Errors
mc0: csrow4: ch1: 0 Corrected Errors
mc0: csrow5: 0 Uncorrected Errors
mc0: csrow5: ch0: 0 Corrected Errors
mc0: csrow5: ch1: 0 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow0: 0 Uncorrected Errors
mc1: csrow0: ch0: 0 Corrected Errors
mc1: csrow0: ch1: 0 Corrected Errors
mc1: csrow1: 0 Uncorrected Errors
mc1: csrow1: ch0: 0 Corrected Errors
mc1: csrow1: ch1: 0 Corrected Errors
mc1: csrow2: 0 Uncorrected Errors
mc1: csrow2: ch0: 0 Corrected Errors
mc1: csrow2: ch1: 0 Corrected Errors
mc1: csrow3: 0 Uncorrected Errors
mc1: csrow3: ch0: 0 Corrected Errors
mc1: csrow3: ch1: 0 Corrected Errors
mc1: csrow4: 0 Uncorrected Errors
mc1: csrow4: ch0: 0 Corrected Errors
mc1: csrow4: ch1: 0 Corrected Errors
mc1: csrow5: 0 Uncorrected Errors
mc1: csrow5: ch0: 0 Corrected Errors
mc1: csrow5: ch1: 0 Corrected Errors
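
To make the difference between the two outputs easy to check on several machines or kernels, here is a minimal Python sketch that summarises which csrow entries appear for each memory controller. It only parses text in the format shown above; the function name csrows_per_controller is made up for illustration and is not part of edac-utils.

```python
import re
from collections import defaultdict

# Matches per-csrow lines such as "mc0: csrow3: ch1: 0 Corrected Errors".
CSROW_LINE = re.compile(r"^(mc\d+): (csrow\d+):")
# Matches any line that starts with a controller name, e.g. "mc1: ...".
MC_LINE = re.compile(r"^(mc\d+):")


def csrows_per_controller(output):
    """Map each memory controller to the csrow names found in the output.

    `output` is the text printed by `edac-util -v`. A controller that only
    has "with no DIMM info" lines maps to an empty list.
    """
    seen = defaultdict(set)
    for line in output.splitlines():
        line = line.strip()
        mc = MC_LINE.match(line)
        if not mc:
            continue  # skip the prompt line and anything unrelated
        seen[mc.group(1)]  # register the controller even without csrow lines
        row = CSROW_LINE.match(line)
        if row:
            seen[row.group(1)].add(row.group(2))
    # Names are sorted as strings, which is enough for a quick comparison.
    return {mc: sorted(rows) for mc, rows in seen.items()}
```

Fed with the first output above, every controller maps to an empty list; fed with the second one, each controller lists csrow0 to csrow5.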

Upon checking /sys/devices/system/edac/mc/mc*/ I can see that there are no csrow* entries anymore.
I am running Proxmox VE (fully updated) on an HPE DL380 Gen. 8, with 2 * Xeon E5-2450L (Sandy Bridge) and 12 * 16GB Samsung M393B2G70BH0-YH9 PC3L-10600R-09.
I am using the PVE-NO-SUBSCRIPTION repository.
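
The sysfs check mentioned above can be scripted so the same test is repeatable after each kernel change. The sketch below counts the csrow* directories under each mc* controller. It also counts dimm* and rank* directories, on the assumption (not confirmed in this thread) that the kernel may expose per-module data under those names when the legacy csrow layout is not built in. The function name edac_layout is hypothetical.

```python
from pathlib import Path


def edac_layout(root="/sys/devices/system/edac/mc"):
    """Count csrow*, dimm* and rank* directories under each mc* controller.

    Returns a dict like {"mc0": {"csrow": 6, "dimm": 0, "rank": 0}}.
    An empty dict means no memory controller is registered under `root`.
    """
    layout = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        counts = {}
        # dimm*/rank* are an assumption about the non-legacy sysfs layout.
        for prefix in ("csrow", "dimm", "rank"):
            counts[prefix] = sum(
                1 for entry in mc.glob(prefix + "[0-9]*") if entry.is_dir()
            )
        layout[mc.name] = counts
    return layout


if __name__ == "__main__":
    for name, counts in edac_layout().items():
        print(name, counts)
```

If the csrow count is zero but the dimm or rank count is not, the per-module counters may still be there under a different name, and the missing lines would then be a matter of how they are read rather than a driver that failed to load.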

I have seen a few threads stating that, with the latest kernel, this issue affects Ryzen CPUs, such as https://forum.proxmox.com/threads/patch-x570-ryzen-edac-support-into-pve-6-1.63744/, https://forum.proxmox.com/threads/no-more-ecc-since-kernel-5-3.64047/#post-290931 and https://forum.proxmox.com/threads/linux-kernel-5-3-for-proxmox-ve.59398/page-3#post-284619.

Is this something that is affecting other CPUs or is this an issue specific to Ryzen?
Are other people on Sandy Bridge facing this same problem? If so, how could it be overcome?

Thank you!
 
Just adding some additional information: this seems to be specific to the 5.3 kernel.
I have tried booting Ubuntu LTS with a 4.* kernel and edac-util shows all the information; with the latest Ubuntu release (kernel 5.3) the csrow* entries are missing.
 
Hi @cromatn5,

I did have a look at your post; however, I am not seeing the rationale behind your cron script.
Perhaps I am missing some details, which is why I have not tried it yet.

Would you be so kind as to share the thought process behind what the script does, so I can check whether the same approach applies to my issue, please?
I have noticed that this issue is not Proxmox specific but rather kernel specific (Ubuntu and Debian with 5.3 are both affected).

Thanks.
 
Hi @cromatn5,

I can't see the skl_uncore driver in my case and unfortunately I don't have a pre-5.3 output of lspci -v.
I am new to Proxmox: is there any way to install kernel 5.0 or an earlier one to test for comparison?

Thanks!
 
Hi @cromatn5,

I have just tested with both kernel 5.0 and 5.3, and in both cases edac-util is missing the per-module information.
It seems this did not happen with the 4.* kernels, which I could only test with Debian 10 and Ubuntu Live.

Comparing the lspci output between kernel 5.0 and 5.3 showed no differences.
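
For anyone who wants to repeat the lspci comparison, here is a minimal sketch using Python's difflib. It assumes the output of lspci -v was saved to a text file while booted on each kernel; the function name diff_lspci and the labels are made up for illustration.

```python
import difflib


def diff_lspci(old_text, new_text, old_label="kernel-5.0", new_label="kernel-5.3"):
    """Return the unified-diff lines between two saved `lspci -v` outputs.

    An empty list means the two outputs are identical.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```

Calling it with the contents of the two saved files and printing the returned lines gives the same kind of result as running diff -u on them; in my case the list would be empty.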

I was curious to test an older kernel (4.*) in Proxmox but doing an apt search I could not see any 4.* kernel available.

Would you have any suggestions?

Thank you.
 
