Patch x570/Ryzen EDAC support into pve 6.1?

I used wires to short pins in the memory bank. I have attached an example photo; obviously, during testing the wires were in the bank that had the memory in it. In the photo they are in an empty bank just to make them easier to see.

Does this change anything for you regarding the possibility of breaking the CPU? Please elaborate as much as you can.

I am not sure the people in the thread you are referring to are interested in my findings. Perhaps I overlooked it while quickly scanning, but I did not see any interest in ECC error reporting. So I gladly give you permission to link my findings anywhere. Asking where you post them is just a request, because I am curious about the responses, but it is not mandatory ;)

 
Upgrading to Proxmox VE 6.2 (kernel 5.4.34-1-pve) has fixed this.

Code:
# dmesg | grep -i edac
[    0.830488] EDAC MC: Ver: 3.0.0
[   17.087384] EDAC amd64: Node 0: DRAM ECC enabled.
[   17.087549] EDAC amd64: F17h_M70h detected (node 0).
[   17.087617] EDAC MC: UMC0 chip selects:
[   17.087618] EDAC amd64: MC: 0:  8192MB 1:  8192MB
[   17.087634] EDAC amd64: MC: 2:  8192MB 3:  8192MB
[   17.087653] EDAC MC: UMC1 chip selects:
[   17.087654] EDAC amd64: MC: 0:  8192MB 1:  8192MB
[   17.087666] EDAC amd64: MC: 2:  8192MB 3:  8192MB
[   17.087677] EDAC amd64: using x16 syndromes.
[   17.087687] EDAC amd64: MCT channel count: 2
[   17.087752] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
[   17.087780] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[   17.087803] AMD64 EDAC driver v3.5.0

# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.
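
edac-util gets its numbers from the kernel's EDAC sysfs interface, so the same counters can also be read directly, for example to poll them while shorting pins as described earlier in this thread. A minimal sketch, assuming the standard /sys/devices/system/edac/mc/mcN/{ce_count,ue_count} layout; the function name edac_counts is hypothetical, and the optional argument only exists so the function can be tried against a copy of that tree:

```shell
#!/usr/bin/env bash
# Print corrected (CE) and uncorrected (UE) error counts per memory controller.
# Assumption: standard EDAC sysfs layout /sys/devices/system/edac/mc/mcN/.
# $1 (optional) overrides the sysfs root, e.g. for testing against a fake tree.
edac_counts() {
    local root=${1:-/sys/devices/system/edac/mc}
    local mc found=0
    for mc in "$root"/mc[0-9]*; do
        # Without a match the glob stays literal, so skip non-directories.
        [ -d "$mc" ] || continue
        found=1
        printf '%s: CE=%s UE=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controllers registered under $root" >&2
        return 1
    fi
}
```

Run without arguments on the host, it should print one CE/UE line per registered controller; it returns non-zero when nothing is registered, which is the situation edac-util describes as "No memory controllers found".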
 
@EricD Welcome back.

Thanks for sharing. Earlier in this thread I also had success with the latest kernel. It's great that it is now part of PVE 6.2.
 
Hello, is it possible to add the 19h models from upstream too?

This is about Ryzen 5xxx generation EDAC support (request).
Only 17h was added, which is the 3xxx generation (works).
 
Just checked the code: the kernel already has the 19h patches... :rolleyes:

but:
Bash:
# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
Code:
[    0.191382] EDAC MC: Ver: 3.0.0
[    5.075875] EDAC amd64: Node 0: DRAM ECC enabled.
[    5.075875] EDAC amd64: F19h detected (node 0).
[    5.075883] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    5.075896] EDAC amd64: Error: Error probing instance: 0
[    5.880715] EDAC amd64: Node 0: DRAM ECC enabled.
[    5.880716] EDAC amd64: F19h detected (node 0).
[    5.880723] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    5.880742] EDAC amd64: Error: Error probing instance: 0
[    6.436641] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.436642] EDAC amd64: F19h detected (node 0).
[    6.436649] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.436664] EDAC amd64: Error: Error probing instance: 0
[    6.500262] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.500263] EDAC amd64: F19h detected (node 0).
[    6.500267] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.500282] EDAC amd64: Error: Error probing instance: 0
[    6.560396] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.560397] EDAC amd64: F19h detected (node 0).
[    6.560400] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.560415] EDAC amd64: Error: Error probing instance: 0
[    6.640573] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.640573] EDAC amd64: F19h detected (node 0).
[    6.640576] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.640593] EDAC amd64: Error: Error probing instance: 0
[    6.709010] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.709011] EDAC amd64: F19h detected (node 0).
[    6.709019] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.709036] EDAC amd64: Error: Error probing instance: 0
[    6.760740] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.760742] EDAC amd64: F19h detected (node 0).
[    6.760753] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.760774] EDAC amd64: Error: Error probing instance: 0
[    6.829072] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.829073] EDAC amd64: F19h detected (node 0).
[    6.829080] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.829096] EDAC amd64: Error: Error probing instance: 0
[    6.896573] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.896574] EDAC amd64: F19h detected (node 0).
[    6.896582] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.896599] EDAC amd64: Error: Error probing instance: 0
[    6.949019] EDAC amd64: Node 0: DRAM ECC enabled.
[    6.949020] EDAC amd64: F19h detected (node 0).
[    6.949027] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    6.949043] EDAC amd64: Error: Error probing instance: 0
[    7.020707] EDAC amd64: Node 0: DRAM ECC enabled.
[    7.020708] EDAC amd64: F19h detected (node 0).
[    7.020716] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    7.020736] EDAC amd64: Error: Error probing instance: 0
[    7.108646] EDAC amd64: Node 0: DRAM ECC enabled.
[    7.108647] EDAC amd64: F19h detected (node 0).
[    7.108654] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    7.108671] EDAC amd64: Error: Error probing instance: 0
[    7.192794] EDAC amd64: Node 0: DRAM ECC enabled.
[    7.192795] EDAC amd64: F19h detected (node 0).
[    7.192804] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    7.193055] EDAC amd64: Error: Error probing instance: 0
[    7.264740] EDAC amd64: Node 0: DRAM ECC enabled.
[    7.264742] EDAC amd64: F19h detected (node 0).
[    7.264751] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    7.265009] EDAC amd64: Error: Error probing instance: 0
[    7.304777] EDAC amd64: Node 0: DRAM ECC enabled.
[    7.304778] EDAC amd64: F19h detected (node 0).
[    7.304788] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
[    7.305135] EDAC amd64: Error: Error probing instance: 0
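
The log above is the same four lines repeated for every probe attempt. A small filter can reduce it to the two facts that matter: how many probes failed and which PCI device ID the driver could not find. This is a sketch that only pattern-matches the message text shown above (edac_probe_summary is a made-up name):

```shell
#!/usr/bin/env bash
# Summarise amd64_edac probe failures from kernel log text read on stdin.
# Counts "Error probing instance" lines and lists each distinct device ID
# named in "F0 not found, device 0x...." messages.
edac_probe_summary() {
    local log failures ids
    log=$(cat)
    failures=$(printf '%s\n' "$log" \
        | grep -c 'EDAC amd64: Error: Error probing instance')
    ids=$(printf '%s\n' "$log" \
        | grep -o 'F0 not found, device 0x[0-9a-fA-F]*' \
        | awk '{print $NF}' | sort -u | paste -sd, -)
    echo "probe failures: $failures"
    echo "missing F0 device IDs: ${ids:-none}"
}
```

Something like `dmesg | edac_probe_summary` should report `missing F0 device IDs: 0x1650` on an affected kernel. To check whether a device with that ID is present at all, `lspci -nn -d 1022:1650` (1022 is AMD's PCI vendor ID) prints nothing when no such device exists.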

MB: Asrock Rack X570D4I-2T
CPU: Ryzen 7 5800X
RAM: 2x MTA18ASF4G72HZ-3G2B1 (2x 32GB Micron-based ECC RAM @ 2933)

Ordered new Samsung 2666 MHz RAM from the QVL, maybe it helps :rolleyes:

Any other ideas?
 
Well, it seems this issue has been fixed since kernel 5.10-rc7.

For anyone following this issue: enjoy waiting for Proxmox with the 5.10 kernel ;)
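
To check in a script whether a machine already runs a kernel at or above that version, one option is a version comparison with GNU sort -V. A sketch, assuming GNU coreutils and taking 5.10 as the threshold from this post; kernel_at_least is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Succeed if kernel release $2 is at least version $1.
# The local suffix of a release string (e.g. "-1-pve" or "-rc7") is stripped,
# so only the leading "major.minor[.patch]" part is compared.
kernel_at_least() {
    local want=$1 have=${2%%-*}
    # sort -V orders version strings; if $want sorts first (or equal),
    # then $have >= $want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

Usage: `kernel_at_least 5.10 "$(uname -r)" || echo "kernel older than 5.10"`. Because the suffix is dropped, a release such as 5.10.0-rc7 is treated as 5.10.0.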

Possible Solutions:

Option 1: Blacklist the amd64_edac kernel module. The messages will go away, and error correction will still be done by your hardware anyway.

Option 2: Leave it as is and wait for the 5.10 kernel. So far I only get these messages, plus random messages (CPU errors etc.) while working via SSH, but no crashes or other problems.

Option 3: Backport all the EDAC patches; you can find them here: https://patchwork.kernel.org/project/linux-edac/list/
- You can use pwclient to download them all in one run, put them into the pve-kernel/patches/ directory, and then compile your PVE kernel...
- You need to check most patches yourself, since many are already included in the pve-kernel; that's a ton of work...
- Some patches need manual checking because they might be incompatible. From my findings, none of the patches from after January/February are in the kernel.
- This is only for hardcore enthusiasts. From my findings, even compiling a vanilla pve-kernel without any modifications breaks my system: a self-compiled vanilla pve-kernel boots here, for example, but without working network, while the kernel from the repo works perfectly...
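
Option 1 can be done with a modprobe configuration file. A minimal sketch, assuming a Debian-based PVE host; the file name blacklist-amd64_edac.conf and the function name are arbitrary choices, and the directory argument only exists for a dry run. Keep in mind that a blacklist entry only stops automatic loading by alias; an explicit `modprobe amd64_edac` will still load the module.

```shell
#!/usr/bin/env bash
# Write a modprobe.d entry that stops amd64_edac from being auto-loaded.
# Any file ending in .conf inside modprobe.d is read; the name is arbitrary.
# $1 (optional) overrides the target directory for a dry run.
blacklist_amd64_edac() {
    local dir=${1:-/etc/modprobe.d}
    local conf="$dir/blacklist-amd64_edac.conf"
    mkdir -p "$dir"
    printf 'blacklist amd64_edac\n' > "$conf"
    echo "wrote $conf"
}
# After writing the real file (as root), refresh the initramfs and reboot:
#   update-initramfs -u -k all
```

Run as root without an argument to write /etc/modprobe.d/blacklist-amd64_edac.conf; after a reboot, `lsmod | grep amd64_edac` should print nothing.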

Proxmox is still the best; we just shouldn't expect support for the newest hardware right away :-(
Thread can be closed :)
 
Hi... ECC support works, and reporting to the OS works, for both single- and double-bit errors. If you have issues under Windows, then that is Windows' fault. ECC reporting is just a bunch of registers/interrupts, much lower in the HW stack than the OS itself. The OS just needs to be able to read/interpret what the CPU is reporting.

What doesn't work is the reporting by the IPMI on those boards.

My theory is that ASRock is under a ban/order from AMD that keeps them from announcing full ECC support on non-EPYC CPUs. Or, alternatively, AMD simply doesn't give ASRock everything they need for full support validation.

 
After a lot of effort by mastakilla, who is in close contact with ASRock Rack, it turns out that it is AMD that does not cooperate in getting ECC events logged by IPMI on non-server boards. My stance is that ASRock Rack should not have marketed my X470D4U board as such in that case.
 
I am having the same issue, probably because of kernel 5.4:

  • Mainboard: Asus Pro WS 565-ACE
  • BIOS: tested both Version 0502 (2021-01-15) and current Version 0703 (2021-03-11) which is the newest BIOS version as of today. Same error messages in both versions.
  • CPU: AMD Ryzen 9 5950X
  • (Hetzner AX101)

Code:
/var/log/syslog:
Apr  3 12:42:13 MyHost kernel: [    6.413910] kvm: Nested Virtualization enabled
Apr  3 12:42:13 MyHost kernel: [    6.413919] kvm: Nested Paging enabled
Apr  3 12:42:13 MyHost kernel: [    6.413919] SVM: Virtual VMLOAD VMSAVE supported
Apr  3 12:42:13 MyHost kernel: [    6.413920] SVM: Virtual GIF supported
Apr  3 12:42:13 MyHost kernel: [    6.418483] MCE: In-kernel MCE decoding enabled.
Apr  3 12:42:13 MyHost kernel: [    6.428492] Console: switching to colour frame buffer device 128x48
Apr  3 12:42:13 MyHost kernel: [    6.437309] ast 0000:06:00.0: fb0: astdrmfb frame buffer device
Apr  3 12:42:13 MyHost kernel: [    6.468931] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 12:42:13 MyHost kernel: [    6.468931] EDAC amd64: F19h detected (node 0).
Apr  3 12:42:13 MyHost kernel: [    6.468939] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 12:42:13 MyHost kernel: [    6.468955] EDAC amd64: Error: Error probing instance: 0
Apr  3 12:42:13 MyHost kernel: [    6.488237] [drm] Initialized ast 0.1.0 20120228 for 0000:06:00.0 on minor 0
Apr  3 12:42:13 MyHost kernel: [    6.544556] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 12:42:13 MyHost kernel: [    6.544557] EDAC amd64: F19h detected (node 0).
Apr  3 12:42:13 MyHost kernel: [    6.544563] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 12:42:13 MyHost dbus-daemon[1546]: [system] AppArmor D-Bus mediation is enabled
Apr  3 12:42:13 MyHost kernel: [    6.544579] EDAC amd64: Error: Error probing instance: 0
Apr  3 12:42:13 MyHost kernel: [    6.612644] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 12:42:13 MyHost kernel: [    6.612645] EDAC amd64: F19h detected (node 0).
Apr  3 12:42:13 MyHost kernel: [    6.612647] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 12:42:13 MyHost kernel: [    6.612663] EDAC amd64: Error: Error probing instance: 0
Apr  3 12:42:13 MyHost kernel: [    6.673040] EDAC amd64: Node 0: DRAM ECC enabled.

Longer syslog excerpt attached

Bash:
~# uname -a
Linux MyHost 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) x86_64 GNU/Linux
~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.11-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


Still stuck with kernel 5.4... so let's wait for 5.10 :)

 

This is solved: Proxmox released a 5.11 kernel.
You just have to install it xD
 
I read somewhere that it's already in the stable or enterprise repo too.
 
It's not the default; you have to install it yourself.

Run apt search pve-kernel-5.11 to find it.

Thanks, I did it; that was an easy apt-get install pve-kernel-5.11 (see https://forum.proxmox.com/threads/kernel-5-11.86225/page-2#post-381945)

Relevant /var/log/syslog parts:

Code:
# boot with kernel 5.4:
Apr  3 16:17:44 MyHost kernel: [    0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
...
Apr  3 16:17:44 MyHost kernel: [    5.617493] MCE: In-kernel MCE decoding enabled.
Apr  3 16:17:44 MyHost kernel: [    5.618294] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 16:17:44 MyHost kernel: [    5.618295] EDAC amd64: F19h detected (node 0).
Apr  3 16:17:44 MyHost kernel: [    5.618304] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 16:17:44 MyHost kernel: [    5.618322] EDAC amd64: Error: Error probing instance: 0
Apr  3 16:17:44 MyHost kernel: [    5.633007] [drm] Initialized ast 0.1.0 20120228 for 0000:06:00.0 on minor 0
Apr  3 16:17:44 MyHost kernel: [    5.685516] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 16:17:44 MyHost kernel: [    5.685517] EDAC amd64: F19h detected (node 0).
Apr  3 16:17:44 MyHost kernel: [    5.685523] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 16:17:44 MyHost kernel: [    5.685545] EDAC amd64: Error: Error probing instance: 0
Apr  3 16:17:44 MyHost kernel: [    5.757242] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  3 16:17:44 MyHost kernel: [    5.757243] EDAC amd64: F19h detected (node 0).
Apr  3 16:17:44 MyHost kernel: [    5.757248] EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
Apr  3 16:17:44 MyHost kernel: [    5.757271] EDAC amd64: Error: Error probing instance: 0
Apr  3 16:17:44 MyHost kernel: [    5.825806] EDAC amd64: Node 0: DRAM ECC enabled.


# boot with kernel 5.11:
Apr  5 20:13:38 MyHost kernel: [    0.000000] Linux version 5.11.7-1-pve (build@pve) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP PVE 5.11.7-1~bpo10 (Thu, 18 Mar 2021 16:17:24 +0100) ()
...
Apr  5 20:13:38 MyHost kernel: [    5.927527] MCE: In-kernel MCE decoding enabled.
Apr  5 20:13:38 MyHost kernel: [    5.928764] EDAC amd64: F19h_M20h detected (node 0).
Apr  5 20:13:38 MyHost kernel: [    5.928850] EDAC amd64: Node 0: DRAM ECC enabled.
Apr  5 20:13:38 MyHost kernel: [    5.928851] EDAC amd64: MCT channel count: 2
Apr  5 20:13:38 MyHost kernel: [    5.928892] EDAC MC0: Giving out device to module amd64_edac controller F19h_M20h: DEV 0000:00:18.3 (INTERRUPT)
Apr  5 20:13:38 MyHost kernel: [    5.928896] EDAC MC: UMC0 chip selects:
Apr  5 20:13:38 MyHost kernel: [    5.928896] EDAC amd64: MC: 0: 16384MB 1: 16384MB
Apr  5 20:13:38 MyHost kernel: [    5.928897] EDAC amd64: MC: 2: 16384MB 3: 16384MB
Apr  5 20:13:38 MyHost kernel: [    5.928905] EDAC MC: UMC1 chip selects:
Apr  5 20:13:38 MyHost kernel: [    5.928906] EDAC amd64: MC: 0: 16384MB 1: 16384MB
Apr  5 20:13:38 MyHost kernel: [    5.928906] EDAC amd64: MC: 2: 16384MB 3: 16384MB
Apr  5 20:13:38 MyHost kernel: [    5.928907] EDAC amd64: using x16 syndromes.
Apr  5 20:13:38 MyHost kernel: [    5.928913] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
Apr  5 20:13:38 MyHost kernel: [    5.928914] AMD64 EDAC driver v3.5.0
Apr  5 20:13:38 MyHost kernel: [    5.930692] intel_rapl_common: Found RAPL domain package
Apr  5 20:13:38 MyHost kernel: [    5.930693] intel_rapl_common: Found RAPL domain core

Bash:
~# uname -a
Linux MyHost 5.11.7-1-pve #1 SMP PVE 5.11.7-1~bpo10 (Thu, 18 Mar 2021 16:17:24 +0100) x86_64 GNU/Linux
~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.11.7-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.11: 7.0-0+3~bpo10
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.11.7-1-pve: 5.11.7-1~bpo10
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.11-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
~#
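
Comparing the two boots above comes down to one line: with 5.11 the log contains `EDAC MC0: Giving out device to module amd64_edac ...`, while with 5.4 it only contains probe errors. A sketch that classifies kernel log text along those lines (edac_status is a hypothetical name; it only matches the messages quoted in this thread):

```shell
#!/usr/bin/env bash
# Classify amd64_edac's state from kernel log text read on stdin:
#   registered   - a memory controller was handed to amd64_edac
#   probe-failed - the driver ran but only logged probe errors
#   absent       - no matching amd64_edac messages at all
edac_status() {
    local log
    log=$(cat)
    if printf '%s\n' "$log" \
        | grep -q 'EDAC MC[0-9]*: Giving out device to module amd64_edac'; then
        echo registered
    elif printf '%s\n' "$log" \
        | grep -q 'EDAC amd64: Error: Error probing instance'; then
        echo probe-failed
    else
        echo absent
    fi
}
```

For example `journalctl -k -b | edac_status` checks the current boot, and `journalctl -k -b -1 | edac_status` the previous one (the latter needs a persistent journal).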
 
