Frozen node debug? Ryzen 7000 stability?

jmjosebest (Renowned Member, member since Jan 16, 2009)
Hello, every now and then a node becomes unreachable, and when I connect via IPMI I find the machine completely frozen.
The only way to recover is to power the server off completely and turn it back on.
The logs show nothing at all; the server simply stopped working.

Is there any way to enable more advanced or verbose debug logging?
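A hard freeze often leaves nothing in the journal because the last kernel messages never reach the disk. One possible approach, sketched here under assumptions, is to make the kernel panic loudly when its watchdog detects a lockup and to print more to the console you watch over IPMI. The sysctl keys below are standard kernel parameters; the helper name `write_debug_sysctl` and the file name `99-lockup-debug.conf` are hypothetical, and there is no guarantee the watchdog can catch every kind of hang on this hardware.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sysctl.d drop-in that makes lockups visible.
# Usage: write_debug_sysctl [DIR]   (DIR defaults to /etc/sysctl.d)
write_debug_sysctl() {
  local dir="${1:-/etc/sysctl.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/99-lockup-debug.conf" <<'EOF'
# Print console messages up to KERN_INFO (visible on the IPMI/serial console)
kernel.printk = 7 4 1 7
# Keep the NMI watchdog enabled so hard lockups can be detected
kernel.nmi_watchdog = 1
# Panic instead of hanging silently when a hard or soft lockup is detected
kernel.hardlockup_panic = 1
kernel.softlockup_panic = 1
# Treat any oops as a panic as well
kernel.panic_on_oops = 1
EOF
}
```

After writing the file, `sysctl --system` (as root) loads it. After the next incident, `journalctl -k -b -1` shows the kernel messages of the previous boot, but only if the journal is persistent and the messages were written before the hang.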

I see this mostly on the servers with the "new" Ryzen 7000 CPUs.
I have tested with both kernel 5.15 and kernel 6.2.

Thank you!
 
Hello, we are still fighting with the Ryzen 7000 nodes, testing with kernels 5.15 and 6.2, but we cannot keep them stable. In the best case we get a kernel panic; in the worst case the node freezes without showing anything. Any suggestions? Thank you.

[Attached screenshots: 1696864446.jpg, 1696864616.jpg]
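For the case where the node freezes without showing anything, one option is netconsole, which sends kernel console messages (usually including oops and panic output) over UDP to another host, so the last lines can survive even when nothing reaches the disk. The sketch below only builds the module parameter. The helper name `netconsole_param` and all addresses are placeholders, and the layout follows the kernel's netconsole documentation (`[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the value for the netconsole= module parameter.
# Usage: netconsole_param SRC_IP DEV TGT_IP TGT_MAC [TGT_PORT]
#   SRC_IP  - address of the node being debugged, on interface DEV
#   TGT_IP  - host that receives the log messages
#   TGT_MAC - MAC of the receiver (or of the gateway if it is routed)
netconsole_param() {
  if [ "$#" -lt 4 ]; then
    echo "usage: netconsole_param SRC_IP DEV TGT_IP TGT_MAC [TGT_PORT]" >&2
    return 1
  fi
  local src_ip="$1" dev="$2" tgt_ip="$3" tgt_mac="$4" tgt_port="${5:-6666}"
  # 6665 is netconsole's default source port; 6666 its default target port
  printf '6665@%s/%s,%s@%s/%s\n' "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

On the node, something like `modprobe netconsole netconsole="$(netconsole_param 192.0.2.10 eno1 192.0.2.20 aa:bb:cc:dd:ee:ff)"` loads it, and `dmesg -n 8` lets all log levels reach the console. On the receiving host, a UDP listener such as `nc -u -l 6666` collects the output (the exact flags depend on the netcat variant).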
 
Could the problem be due to using 48 GB RAM modules?

Ryzen 7000 CPUs have supported 48 GB RAM modules, for up to 192 GB of RAM in total, since BIOS versions based on AGESA 1.0.0.7.

The BIOS detects all 192 GB correctly, and the operating system shows them as well.

However, I noticed a suspicious value in the dmidecode output: Maximum Capacity: 128 GB

Code:
# dmidecode | grep -A 15 Memory
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                HTT (Multi-threading)
        Version: AMD Ryzen 9 7900 12-Core Processor
        Voltage: 1.3 V
        External Clock: 100 MHz
        Max Speed: 5450 MHz
--
32-bit Memory Error Information
        Type: OK
        Granularity: Unknown
        Operation: Unknown
        Vendor Syndrome: Unknown
        Memory Array Address: Unknown
        Device Address: Unknown
        Resolution: Unknown

Handle 0x0028, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 128 GB   <---------------------------!!!!!!!!!!!!!!!
        Error Information Handle: 0x0027
        Number Of Devices: 4

Handle 0x0029, DMI type 19, 31 bytes
Memory Array Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x02FFFFFFFFF
        Range Size: 192 GB   <---------------------------!!!!!!!!!!!!!!!
        Physical Array Handle: 0x0028
        Partition Width: 4
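To see at a glance whether the installed modules exceed what the firmware advertises, one can compare the DMI type 16 "Maximum Capacity" with the sum of the DMI type 17 "Size" entries. This is a sketch: `dimm_capacity_report` is a hypothetical name, it assumes dmidecode prints sizes as "<n> MB/GB/TB", and it only reports numbers. The Maximum Capacity field comes from the firmware's SMBIOS tables and may simply not have been updated for 48 GB modules, so a mismatch is a hint, not proof of a fault.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare DMI "Maximum Capacity" (type 16) with the
# sum of installed module sizes (type 17). Reads dmidecode output on stdin.
# Usage: dmidecode -t memory | dimm_capacity_report
dimm_capacity_report() {
  awk '
    # Convert "<n> <unit>" to MB; returns -1 for unknown units
    function to_mb(n, unit) {
      if (unit == "MB") return n
      if (unit == "GB") return n * 1024
      if (unit == "TB") return n * 1024 * 1024
      return -1
    }
    /^Memory Device[ \t]*$/ { in_dev = 1; next }   # start of a type 17 record
    /^[ \t]*$/              { in_dev = 0; next }   # blank line ends a record
    $1 == "Maximum" && $2 == "Capacity:" { max_mb = to_mb($3, $4); next }
    # Count only the plain "Size:" line; empty slots say "No Module Installed"
    in_dev && $1 == "Size:" && $2 ~ /^[0-9]+$/ {
      mb = to_mb($2, $3)
      if (mb > 0) { total_mb += mb; dimms++ }
    }
    END {
      printf "modules: %d, installed: %d GB, maximum capacity: %d GB\n", dimms, total_mb / 1024, max_mb / 1024
      if (max_mb > 0 && total_mb > max_mb)
        print "WARNING: installed memory exceeds the reported maximum capacity"
    }
  '
}
```

Run as root with `dmidecode -t memory | dimm_capacity_report`. `dmidecode -s bios-version` shows the installed firmware version, which can be checked against the board vendor's release notes for AGESA 1.0.0.7.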
 
