[SOLVED] Unable to start RAS daemon on PVE 8.2

PapaGigas

Member
Mar 18, 2023
Hi,

I'm having trouble starting the RAS daemon on PVE 8.2. Here’s the log output:

Code:
root@pve:~# systemctl status rasdaemon
× rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; preset: enabled)
     Active: failed (Result: signal) since Sat 2024-08-03 13:35:18 WEST; 11s ago
   Duration: 160ms
    Process: 60258 ExecStart=/usr/sbin/rasdaemon -f -r (code=killed, signal=BUS)
    Process: 60259 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
   Main PID: 60258 (code=killed, signal=BUS)
        CPU: 9.092s

Can anyone help, please?
 

That's not the usual way a forum driven (mainly) by volunteers works...

Unfortunately I have no idea what a "rasdaemon" is. Does it come with PVE?
 
Unfortunately I have no idea what a "rasdaemon" is. Does it come with PVE?

No, it doesn't come with Proxmox. Here’s what the RAS daemon is:

The rasdaemon program is a daemon that monitors the platform Reliability, Availability and Serviceability (RAS) reports from Linux kernel trace events. These trace events are logged in /sys/kernel/debug/tracing, and rasdaemon reports them via syslog/journald.

I mainly use it to monitor ECC memory for any errors, which is why I’ve installed it on Proxmox! ;)
 
a daemon which monitors the platform Reliability
I was curious, so I installed it on a test machine running directly on hardware (Ryzen Threadripper). Without configuring anything I got:
Code:
~# systemctl  status rasdaemon.service  
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-08-04 19:09:15 CEST; 6min ago
I had no expectations. Do I have to configure something? I ask because:
Code:
~# ras-mc-ctl --status
ras-mc-ctl: drivers not loaded.

It requires the hardware to support "something", right? Looks like "just" ECC, which this machine does not have (homelab). Only a basic module is loaded:
Code:
~# lsmod | grep -i edac
edac_mce_amd           28672  0

Sorry, no useful help from me for you, and the tool is not useful for me, so I will "purge" it...
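
For anyone who wants to cross-check the "drivers not loaded" message, here is a minimal sketch. It counts the memory controllers that the kernel's EDAC subsystem has registered under /sys/devices/system/edac/mc. The function name count_edac_controllers is made up for illustration, and it is only an assumption that a count of zero corresponds to what ras-mc-ctl reports above. Note that edac_mce_amd is the MCE decoder module, not a memory controller driver, so seeing it in lsmod does not imply any controller is registered.

```shell
#!/bin/sh
# Hypothetical helper: count EDAC memory controllers under a sysfs-style
# directory. The default path is the standard EDAC sysfs location; a
# different directory can be passed as the first argument.
count_edac_controllers() {
    base="${1:-/sys/devices/system/edac/mc}"
    count=0
    for dir in "$base"/mc[0-9]*; do
        # If the glob matches nothing, "$dir" is the literal pattern and
        # fails the directory test, so the count stays at 0.
        [ -d "$dir" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Calling count_edac_controllers with no argument on a machine without a loaded memory controller driver should print 0.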
 
Sorry, no useful help from me for you

No problem. I've managed to get it running:

Code:
root@pve:~# systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; preset: enabled)
     Active: active (running) since Mon 2024-08-05 14:13:02 WEST; 2min 51s ago
    Process: 821798 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
   Main PID: 821797 (rasdaemon)
      Tasks: 256 (limit: 154308)
     Memory: 37.0M
        CPU: 3.149s
     CGroup: /system.slice/rasdaemon.service
             └─821797 /usr/sbin/rasdaemon -f -r

But it failed again... this is what it shows in syslog:

Code:
Aug 05 20:01:17 pve kernel: traps: rasdaemon[88259] trap stack segment ip:784b8a8137f4 sp:784afdffea20 error:0 in libsqlite3.so.0.8.6[784b8a747000+f4000]

I'm now running Memtest86+ to check whether faulty RAM is the cause.

Thanks anyway! ;)
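
When reading a kernel "traps:" line like the one quoted in this post, it can help to pull out the object in which the fault occurred. The following is a minimal sketch with a made-up function name (faulting_object); it assumes the line ends in the form " in NAME[address+size]" as it does here. In this thread it yields the SQLite library, which fits the -r (record) option in the ExecStart line, since that option stores events via SQLite.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the name of
# the object that follows " in " on each "traps:"-style line.
# Lines without an " in NAME[...]" suffix produce no output.
faulting_object() {
    sed -n 's/.* in \([^[]*\)\[.*/\1/p'
}
```

It can be fed from the kernel log, for example: journalctl -k | grep 'traps: rasdaemon' | faulting_object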
 
After running Memtest86+ with no errors detected, I noticed that x2APIC was enabled in the BIOS. I changed it to xAPIC, and now it's working fine. I'll mark this thread as solved in case anyone else encounters this issue. ;)
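
To see what the running kernel says about the APIC mode without rebooting into the BIOS, a minimal sketch is to filter the kernel log for x2APIC mentions. The function name x2apic_log_lines is made up for illustration, and the exact wording of the kernel messages varies between versions, so the matching lines still need to be read by a human. This only shows what the kernel logged; it does not change the BIOS setting.

```shell
#!/bin/sh
# Hypothetical helper: print every line from stdin that mentions x2APIC,
# ignoring case. Prints nothing, and still exits 0, when there is no match.
x2apic_log_lines() {
    grep -i 'x2apic' || true
}
```

Typical use would be: dmesg | x2apic_log_lines (or journalctl -k -b | x2apic_log_lines). No output suggests the kernel did not log anything about x2APIC during this boot.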
 
No problem. I've managed to get it running:
You said you managed to get RASdaemon running - what did you do to get it running? I know you marked the thread as solved, and it appears that you fixed your memory errors, but I don't see any information about how you managed to get RASdaemon running. Thanks!