RRDC Update Errors

crazywolf13

Oct 15, 2023
Hi
Since adding more PVE nodes, I seem to have issues with RRDC update errors spamming the syslog, and I get signed out of the web UI after very short periods, sometimes after only about 10 minutes.

I tried the fix outlined here (both the original one and the one for newer Proxmox versions):

With this output:
Bash:
root@lenovo7:~# cd /var/lib/rrdcached/
systemctl stop rrdcached
mv rrdcached rrdcached.bck
systemctl start rrdcached
systemctl restart pve-cluster
mv: cannot stat 'rrdcached': No such file or directory
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/
db  journal
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/db
pve2-node  pve2-storage  pve2-vm
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/db/pve2-
pve2-node/    pve2-storage/ pve2-vm/    
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/db/pve2-node/
lenovo1  lenovo2  lenovo3  lenovo4  lenovo5  lenovo6  lenovo7  tower5  tower8
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/db/pve2-storage/
lenovo1  lenovo2  lenovo3  lenovo4  lenovo5  lenovo6  lenovo7  tower5  tower8
root@lenovo7:/var/lib/rrdcached# ls /var/lib/rrdcached/db/pve2-vm
100  103  106  109  112  115  118  121  124  127  130  133  136  139  142  145  148  151  154
101  104  107  110  113  116  119  122  125  128  131  134  137  140  143  146  149  152  230
102  105  108  111  114  117  120  123  126  129  132  135  138  141  144  147  150  153
root@lenovo7:/var/lib/rrdcached#
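Note on the transcript above: the `mv` fails because the shell is still in /var/lib/rrdcached (the first command was `cd /var/lib/rrdcached/`), where only `db` and `journal` exist, so the relative name `rrdcached` does not resolve. The following `systemctl` commands then restarted the services without anything having been moved. Below is a minimal sketch of the same steps with absolute paths, so the working directory no longer matters. The `run` helper and the `APPLY` variable are hypothetical additions for illustration, not part of the linked fix; by default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Same steps as the linked fix, but with absolute paths so the current
# directory does not matter. Dry-run by default: each command is printed.
# Set APPLY=1 to execute them (hypothetical switch, not from the fix).
set -euo pipefail

RRD_DIR="/var/lib/rrdcached"
BACKUP_DIR="${RRD_DIR}.bck"

# Print the command, or run it when APPLY=1.
run() {
    if [[ "${APPLY:-0}" == "1" ]]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

run systemctl stop rrdcached
run mv "$RRD_DIR" "$BACKUP_DIR"   # moves db/ and journal/ together
run systemctl start rrdcached
run systemctl restart pve-cluster
```

One caveat with `mv`: if /var/lib/rrdcached.bck already exists from an earlier attempt, the directory is moved inside it instead of being renamed.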

Edit: The code further down in that thread, which is suggested for newer Proxmox versions, runs through. While it did fix the charts, the errors still show up in the syslog.


I previously renamed the node tower5 to tower8.
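Since tower5 was renamed to tower8, the `tower5` entries in pve2-node and pve2-storage are presumably leftovers from the old name. A minimal sketch to list RRD names that have no matching node directory, assuming /etc/pve/nodes holds one directory per current cluster node. If a stale tower5 directory still exists there too, it will not show up in this comparison.

```bash
#!/usr/bin/env bash
# List entry names present in directory A but missing from directory B.
set -euo pipefail

names_only_in() {
    local a="$1" b="$2"
    # comm -23 keeps only lines unique to the first (sorted) list.
    comm -23 \
        <(find "$a" -mindepth 1 -maxdepth 1 -printf '%f\n' | sort) \
        <(find "$b" -mindepth 1 -maxdepth 1 -printf '%f\n' | sort)
}

# Compare RRD node entries with the cluster's node directories
# (assumed default PVE paths; skipped if either is missing).
if [[ -d /var/lib/rrdcached/db/pve2-node && -d /etc/pve/nodes ]]; then
    names_only_in /var/lib/rrdcached/db/pve2-node /etc/pve/nodes
fi
```

The same function can be pointed at /var/lib/rrdcached/db/pve2-storage, which uses the same per-node names.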

Also, weirdly enough, on lenovo7 there is a pve-node-2 folder in the rrdcached directory?
That seems a bit strange to me.

Also, I see some entries in the syslog dated October 2025. I guess these are the cause of this issue?

[Screenshot: syslog entries dated October 2025]
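One possible explanation worth checking: an RRD file only accepts updates with a timestamp later than its last update, so if some files were updated with a future timestamp, every regular update after that would be rejected until the real clock catches up. The sketch below lists RRD files whose last-update time, as printed by `rrdtool last`, lies in the future. Assumptions: the `rrdtool` command-line tool is installed (it may need to be added separately), and the files live under /var/lib/rrdcached/db as in the listing above.

```bash
#!/usr/bin/env bash
# List RRD files whose last-update time lies in the future.
# Assumptions: rrdtool CLI is installed; RRDs live under /var/lib/rrdcached/db.
set -uo pipefail

RRD_ROOT="/var/lib/rrdcached/db"

# Succeeds if epoch timestamp $1 is later than epoch timestamp $2.
is_future() {
    (( $1 > $2 ))
}

find_future_rrds() {
    local root="$1" now ts f
    now="$(date +%s)"
    while IFS= read -r -d '' f; do
        # `rrdtool last` prints the file's last update time in epoch seconds.
        ts="$(rrdtool last "$f" 2>/dev/null)" || continue
        [[ "$ts" =~ ^[0-9]+$ ]] || continue
        if is_future "$ts" "$now"; then
            printf '%s  %s\n' "$(date -d "@$ts" '+%F %T')" "$f"
        fi
    done < <(find "$root" -type f -print0)
}

if [[ -d "$RRD_ROOT" ]]; then
    find_future_rrds "$RRD_ROOT"
fi
```

Values still held in rrdcached's cache may not be written to the files yet, so the output reflects what is on disk.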


Here are some snippets of my syslog:
[Screenshot: syslog excerpt]

And also errors like this:
[Screenshot: further syslog errors]

Weirdly, this issue spans all of my nodes.
I also ran `timedatectl status` on each node, and the time and date looked correct on every one of them?!
Thanks for any help!
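`timedatectl status` shows each clock only at the moment it is run, and checking node by node makes small offsets easy to miss. A minimal sketch to compare all nodes from one place, assuming it runs as root on one node with key-based SSH to the others (usually already set up between PVE cluster members). The reported skew includes the SSH round trip, so a second or so is expected.

```bash
#!/usr/bin/env bash
# Compare each node's clock with the local clock via SSH.
# Assumptions: run as root; key-based SSH to each node works.
set -uo pipefail

# Absolute difference between two epoch timestamps, in seconds.
abs_skew() {
    local d=$(( $1 - $2 ))
    echo $(( d < 0 ? -d : d ))
}

check_skew() {
    local node remote local_now
    for node in "$@"; do
        if ! remote="$(ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$node" date +%s 2>/dev/null)"; then
            echo "$node: unreachable"
            continue
        fi
        local_now="$(date +%s)"
        printf '%s: %s s\n' "$node" "$(abs_skew "$remote" "$local_now")"
    done
}

# Node names are passed as arguments; nothing runs without them.
if (( $# > 0 )); then
    check_skew "$@"
fi
```

Usage, with the node names from the RRD listing (tower8 being the renamed tower5) and a hypothetical file name: `bash check-skew.sh lenovo1 lenovo2 lenovo3 lenovo4 lenovo5 lenovo6 lenovo7 tower8`.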
 
Edit: After looking into this further, this seems to be more of a single-node problem? But I have an 8-node cluster.

But in my case, the syslog on all nodes contains these errors.

For nodes 1, 3, and 6, the stats are empty, and the web GUI shows dates from 1970 for things like CPU/RAM.
Nodes 2, 4, 5, 7, and 8 display dates and stats just fine.
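A date in 1970 is what a Unix timestamp of 0 looks like (1970-01-01 00:00:00 UTC), so nodes 1, 3, and 6 are presumably reporting zero or missing timestamps rather than a real date. A minimal sketch to print the last-update time of each node's RRD file, so the three affected nodes can be compared with the others. Assumptions: the `rrdtool` CLI is installed and the default path /var/lib/rrdcached/db/pve2-node is used.

```bash
#!/usr/bin/env bash
# Print each node's RRD last-update time in human-readable UTC.
# Assumptions: rrdtool CLI installed; default PVE RRD path.
set -uo pipefail

NODE_DIR="/var/lib/rrdcached/db/pve2-node"

# Convert epoch seconds to "YYYY-MM-DD HH:MM:SS" in UTC (GNU date).
fmt_epoch() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

if [[ -d "$NODE_DIR" ]]; then
    for f in "$NODE_DIR"/*; do
        if ts="$(rrdtool last "$f" 2>/dev/null)"; then
            printf '%-10s %s\n' "$(basename "$f")" "$(fmt_epoch "$ts")"
        else
            printf '%-10s unreadable\n' "$(basename "$f")"
        fi
    done
fi
```

Running it on more than one node shows whether they all hold the same view of the per-node files.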

I could imagine one node briefly losing track of time because of its CMOS clock, but I cannot imagine three nodes having this problem within such a short time frame. Besides, timedatectl showed the correct date for all of them.