syslog getting spammed with "notice: RRD update error" messages

Performed an upgrade from 8 to 9 on a dozen servers. All of the upgrades seemed to run without errors.

On several machines, however, syslog is getting spammed with the following error:

2025-08-16T14:35:11.163904-04:00 ceph-six pmxcfs[2181]: [status] notice: RRD update error /var/lib/rrdcached/db/pve-vm-9.0/130: /var/lib/rrdcached/db/pve-vm-9.0/130: expected 17 data source readings (got 10) from 1655379309

My Google-Fu sucks and I can't find anything relevant. Does anyone have an idea? The systems/VMs/CTs appear to be running normally otherwise.
 
Same here, but with a slightly different message. Can you check whether the error persists if you move /var/lib/rrdcached/db/pve-vm-9.0/130 out of the way and restart rrdcached?
Code:
mv /var/lib/rrdcached/db/pve-vm-9.0/130{,.$(date +%Y%m%d)}
In your case, it usually means the VM was configured slightly differently (a disk removed, for example) between the times the metrics were collected (17 data sources originally vs. 10 now).
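In case it helps, the full sequence could look roughly like this (just a sketch following the same idea; stopping rrdcached first simply avoids the daemon touching the file while it is moved):
Code:
# stop the caching daemon so the file is not written to while it is moved
systemctl stop rrdcached
# move the RRD file for VM 130 aside, keeping a dated copy
mv /var/lib/rrdcached/db/pve-vm-9.0/130{,.$(date +%Y%m%d)}
# start the daemon again
systemctl start rrdcached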
 
I tried stopping rrdcached, removing the offending files, and restarting. The same error persists.
 
Hey, are all nodes running on Proxmox VE 9 by now?
If so, do you see files for all guests (VMs and CTs) on all hosts in the /var/lib/rrdcached/db/pve-vm-9.0 directory?
 
All nodes are running on 9 at this point.
I see all VMs/CTs listed in the pve-vm-9.0 directory.
 
Okay, that is curious. Are all guests powered on, or are some powered off?
For example, guest VMID 130 from the error message in the first post: was it on or off at that time?
 
Hmm, can you post the output of pveversion -v on the host where you have/had guest 130? Ideally inside of [code][/code] tags (or use the </> button in the formatting options above for a code block).

Is guest 130 a VM or a CT?
 
VM

Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.5 (running version: 9.0.5/9c5600b249dbfd2f)
proxmox-kernel-helper: 9.0.3
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.14-1
proxmox-backup-file-restore: 4.0.14-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.18
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.3-pve1
 
Hmm, those versions look new enough. Can you please restart the pvestatd service on the hosts? Either in the Node→System panel, or with
Code:
systemctl restart pvestatd

Does that help to get rid of the log messages?
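If you want to check quickly whether the message is gone after the restart, a one-liner like this should do (pmxcfs logs via the pve-cluster service):
Code:
journalctl -f -u pve-cluster.service | grep 'RRD update error'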
 
nope.

Aug 25 06:49:20 ceph-six pmxcfs[4081528]: [status] notice: RRD update error /var/lib/rrdcached/db/pve-vm-9.0/130: /var/lib/rrdcached/db/pve-vm-9.0/130: expected 17 data source readings (got 10) from 1756118960
 
This is curious.

Would it be okay for you to gather a bit more information? It seems that, for some reason, the pvestatd service still collects and distributes the old pre-PVE 9 metric format, but under the new key...
So to further see what might be going on, could you please do the following?

On the host where VM 130 is, edit the /usr/share/perl5/PVE/Cluster.pm file. Look for the following function; it should start at line 451:
Code:
sub broadcast_rrd {
    my ($rrdid, $data) = @_;

    eval { &$ipcc_update_status("rrd/$rrdid", $data); };
    my $err = $@;

    warn $err if $err;
}

And then add the following 3 lines so it looks like this in the end:
Code:
sub broadcast_rrd {
    my ($rrdid, $data) = @_;

    open(FH, ">>", "/tmp/broadcast.log");
    print FH "{$rrdid}:  ${data}\n";
    close FH;

    eval { &$ipcc_update_status("rrd/$rrdid", $data); };
    my $err = $@;

    warn $err if $err;
}
Save it and restart the pvestatd service: systemctl restart pvestatd

Every 10 seconds you should see that all the metrics being broadcast are also written to that file. You can follow it in an SSH session with tail -f /tmp/broadcast.log

I am definitely interested in the output for the pve-vm-9.0/130 line.
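To watch just that guest, you could filter while following the file, for example:
Code:
tail -f /tmp/broadcast.log | grep --line-buffered 'pve-vm-9.0/130'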

Once we have the output, you can undo the changes in the Cluster.pm file and restart pvestatd once more.
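If no backup copy of the file was made before editing, reinstalling the package that ships it should restore the unmodified version as well (I believe that is libpve-cluster-perl, but dpkg -S will confirm it):
Code:
# confirm which package owns the edited file, then reinstall it
dpkg -S /usr/share/perl5/PVE/Cluster.pm
apt install --reinstall libpve-cluster-perl
systemctl restart pvestatd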
 
Code:
grep 130 /tmp/broadcast.log:

{pve-vm-9.0/130}:  1651798:vm-gravity-2:running:0:1756217568:16:0:137436856320:12762603520:34359738368:0:88687789818:1045875536201:73589994378408:268674600960:16533270528:0:0:0.81:0.81:0:0
{pve-vm-9.0/130}:  1651808:vm-gravity-2:running:0:1756217578:16:0.00197152985067798:137436856320:12763025408:34359738368:0:88687890079:1045877114399:73589994378408:268674600960:16533224448:0:0:0.36:0.36:0:0
{pve-vm-9.0/130}:  1651818:vm-gravity-2:running:0:1756217588:16:0.00249793790566357:137436856320:12765351936:34359738368:0:88687987273:1045878166711:73589994378408:268675268608:16533224448:0:0:0.23:0.23:0:0
{pve-vm-9.0/130}:  1651828:vm-gravity-2:running:0:1756217598:16:0.00295016745695559:137436856320:12765331456:34359738368:0:88688121508:1045878726023:73589994378408:268675313664:16533224448:0:0:0.1:0.1:0:0
{pve-vm-9.0/130}:  1651837:vm-gravity-2:running:0:1756217608:16:0.00243692628236993:137436856320:12765265920:34359738368:0:88688224734:1045880830733:73589994378408:268677361664:16533223424:0:0:0.11:0.11:0:0
{pve-vm-9.0/130}:  1651848:vm-gravity-2:running:0:1756217618:16:0.00309997574414379:137436856320:12752109568:34359738368:0:88688346238:1045882409089:73589994378408:268677976064:16533223424:0:0:0.19:0.19:0:0
{pve-vm-9.0/130}:  1651858:vm-gravity-2:running:0:1756217629:16:0.00172591614920651:137436856320:12752658432:34359738368:0:88688450884:1045883987519:73589994378408:268677976064:16533223424:0:0:0.05:0.05:0:0
{pve-vm-9.0/130}:  1651868:vm-gravity-2:running:0:1756217638:16:0.00171097550931235:137436856320:12753035264:34359738368:0:88688538839:1045885039677:73589994378408:268677976064:16533223424:0:0:0.02:0.02:0:0
 
Thanks. That looks good and is as it should be. So I will have to take a look at the code that is receiving and processing that data.
 
To get more debug output from the processing side, can you please install the following build of pve-cluster?

http://download.proxmox.com/temp/pve-cluster-9-rrd-debug/

Code:
wget http://download.proxmox.com/temp/pve-cluster-9-rrd-debug/pve-cluster_9.0.6%2Bdebug-rrd-1_amd64.deb
wget http://download.proxmox.com/temp/pve-cluster-9-rrd-debug/SHA256SUMS

# check checksums:
sha256sum -c SHA256SUMS

# install
apt install ./pve-cluster_9.0.6+debug-rrd-1_amd64.deb

Then follow the journal of pve-cluster:
Code:
journalctl -f -u pve-cluster.service
It will print quite a bit of debug output in the following form:

Code:
Aug 27 09:47:47 cephtest2 pmxcfs[1480487]: [status] notice: ----
Aug 27 09:47:47 cephtest2 pmxcfs[1480487]: [status] notice: key: pve-vm-9.0/100
Aug 27 09:47:47 cephtest2 pmxcfs[1480487]: [status] notice: data: 1756280867:2:0.00197022711257839:536870912:257724416:5368709120:0:20678:9556:187827040:40276480:436752384:0:0:0:0:0:0
Aug 27 09:47:47 cephtest2 pmxcfs[1480487]: [status] notice: keep_columns: 0
Aug 27 09:47:47 cephtest2 pmxcfs[1480487]: [status] notice: padding: 0

This is printed for all resources. Please get the block for VM 130. And if you have other VMs that seem to be fine and are not causing the "expected 17 data source readings (got 10)" errors, please grab a block for one of those too, for comparison.
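To pull out only the blocks for a specific guest while following the journal, something along these lines should work:
Code:
journalctl -f -u pve-cluster.service | grep --line-buffered -A 3 'key: pve-vm-9.0/130'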

Once you have gathered the debug info, you can reinstall the non-debug build with:
Code:
apt install --reinstall pve-cluster=9.0.6
After which, the additional debug output in the journal should stop.
 
From what I can tell, the error is being generated for every VM and CT on the cluster. As an example, 127 is a CT and 130 is a VM. Here's a sample I captured:

Code:
Aug 27 06:54:39 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:39 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/127
Aug 27 06:54:39 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292079:1:0.000629676858365403:1073741824:102916096:8350298112:2028048384:2187731264:103699530:9081753600:682511974
Aug 27 06:54:39 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:39 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/130
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292083:16:0.00172689044410428:137436856320:12950806528:34359738368:0:91618658831:1075323458893:73590006408360:275549741056
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/110
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292084:4:0.0129740412876344:17179869184:7707553792:511101108224:0:16789452349:11995414854:4071916058:4845463654
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/100
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292084:4:0.0105714410491836:17179869184:9192456192:137438953472:0:19758066909:3347868268:3144937540:7639623168
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:44 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/124
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292084:24:0.00138396161281809:34358689792:20853125120:2147483648:0:828385025055:43445504498:1842039210454:13776452096
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/131
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292085:20:U:68718428160:U:137438822400:0:U:U:U:U
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/128
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292085:4:U:8594128896:U:68719476736:0:U:U:U:U
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/139
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292085:32:U:17179869184:U:2147483648:0:U:U:U:U
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11

Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: ----
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: key: pve-vm-9.0/107
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: data: 1756292085:4:U:8594128896:U:68719476736:0:U:U:U:U
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: keep_columns: 11
Aug 27 06:54:45 vm-gravity-2 pmxcfs[1143425]: [status] notice: padding: 11
 
Hmm. It seems that the detection of which files and directories are present in the /var/lib/rrdcached/db directory is coming to the wrong conclusions.

Would you mind posting the output of the following command?

Code:
for i in pve2-vm pve-vm-9.0; do echo "#### ${i}:" && ls -l /var/lib/rrdcached/db/$i; done
 
Code:
#### pve2-vm:
ls: cannot access '/var/lib/rrdcached/db/pve2-vm': No such file or directory
#### pve-vm-9.0:
total 39480
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 100
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 101
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 102
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 103
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 104
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 105
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 106
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 107
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 108
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 109
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 110
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 111
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 112
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 113
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 114
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 115
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 116
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 117
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 118
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 119
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 123
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 124
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 126
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 127
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 128
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 129
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 130
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 131
-rw-r--r-- 1 root root 1346072 Aug 27 13:49 139
-rw-r--r-- 1 root root 1346072 Aug 17 09:06 150
 
edit the /usr/share/perl5/PVE/Cluster.pm file. Look for the following function; it should start at line 451:
I wonder, would it be possible to add a stop or a timeout around that line (warn $err if $err;) in order to stop the error spamming? Two or three minutes of errors filling multiple pages is already more than enough.
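As a stop-gap, filtering the message out at the syslog level might also be an option (a different approach than patching the Perl code, and assuming rsyslog is what writes your syslog file); something like:
Code:
# drop-in rsyslog rule that discards just this message; remove it again once the bug is fixed
echo ':msg, contains, "RRD update error" stop' > /etc/rsyslog.d/10-drop-rrd-update-error.conf
systemctl restart rsyslog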
 
Hey,

thanks for the output. Unfortunately I currently can't explain why it ended up doing what it did.
Could you please run the following debug build and get the journal? http://download.proxmox.com/temp/pve-cluster-9-rrd-debug-v2/

This one prints debug output in a lot more places, which will hopefully allow me to figure out which code paths are taken.
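The installation steps are the same as for the first debug build, only the URL changes. I'm assuming the directory layout is the same; please take the exact file names from the directory listing:
Code:
# take the exact .deb name from the directory listing at the URL above
BASE=http://download.proxmox.com/temp/pve-cluster-9-rrd-debug-v2
DEB=pve-cluster_VERSION_amd64.deb   # placeholder, use the real file name
wget "$BASE/$DEB" "$BASE/SHA256SUMS"
sha256sum --ignore-missing -c SHA256SUMS
apt install "./$DEB"
journalctl -f -u pve-cluster.service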