[SOLVED] Error on systemctl status pve-cluster

bond347 · Mar 3, 2023

Hi All,

I have a 3-node cluster.

I saw error messages after running this command:

root@em1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-03-03 16:55:13 +08; 47min ago
Process: 1268 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1276 (pmxcfs)
Tasks: 6 (limit: 153967)
Memory: 65.0M
CPU: 1.943s
CGroup: /system.slice/pve-cluster.service
└─1276 /usr/bin/pmxcfs

Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/em2/local: -1
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRD update error/var/lib/rrdcached/db/pve2-storage/em2/local: /var/lib/rrdcached/db/pve2-stor>
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/em3: -1
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-node/em3: /var/lib/rrdcached/db/pve2-node/em3: ill>
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/903: -1
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/903: /var/lib/rrdcached/db/pve2-vm/903: illegal>
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/em3/local: -1
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/em3/local: /var/lib/rrdcached/db/pve2-stor>
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/em3/local-lvm: -1
Mar 03 17:43:07 em1 pmxcfs[1276]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/em3/local-lvm: /var/lib/rrdcached/db/pve2->

root@em3:~# pveversion
pve-manager/7.3-6/723bb6ec (running kernel: 5.15.85-1-pve)

What is happening?

Moayad · Mar 3, 2023

Hi,

Can you check the status of the rrdcached service and see whether it reports any errors?

Bash:

systemctl status rrdcached
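
The `systemctl status` output above cuts long lines at the terminal width (the ones ending in `>`), so reading the journal directly may be easier. Here is a minimal sketch that lists each distinct RRD file named in the update errors. The helper name `rrd_error_paths` is made up for illustration, and the pattern is based only on the log lines posted in this thread.

```shell
# Hypothetical helper: read pmxcfs log lines on stdin and print each
# distinct RRD file path that appears in an "RRD(C) update error" message.
rrd_error_paths() {
    grep -oE 'RRDC? update error ?/[^ :]+' \
        | sed -E 's/^RRDC? update error ?//' \
        | sort -u
}

# Usage on a node (full, untruncated lines from the current boot):
#   journalctl -u pve-cluster -b --no-pager | rrd_error_paths
```

If the same node or storage names keep showing up in that list, that narrows down which cluster member the failing updates belong to.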

bond347 · Mar 3, 2023

Hi Moayad,

This is the output.

root@em1:~# systemctl status rrdcached
● rrdcached.service - LSB: start or stop rrdcached
Loaded: loaded (/etc/init.d/rrdcached; generated)
Active: active (running) since Fri 2023-03-03 16:55:12 +08; 1h 12min ago
Docs: man:systemd-sysv-generator(8)
Process: 1213 ExecStart=/etc/init.d/rrdcached start (code=exited, status=0/SUCCESS)
Tasks: 10 (limit: 153967)
Memory: 15.8M
CPU: 594ms
CGroup: /system.slice/rrdcached.service
└─1266 /usr/bin/rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l u>

Mar 03 16:55:12 em1 systemd[1]: Starting LSB: start or stop rrdcached...
Mar 03 16:55:12 em1 rrdcached[1213]: rrdcached started.
Mar 03 16:55:12 em1 systemd[1]: Started LSB: start or stop rrdcached.

Anything suspicious?

Moayad · Mar 3, 2023

Hi,

Can you please try to restart the rrdcached and the pve-cluster services?
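
A minimal sketch of that restart on one node, using the unit names from the outputs above. The function name `restart_rrd_services`, the `DRY_RUN` switch, and the restart order are assumptions for illustration, not an official procedure.

```shell
# Hypothetical helper: restart rrdcached, then pve-cluster (pmxcfs),
# on the local node. With DRY_RUN=1 the commands are only printed.
restart_rrd_services() {
    for unit in rrdcached pve-cluster; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl restart $unit"
        else
            # Stop at the first unit that fails to restart.
            systemctl restart "$unit" || return 1
        fi
    done
}

# Preview first, then run for real as root:
#   DRY_RUN=1 restart_rrd_services
#   restart_rrd_services
```

Afterwards, `systemctl status pve-cluster` should show whether new RRD update errors are still being logged.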

bond347 · Mar 6, 2023

Hi Moayad,

I rebooted all 3 nodes, one by one.
After the reboots, the errors still persisted.

I left it over the weekend, and when I checked today everything looked normal.

root@em1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-03-03 16:55:13 +08; 2 days ago
Process: 1268 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1276 (pmxcfs)
Tasks: 7 (limit: 153967)
Memory: 56.0M
CPU: 1min 55.041s
CGroup: /system.slice/pve-cluster.service
└─1276 /usr/bin/pmxcfs

Mar 06 01:27:44 em1 pmxcfs[1276]: [status] notice: received log
Mar 06 01:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 02:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 03:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 04:07:09 em1 pmxcfs[1276]: [status] notice: received log
Mar 06 04:07:11 em1 pmxcfs[1276]: [status] notice: received log
Mar 06 04:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 05:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 06:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful
Mar 06 07:55:12 em1 pmxcfs[1276]: [dcdb] notice: data verification successful

But I have some questions:
1. What happened? What was the cause?
2. What should I look for next time?

I'm sure other members here have experienced this.

bond347 · Mar 6, 2023

Hi Moayad and members,

Following up on my previous update.

I also saw this message "Mar 03 17:04:46 em1 corosync[1382]: [KNET ] host: host: 3 has no active links". What does it mean?

root@em1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-03-03 16:55:14 +08; 2 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1382 (corosync)
Tasks: 9 (limit: 153967)
Memory: 136.0M
CPU: 25min 15.782s
CGroup: /system.slice/corosync.service
└─1382 /usr/sbin/corosync -f

Mar 03 17:04:46 em1 corosync[1382]: [KNET ] host: host: 3 has no active links
Mar 03 17:06:44 em1 corosync[1382]: [KNET ] rx: host: 3 link: 0 is up
Mar 03 17:06:44 em1 corosync[1382]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 03 17:06:44 em1 corosync[1382]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 03 17:06:44 em1 corosync[1382]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 03 17:06:44 em1 corosync[1382]: [QUORUM] Sync members[3]: 1 2 3
Mar 03 17:06:44 em1 corosync[1382]: [QUORUM] Sync joined[1]: 3
Mar 03 17:06:44 em1 corosync[1382]: [TOTEM ] A new membership (1.5f) was formed. Members joined: 3
Mar 03 17:06:44 em1 corosync[1382]: [QUORUM] Members[3]: 1 2 3
Mar 03 17:06:44 em1 corosync[1382]: [MAIN ] Completed service synchronization, ready to provide service.

Moayad · Mar 6, 2023

bond347 said:
I also saw this message "Mar 03 17:04:46 em1 corosync[1382]: [KNET ] host: host: 3 has no active links". What does it mean?

This message means that host 3 lost its Corosync link. Do you have only one ring in the Corosync config? If so, we recommend using a separate network just for Corosync and/or adding a second ring to the Corosync config [0].

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
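
To see how often links drop, the Corosync journal can be filtered for link state changes. This is a minimal sketch; the helper name `knet_link_events` is made up for illustration, and the pattern is based on the KNET messages quoted in this thread plus the matching "is down" form.

```shell
# Hypothetical helper: read corosync log lines on stdin and keep only
# KNET messages about a link going up or down, or a host losing all links.
knet_link_events() {
    grep -E '\[KNET *\].*(has no active links|link: [0-9]+ is (up|down))'
}

# Usage on a node:
#   journalctl -u corosync -b --no-pager | knet_link_events
# Current link and quorum state can be checked with:
#   corosync-cfgtool -s
#   pvecm status
```

Frequent up/down pairs for the same host on a single ring would be consistent with the recommendation above to give Corosync its own network or a second ring.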
