I have a problem that has occurred multiple times.
After some time (I don't know exactly how many minutes or seconds), all mounts on the hypervisor host become stable again,
but on a VM some of the Ceph-based mounts are not recovered automatically.
Proxmox VE 8.0.4
Ceph 17.2.6
Most of the data is on a dedicated fast network. This time the error occurred on a single node only; the cluster is stable and usable.
How can I investigate this further to prevent the issue?
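One way to narrow this down on the VM is to record which Ceph-backed mounts still answer a simple stat call after the host has recovered. The script below is a minimal, hypothetical helper (not a Proxmox or Ceph tool). It assumes the VM mounts CephFS either with the kernel client (filesystem type ceph) or with ceph-fuse (type fuse.ceph-fuse), reads /proc/mounts, and reports each such mount as ok, as returning an error, or as hung past a timeout.

```python
import os
import threading

# Assumption: these fstypes identify Ceph mounts on the VM ("ceph" for the
# kernel CephFS client, "fuse.ceph-fuse" for ceph-fuse). Adjust as needed.
CEPH_FSTYPES = {"ceph", "fuse.ceph-fuse"}


def ceph_mountpoints(mounts_text):
    """Return mount points from /proc/mounts-style text with a Ceph fstype."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] in CEPH_FSTYPES:
            points.append(fields[1])
    return points


def probe(path, timeout=5.0):
    """Stat a path in a daemon thread and return 'ok', 'hung' or 'error: ...'."""
    result = {}

    def worker():
        try:
            os.stat(path)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = f"error: {exc}"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    # A thread still running after the timeout is most likely blocked on a
    # stale mount; it is abandoned (daemon thread) rather than killed.
    if t.is_alive():
        return "hung"
    return result.get("status", "error: unknown")


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint in ceph_mountpoints(f.read()):
            print(mountpoint, probe(mountpoint))
```

Running it periodically on the VM and keeping the timestamped output could show exactly when the mounts stop responding, which can then be compared with the timestamps of the errors on the host.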
Some more logs that might be helpful, based on what was asked in similar questions:
dmesg is clean on both the host and the VM.
systemctl status pve-cluster
systemctl status rrdcached
Code:
Sep 26 13:51:19 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/cephfs-data: -1
Sep 26 13:51:19 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/local: -1
Sep 26 13:51:19 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/cephfs-shared: -1
Sep 26 13:51:19 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/nfs_home: -1
...
(a few more identical lines follow, with other storage paths)
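To see which storages and timestamps these errors cover, the lines can be grouped. This is a sketch that assumes the exact line format quoted above; the regular expression is derived only from those samples, and the script name rrdc_summary.py is hypothetical. Lines truncated by systemctl status (ending in >) do not match and are skipped, so it should be given full lines, for example the output of journalctl -u pve-cluster --no-pager saved to a file.

```python
import re
import sys
from collections import defaultdict

# Pattern built only from the pmxcfs sample lines in this post (assumption).
RRDC_ERR = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ pmxcfs\[\d+\]: .*"
    r"RRDC update error /var/lib/rrdcached/db/pve2-storage/"
    r"(?P<node>[^/]+)/(?P<storage>[^:]+): -?\d+"
)


def summarize(lines):
    """Map (timestamp, node) to the sorted storages that had RRDC errors."""
    groups = defaultdict(set)
    for line in lines:
        m = RRDC_ERR.search(line)
        if m:
            groups[(m["ts"], m["node"])].add(m["storage"])
    return {key: sorted(vals) for key, vals in groups.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 rrdc_summary.py pve-cluster.log
    with open(sys.argv[1]) as f:
        for (ts, node), storages in summarize(f).items():
            print(ts, node, ", ".join(storages))
```

Grouping by timestamp makes it easier to line up each burst of errors with the moment the VM mounts stopped recovering.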
systemctl status pve-cluster
Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Thu 2023-09-21 16:44:37 IDT; 4 days ago
Main PID: 2455 (pmxcfs)
Tasks: 10 (limit: 618960)
Memory: 64.2M
CPU: 1h 20.566s
CGroup: /system.slice/pve-cluster.service
└─2455 /usr/bin/pmxcfs
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/ce>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/co>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/sc>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/vq>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/ce>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/ce>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/nf>
Sep 26 13:58:20 pve-blade-212 pmxcfs[2455]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve-blade-211/lo>
Sep 26 13:58:30 pve-blade-212 pmxcfs[2455]: [status] notice: received log
Sep 26 14:04:18 pve-blade-212 pmxcfs[2455]: [status] notice: received log
systemctl status rrdcached
Code:
● rrdcached.service - LSB: start or stop rrdcached
Loaded: loaded (/etc/init.d/rrdcached; generated)
Active: active (running) since Thu 2023-09-21 16:44:36 IDT; 4 days ago
Docs: man:systemd-sysv-generator(8)
Tasks: 10 (limit: 618960)
Memory: 21.2M
CPU: 13min 10.639s
CGroup: /system.slice/rrdcached.service
└─2430 /usr/bin/rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l un>
Sep 21 16:44:36 pve-blade-212 systemd[1]: Starting rrdcached.service - LSB: start or stop rrdcached...
Sep 21 16:44:36 pve-blade-212 rrdcached[2365]: rrdcached started.
Sep 21 16:44:36 pve-blade-212 systemd[1]: Started rrdcached.service - LSB: start or stop rrdcached.