Missing parts on graphs

filippoclikkami
Hi everyone, is it normal that the graphs show these gaps? It's a 4-node cluster of Minisforum MS-01 machines, with no issues on the VMs. I get instability with PBS while syncing with a remote PBS. The instability is on the PVE side: the sync itself keeps going, but PVE shows connection timeouts several times. PBS is a VM with 4 cores and 8 GB of RAM. In the attached graph called cpuramnodes you can see that the gaps happen on every node; they also appear in Network traffic, Disk IO, etc. I was thinking some energy saving or CPU throttling is involved, but the VMs keep working fine.


Anyone with this setup or a similar situation?
 

Attachments

  • pveusageonpve.png (20.1 KB)
  • cpuramnodes.png (49.9 KB)
The gaps you're seeing in the Proxmox graphs are usually caused by interruptions in pvestatd, the service responsible for collecting and updating resource statistics. If pvestatd can’t gather data in time due to crashes, excessively high load, I/O delays, or timeouts, you’ll see missing segments in those graphs.

You can check its status with:
Bash:
systemctl status pvestatd
journalctl -u pvestatd
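
If pvestatd is timing out rather than crashing, the journal also records how long each status update took. A quick way to filter for that, using nothing beyond standard journalctl and grep (the --since value is just an example):

Bash:
# show only the slow status updates and backup-client errors from today
journalctl -u pvestatd --since today | grep -E 'status update time|proxmox-backup-client'

pvestatd normally completes an update within a few seconds, so update times in the tens of seconds mean it is blocking on something, usually storage.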
 
Thanks for the replies.


Code:
Jul 08 18:10:03 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:10:03 clikka1 pvestatd[1195]: status update time (120.248 seconds)
Jul 08 18:12:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:12:04 clikka1 pvestatd[1195]: status update time (120.234 seconds)
Jul 08 18:14:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:14:04 clikka1 pvestatd[1195]: status update time (120.227 seconds)
Jul 08 18:16:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:16:04 clikka1 pvestatd[1195]: status update time (120.226 seconds)
Jul 08 18:16:22 clikka1 pvestatd[1195]: status update time (17.886 seconds)
Jul 08 18:18:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:18:42 clikka1 pvestatd[1195]: status update time (120.242 seconds)
Jul 08 18:20:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:20:42 clikka1 pvestatd[1195]: status update time (120.227 seconds)
Jul 08 18:22:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:22:42 clikka1 pvestatd[1195]: status update time (120.234 seconds)
Jul 08 18:24:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:24:42 clikka1 pvestatd[1195]: status update time (120.231 seconds)

This is what I get; since the beginning of June it has been like that every day. I forgot to mention that the datastore is an NFS share on a QNAP rack NAS. There was no problem before I started syncing with the remote PBS. My initial thought was the NAS energy-saving settings; disabling disk standby seemed to have solved it, but after about two days it showed up again.
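
For anyone debugging the same thing: the NFS mount can be inspected from inside the PBS VM with standard Linux tools. This assumes the usual kernel NFS client, and nothing here is QNAP-specific:

Bash:
# show the NFS mounts and the options they were negotiated with (vers, timeo, retrans, hard/soft)
findmnt -t nfs,nfs4
nfsstat -m
# client-side RPC statistics; a rising retransmission count points at the NAS or the network
nfsstat -rc

If the share is mounted soft with short timeouts, I/O errors under heavy verify traffic wouldn't be surprising.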
 
I forgot to mention that the datastore is an NFS share on a QNAP rack NAS
So if I understand correctly: you've got an NFS share mounted as a datastore on the PBS VM on PVE (same node?), which mounts to an NFS server (the bare-metal QNAP). This is probably going to cause a lot of strain/load on that node/network. In general, NFS as a datastore for PBS (even bare-metal) should probably be avoided. See these findings.

I don't use PBS, but have you got "Verify new backups immediately after completion" turned on or off?
 
I have an NFS share mounted as the datastore on the PBS VM. The PBS VM is on the internal disk of a PVE node.
Apart from what I've said above about a datastore mounted on NFS, if your PBS VM becomes inoperable, how will you recover? I guess at a minimum you should have a separate, independent backup (non-PBS) of that PBS VM (preferably on some external media).

Yes, there's a job every day; sometimes I verify manually and it works too.
Try turning off that "Verify new backups immediately after completion" & compare to see if those logs go away (which is what I initially meant).
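
If you still want the data verified, a scheduled verify job outside the backup/sync window is the usual alternative. From memory of the PBS CLI (double-check against proxmox-backup-manager help; the datastore and job names below are made up):

Bash:
# run a one-off verification of a datastore manually, during off-hours
proxmox-backup-manager verify store1
# or create a recurring verify job, e.g. daily at 03:00
proxmox-backup-manager verify-job create nightly-verify --store store1 --schedule '03:00'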
 
Apart from what I've said above about a datastore mounted on NFS, if your PBS VM becomes inoperable, how will you recover? I guess at a minimum you should have a separate, independent backup (non-PBS) of that PBS VM (preferably on some external media).
Sorry, maybe I didn't explain myself well. The PBS VM doesn't become inoperable; only the datastore fails to communicate. In any case, yes, I have an independent backup of it.

Try turning off that "Verify new backups immediately after completion" & compare to see if those logs go away (which is what I initially meant).
OK, I'll try turning off immediate verify and configure a scheduled job. Just to say, now that I've rebooted the NAS and then the PBS VM, it seems to work fine; I'll check over the next few days.

Thanks
 
I can say with 99% certainty that it is the verification job that causes the instability. I've scheduled it at 6:30 AM, and from that time the communication failures occur.
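
For the record, this is how I checked the correlation: just the pvestatd journal filtered around the job's start time (the times below match my schedule):

Bash:
# look at pvestatd around the 6:30 AM verify job
journalctl -u pvestatd --since 06:25 --until 08:00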
 
The PBS VM doesn't become inoperable; only the datastore fails to communicate
Ok, so any backups relying on that datastore are inoperable - same result.

I have an independent backup of it.
Good for you.

I can say with 99% certainty that it is the verification job that causes the instability.
If I understand you correctly, you are confirming that turning off that setting makes the issue go away. So leave it off. This is inevitably caused by the mounted NFS datastore, as I originally suspected.

I've scheduled it at 6:30 AM, and from that time the communication failures occur.
So it seems the verification job causes the issue on its own (assuming no other backup job or other NFS activity is going on at that time).

Not sure how you can effectively run verify jobs in your current environment. You will probably have to make a change to your setup. I already linked the "findings" on NFS-mounted datastores above, & I quote:
avoid nfs and samba like the plague

However, browsing those findings, I see another line:
it is ok to have your PBS installed as VM and put the virtual datastore disk (the .qcow2 file in Proxmox) on nfs

So maybe this is an avenue you could explore.
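
In practice that would mean letting PVE handle the NFS layer: define the QNAP export as regular PVE storage, give the PBS VM a virtual disk that lives there, and build the datastore on that disk inside PBS. Roughly like this, with the caveat that the VM ID, storage name, disk name and size are all examples, so check the exact syntax for your versions:

Bash:
# on the PVE node: add a 500G qcow2 disk on the NFS storage "qnap-nfs" to the PBS VM (ID 101)
qm set 101 --scsi1 qnap-nfs:500,format=qcow2

# inside the PBS VM: find the new disk, put a filesystem on it and register it as a datastore
proxmox-backup-manager disk list
proxmox-backup-manager disk fs create store1 --disk sdb --filesystem ext4 --add-datastore true

That way PBS only ever sees a local block device, and the NFS traffic is handled by QEMU on the host instead of by the chunk store.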