Missing parts on graphs

filippoclikkami
Hi everyone, is it normal that the graphs show these gaps? It's a 4-node cluster of Minisforum MS-01 machines, with no issues on the VMs. I get instability with PBS while syncing with a remote PBS. The instability is on the PVE side: the sync itself keeps going, but PVE shows connection timeouts several times. PBS is a VM with 4 cores and 8 GB of RAM. In the attached graph called cpuramnodes you can see that the gaps happen on every node; they also appear in Network traffic, Disk IO, etc. I was thinking some energy saving or CPU throttling is involved, but the VMs keep working fine.


Anyone with this setup or a similar situation?
 

Attachments

  • pveusageonpve.png (20.1 KB)
  • cpuramnodes.png (49.9 KB)
The gaps you're seeing in the Proxmox graphs are usually caused by interruptions in pvestatd, the service responsible for collecting and updating resource statistics. If pvestatd can’t gather data in time due to crashes, excessively high load, I/O delays, or timeouts, you’ll see missing segments in those graphs.

You can check its status with:
Bash:
systemctl status pvestatd
journalctl -u pvestatd
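
If pvestatd is timing out rather than crashing, the journal also records how long each status update took. A quick way to filter for that, using nothing beyond standard journalctl and grep (the --since value is just an example):

Bash:
# show only the slow status updates and backup-client errors from today
journalctl -u pvestatd --since today | grep -E 'status update time|proxmox-backup-client'

pvestatd normally completes an update within a few seconds, so update times in the tens of seconds mean it is blocking on something, usually storage.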
 
Thanks for the replies.


Code:
Jul 08 18:10:03 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:10:03 clikka1 pvestatd[1195]: status update time (120.248 seconds)
Jul 08 18:12:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:12:04 clikka1 pvestatd[1195]: status update time (120.234 seconds)
Jul 08 18:14:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:14:04 clikka1 pvestatd[1195]: status update time (120.227 seconds)
Jul 08 18:16:04 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:16:04 clikka1 pvestatd[1195]: status update time (120.226 seconds)
Jul 08 18:16:22 clikka1 pvestatd[1195]: status update time (17.886 seconds)
Jul 08 18:18:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:18:42 clikka1 pvestatd[1195]: status update time (120.242 seconds)
Jul 08 18:20:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:20:42 clikka1 pvestatd[1195]: status update time (120.227 seconds)
Jul 08 18:22:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:22:42 clikka1 pvestatd[1195]: status update time (120.234 seconds)
Jul 08 18:24:42 clikka1 pvestatd[1195]: proxmox-backup-client failed: Error: http request timed out
Jul 08 18:24:42 clikka1 pvestatd[1195]: status update time (120.231 seconds)

This is what I get; since the beginning of June it has been like that every day. I forgot to mention that the datastore is an NFS share on a QNAP rack NAS. There was no problem before I started syncing with the remote PBS. My initial thought was the NAS energy-saving settings; disabling disk standby seemed to have solved it, but after about two days it showed up again.
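
For anyone debugging the same thing: the NFS mount can be inspected from inside the PBS VM with standard Linux tools. This assumes the usual kernel NFS client, and nothing here is QNAP-specific:

Bash:
# show the NFS mounts and the options they were negotiated with (vers, timeo, retrans, hard/soft)
findmnt -t nfs,nfs4
nfsstat -m
# client-side RPC statistics; a rising retransmission count points at the NAS or the network
nfsstat -rc

If the share is mounted soft with short timeouts, I/O errors under heavy verify traffic wouldn't be surprising.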
 
I forgot to mention that the datastore is an NFS share on a QNAP rack NAS
So if I understand correctly: you've got an NFS share mounted as a datastore on the PBS VM on PVE (same node?), which mounts to an NFS server (the bare-metal QNAP). This is probably going to cause a lot of strain/load on that node/network. In general, NFS as a datastore for PBS (even bare-metal) should probably be avoided. See these findings.

I don't use PBS, but have you got "Verify new backups immediately after completion" turned on or off?
 
I have an NFS share mounted as the datastore on the PBS VM. The PBS VM is on the internal disk of a PVE node.
Apart from what I've said above about a datastore mounted on NFS, if your PBS VM becomes inoperable, how will you recover? I guess at a minimum you should have a separate, independent backup (non-PBS) of that PBS VM (preferably on some external media).

Yes, there's a job every day; sometimes I verify manually and it works too.
Try turning off that "Verify new backups immediately after completion" & compare to see if those logs go away (which is what I initially meant).
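
If you still want the data verified, a scheduled verify job outside the backup/sync window is the usual alternative. From memory of the PBS CLI (double-check against proxmox-backup-manager help; the datastore and job names below are made up):

Bash:
# run a one-off verification of a datastore manually, during off-hours
proxmox-backup-manager verify store1
# or create a recurring verify job, e.g. daily at 03:00
proxmox-backup-manager verify-job create nightly-verify --store store1 --schedule '03:00'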
 
Apart from what I've said above about a datastore mounted on NFS, if your PBS VM becomes inoperable, how will you recover? I guess at a minimum you should have a separate, independent backup (non-PBS) of that PBS VM (preferably on some external media).
Sorry, maybe I didn't explain myself well. The PBS VM doesn't become inoperable; only the datastore fails to communicate. In any case, yes, I have an independent backup of it.

Try turning off that "Verify new backups immediately after completion" & compare to see if those logs go away (which is what I initially meant).
OK, I'll try turning off immediate verify and configure a scheduled job. Just to say, now that I've rebooted the NAS and then the PBS VM, it seems to work fine; I'll check over the next few days.

Thanks
 
I can say with 99% certainty that it is the verification job that causes the instability. I've scheduled it at 6:30 AM, and from that time the communication failures occur.
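
For the record, this is how I checked the correlation: just the pvestatd journal filtered around the job's start time (the times below match my schedule):

Bash:
# look at pvestatd around the 6:30 AM verify job
journalctl -u pvestatd --since 06:25 --until 08:00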
 
The PBS VM doesn't become inoperable; only the datastore fails to communicate
Ok, so any backups relying on that datastore are inoperable - same result.

I have an independent backup of it.
Good for you.

I can say with 99% certainty that it is the verification job that causes the instability.
If I understand you correctly, you are confirming that turning off that setting makes the issue go away. So leave it off. This is inevitably caused by the mounted NFS datastore, as I originally suspected.

I've scheduled it at 6:30 AM, and from that time the communication failures occur.
So it seems the verification job causes the issue on its own (assuming no other backup job or other NFS activity is going on at that time).

Not sure how you can effectively run verify jobs in your current environment. You will probably have to make a change to your setup. I already linked the "findings" on NFS-mounted datastores above, & I quote:
avoid nfs and samba like the plague

However, browsing those findings, I see another line:
it is ok to have your PBS installed as VM and put the virtual datastore disk (the .qcow2 file in Proxmox) on nfs

So maybe this is an avenue you could explore.
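
In practice that would mean letting PVE handle the NFS layer: define the QNAP export as regular PVE storage, give the PBS VM a virtual disk that lives there, and build the datastore on that disk inside PBS. Roughly like this, with the caveat that the VM ID, storage name, disk name and size are all examples, so check the exact syntax for your versions:

Bash:
# on the PVE node: add a 500G qcow2 disk on the NFS storage "qnap-nfs" to the PBS VM (ID 101)
qm set 101 --scsi1 qnap-nfs:500,format=qcow2

# inside the PBS VM: find the new disk, put a filesystem on it and register it as a datastore
proxmox-backup-manager disk list
proxmox-backup-manager disk fs create store1 --disk sdb --filesystem ext4 --add-datastore true

That way PBS only ever sees a local block device, and the NFS traffic is handled by QEMU on the host instead of by the chunk store.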