Experiencing a lot of failed syncs on remote PBS

Suertzz

Member
Jan 4, 2021
Hi,

I'm running 2 PBS servers and I'm getting a lot of failed syncs (almost 50% of the syncs fail :()

Both servers are running `2.2-1`

Server B is configured to pull backups twice a day from Server A.
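For reference, a pull job like that can be created via the CLI roughly like this (the job ID, remote name, store names and schedule value below are just placeholders, not my exact settings):
Code:
# Sketch of a pull sync job on Server B (all names and the schedule are placeholders).
proxmox-backup-manager sync-job create pull-from-a \
    --store datastore \
    --remote pbs-a \
    --remote-store datastore \
    --schedule '2,14:00'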

Server A verify job log:
Code:
2022-07-12T02:38:09+02:00: Automatically verifying newly added snapshot
2022-07-12T02:38:09+02:00: verify DATASTORE:vm/host/2022-07-12T00:00:02Z
2022-07-12T02:38:09+02:00:   check archv.pxar.didx

Sync job log on Server B:
Code:
[..]
2022-07-12T11:27:17+02:00: sync group vm/host failed - unable to acquire lock on snapshot directory "/tank/pbs/datastore/vm/host/2022-07-12T00:00:02Z" - locked by another operation
[..]
2022-07-12T11:28:19+02:00: percentage done: 100.00% (92/92 groups)
2022-07-12T11:28:19+02:00: Finished syncing namespace , current progress: 91 groups, 1 snapshots
2022-07-12T11:28:19+02:00: TASK ERROR: sync failed with some errors.

Since the datastore is pretty big, Server A spends most of its time running the verify job (datastore on HDD with a special device on NVMe). When Server B starts a remote sync job, 90% of the time a verify job is still running on Server A, so the sync will likely fail.
Is there any solution to prevent that?
Sadly I cannot predict when the verify job will finish, and changing the schedule of the sync job does not seem like a solution to me. Is there a config option that lets the sync job retry reading the snapshot?

Also, why does a lock prevent a snapshot from being read by the remote server?

Thank you in advance for your help!
 
The lock is there so that there are never multiple concurrent writes, or a write and one or more reads at the same time.
How often do you verify your snapshots?
 
Hi mira,

How often do you verify your snapshots?

Every day at 10 am (skip verified, and re-verify after 8 days) on each datastore (I have 2), but they can take a lot of time since the datastores are slow.
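If it helps, that job should correspond to an entry in /etc/proxmox-backup/verification.cfg roughly like this (the job ID and store name are placeholders, and the exact key names are from memory, so take it as a sketch):
Code:
verification: verify-datastore
	store datastore
	schedule 10:00
	ignore-verified true
	outdated-after 8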

The lock is there so that there aren't multiple writes or a write and one or more reads concurrently.
Oh ok! I thought it applied only to writes (when a verify task holds the lock, can't you allow the sync to read, and vice versa? I think there is no change to the data itself).
 
If the verification job fails to verify blocks, those will be moved. This means a verification job requires write access.
And as soon as there's write access, reads might become invalid at any time. For consistency it is important that those can never be done at the same time from different processes.
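It's the same idea as exclusive vs. shared file locks. This is not what PBS actually does internally, just a quick flock illustration of why a writer and readers can never hold the lock at the same time:
Code:
# Conceptual demo only, not PBS internals.
# A background task holds a shared (reader) lock on a lock file:
flock --shared /tmp/snapshot.lock -c 'sleep 30' &
# Another shared lock is fine, but an exclusive (writer) lock is refused:
flock --shared    --nonblock /tmp/snapshot.lock -c 'echo reader ok'
flock --exclusive --nonblock /tmp/snapshot.lock -c 'echo writer ok' \
    || echo 'locked by another operation'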

Could you try verifying only after 2 or 4 weeks or so instead? Especially when verification jobs require that much time?
Usually bitrot isn't bad enough to require constant re-verification. You are probably fine with doing it once a month still. But that's of course up to you to decide.
 
If the verification job fails to verify blocks, those will be moved. This means a verification job requires write access.
And as soon as there's write access, reads might become invalid at any time. For consistency it is important that those can never be done at the same time from different processes.
Thanks a lot for the explanation!


Could you try verifying only after 2 or 4 weeks or so instead? Especially when verification jobs require that much time?
Usually bitrot isn't bad enough to require constant re-verification. You are probably fine with doing it once a month still. But that's of course up to you to decide.
Ofc I can delay the verification job a bit, but it is important to me that the datastore gets synced at least once a day.

I'm afraid that will just delay the issue: if I delay the verification job, it will take longer (because there are more backups since the last check), and when the sync job runs, it will still fail because a verify job is running :(


It would be nice to have a feature to ignore locked snapshots when a remote sync runs (or to have the sync job retry later on the locked snapshot).
Thanks for the help!
 
I meant the re-verification time, not the job itself.
If you only re-verify backups that haven't been verified in 28 days (4 weeks), it will lead to less overall load.
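If the job was created via the GUI or CLI, something along these lines should adjust it (replace the job ID with whatever `proxmox-backup-manager verify-job list` shows; treat the exact option names as an approximation):
Code:
# Sketch: keep skipping verified snapshots, but only re-verify after 28 days.
proxmox-backup-manager verify-job update verify-datastore \
    --ignore-verified true \
    --outdated-after 28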
 