[SOLVED] "unable to acquire lock" / "tried creating snapshot that's already in use"

BriBriBri · Mar 8, 2024

Hi All,

Is this expected behavior?

PBS1 is our older PBS server that was moved to a remote site. Its sole function is to reach out nightly to our newer server (PBS2) and sync PBS2's datastore to itself. PBS1 then also runs its own verify jobs on the datastore it holds.

PBS1's sync log

Code:

2024-03-08T02:34:47-08:00: sync snapshot vm/273/2024-03-08T03:21:45Z done
2024-03-08T02:34:47-08:00: percentage done: 88.33% (106/120 groups)
2024-03-08T02:34:48-08:00: skipped: 16 snapshot(s) (2023-10-28T02:10:14Z .. 2024-03-06T03:05:04Z) - older than the newest local snapshot
2024-03-08T02:34:48-08:00: percentage done: 88.75% (106/120 groups, 1/2 snapshots in group #107)
2024-03-08T02:34:48-08:00: sync group vm/274 failed - unable to acquire lock on snapshot directory "/mnt/datastore/pbs1-hdds-210tb/ns/primary-pve/vm/274/2024-03-07T03:04:46Z" - internal error - tried creating snapshot that's already in use
2024-03-08T02:34:48-08:00: skipped: 16 snapshot(s) (2023-10-28T02:11:47Z .. 2024-03-06T03:22:49Z) - older than the newest local snapshot
2024-03-08T02:34:48-08:00: re-sync snapshot vm/275/2024-03-07T03:20:31Z

PBS1's verify job

Code:

2024-03-08T02:34:32-08:00: verify group pbs1-hdds-210tb:vm/274 (15 snapshots)
2024-03-08T02:34:32-08:00: verify pbs1-hdds-210tb:vm/274/2024-03-07T03:04:46Z
2024-03-08T02:34:32-08:00:   check qemu-server.conf.blob
2024-03-08T02:34:32-08:00:   check drive-virtio0.img.fidx
2024-03-08T02:37:30-08:00:   verified 39815.68/57212.00 MiB in 178.78 seconds, speed 222.70/320.01 MiB/s (0 errors)
2024-03-08T02:37:30-08:00: percentage done: 78.18% (100/128 groups, 1/15 snapshots in group #101)

So is my understanding correct? The sync job needs to open the snapshot it synced the night before (2024-03-07) before creating the latest snapshot (2024-03-08). This fails because the verify job is holding the prior night's (2024-03-07) snapshot open for verification. So syncing of the latest snapshot fails (though the rest of the sync job carries on).

I guess I'm just surprised that a sync job bringing over new snapshots is being tripped up by not being able to first open old snapshots it already brought over earlier (if that is indeed what is happening here). And, mind you, I'm no genius on how all this works, so my surprise is from an ignorant layperson's perspective and is not intended as judgement. I'm sure it all makes sense from the developer's point of view!

Best Regards,

Brian

ggoller · Mar 11, 2024

Hi!
Were there any other jobs running at the same time on PBS1? Like a verify job, garbage collection or prune job?

Some context: the error you're getting is because the sync job tries to create/open a local snapshot with an exclusive lock. This fails because something else currently holds a lock on this snapshot. This can happen when another job is running at the same time.
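To illustrate the kind of conflict described above, here is a minimal Python sketch using advisory `flock` locks on a throwaway lock file. It is not PBS code: the helper names (`try_lock`, `demo`) and the `.lock` file are hypothetical, and it assumes the verify job holds a shared lock while the sync job asks for an exclusive, non-blocking one.

```python
import fcntl
import os
import tempfile


def try_lock(path, exclusive):
    """Try a non-blocking flock on path. Return the fd, or None if it is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds a conflicting lock; give up instead of waiting.
        os.close(fd)
        return None
    return fd


def demo():
    """Return (sync_was_blocked, sync_succeeded_after_verify_finished)."""
    with tempfile.TemporaryDirectory() as snapshot_dir:
        lock_path = os.path.join(snapshot_dir, ".lock")  # hypothetical lock file

        # "Verify job" (assumed): takes a shared lock while reading the snapshot.
        verify_fd = try_lock(lock_path, exclusive=False)

        # "Sync job": wants an exclusive lock on the same snapshot and fails.
        sync_fd = try_lock(lock_path, exclusive=True)
        blocked = sync_fd is None

        # Once the verify job releases its lock, the sync job can proceed.
        os.close(verify_fd)
        retry_fd = try_lock(lock_path, exclusive=True)
        acquired = retry_fd is not None
        if retry_fd is not None:
            os.close(retry_fd)
        return blocked, acquired


if __name__ == "__main__":
    print(demo())
```

Because the exclusive request is non-blocking, the second job fails immediately rather than waiting, which is consistent with the sync log reporting the group as failed and moving on to the next one.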

BriBriBri · Mar 11, 2024

Thanks. Yes, I should have been more explicit. It's clearly the verify job conflicting. I'm just surprised that a snapshot that has already been synced across the previous night (and is now being verified) is being re-opened by the current sync job prior to syncing across the newer snapshot. So I just wanted to confirm that this is indeed what is happening even if I don't quite understand why.

ggoller · Mar 12, 2024

Yes. A sync job pulls in all the newer snapshots from the remote, plus the last local snapshot (provided the same snapshot is also present on the remote). This means that if two sync jobs run one after the other without anything having been added on either instance, the latest snapshot is still overwritten locally.
This is done to accommodate a client-log appearing after the initial sync, because sometimes the sequence is:
1) backup to pbs2
2) pbs1 syncs backups from pbs2
3) client-log is uploaded to pbs2

In this case, we need to also replicate the new client-log to pbs1 (this is why we pull again and overwrite the latest snapshot).
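The selection rule described above can be sketched roughly as follows. This is an illustration only, not the actual PBS implementation: `plan_sync` is a hypothetical helper, and the timestamps in the usage example are made up.

```python
def plan_sync(remote_snapshots, local_snapshots):
    """Split remote snapshot timestamps into (skipped, resync, new).

    Timestamps are UTC strings such as "2024-03-07T03:04:46Z". In this fixed
    format, string comparison matches chronological order.
    """
    if not local_snapshots:
        # Nothing local yet: every remote snapshot is new.
        return [], [], sorted(remote_snapshots)

    newest_local = max(local_snapshots)
    # Older than the newest local snapshot: left alone.
    skipped = sorted(s for s in remote_snapshots if s < newest_local)
    # The newest local snapshot is pulled again so that a client-log
    # uploaded after the first sync is replicated too.
    resync = [s for s in remote_snapshots if s == newest_local]
    # Anything newer is pulled for the first time.
    new = sorted(s for s in remote_snapshots if s > newest_local)
    return skipped, resync, new


if __name__ == "__main__":
    remote = ["2024-03-05T03:00:00Z", "2024-03-06T03:00:00Z", "2024-03-07T03:00:00Z"]
    local = ["2024-03-05T03:00:00Z", "2024-03-06T03:00:00Z"]
    print(plan_sync(remote, local))
```

Under this rule, the newest local snapshot is always touched by the next sync, so a verify job that is reading that same snapshot at that moment can collide with it.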

BriBriBri · Mar 12, 2024

Thank you! All is understood now!
