internal error - tried creating snapshot that's already in use

DerDanilo

Code:
sync group vm/90002 failed - unable to acquire lock on snapshot directory "/opt/backup/pbs/vm/90002/2023-05-19T21:13:26Z" - internal error - tried creating snapshot that's already in use

We have some issues with a larger backup pool that we cannot solve at the moment. No namespaces are in use.
The backup data is fine on the main backup server, but during sync to the secondary backup server (offsite, 1 Gbit/s link) the backup group runs into issues again after a while.
Deleting the backup group on both the main and the secondary server and creating clean backups doesn't seem to help either.
The secondary backup server has 14x 16 TB HDDs with ZFS (raidz2), accelerated with datacenter NVMe drives for log and cache. Sadly, there was not enough space for a special device.

Any ideas?
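One generic check when the error shows up is whether any process still holds files open under the affected snapshot directory (the path is taken from the log above; lsof and fuser are standard Linux tools, not PBS commands, so this is only a rough diagnostic sketch):

Code:
# list processes with open files under the snapshot directory from the error message
lsof +D "/opt/backup/pbs/vm/90002/2023-05-19T21:13:26Z"
# or, show the PIDs that hold the directory itself
fuser -v "/opt/backup/pbs/vm/90002/2023-05-19T21:13:26Z"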
 
I have a similar issue. I have two PBS servers running at different locations, and they both sync the same backups to each other. At first the jobs ran hourly, but this error came up a lot, so I set one server to sync at :15 and the other at :30. The one that syncs at :30 no longer seems to have any issues, while the one that syncs at :15 still gets this error, although much less often, it seems.

This needs to be fixed! Does it have to do with both servers syncing the same backups to each other? Is that not supposed to be done? The only reason I have it set up like this is so that I have backups at both locations: VMs at location 1 are backed up to PBS1 and then synced over to PBS2, and likewise VMs at location 2 are backed up to PBS2 and then synced over to PBS1.

EDIT: It just came to mind: syncing means not only upload but also download, right? So when PBS1 syncs with PBS2, does it both send and receive backups, i.e. does it keep the two in sync? Then I would only need a sync job on one of the PBS servers, right?

After reading the docs: they clearly state that a sync job pulls from a remote datastore to local storage, so I do need a sync job on both servers to keep them in sync. Would it be better to have a separate datastore for each PBS and sync that one? I probably also need a verify job on the synced datastore on the remote side, right? I'm still not sure what causes the error; something is holding the lock...
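For illustration, a minimal sketch of one pull job per direction; the job, datastore and remote names (pbs1/pbs2, store1/store2) are placeholders, not taken from this thread:

Code:
# on PBS1: pull the backups made at location 2 from PBS2 into the local datastore
proxmox-backup-manager sync-job create pull-from-pbs2 \
    --store store1 --remote pbs2 --remote-store store2 --schedule 'hourly'

# on PBS2: the mirror job, pulling location 1's backups from PBS1
proxmox-backup-manager sync-job create pull-from-pbs1 \
    --store store2 --remote pbs1 --remote-store store1 --schedule 'hourly'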
 
Same issue in our setup. We have a main backup server and an offsite backup server. Sometimes this error message appears for different hosts.

Log from the remote backup server:
Code:
sync group host/hostname failed - unable to acquire lock on snapshot directory "/backup/backup/ns/h/host/hostname/2024-02-02T23:27:14Z" - internal error - tried creating snapshot that's already in use

I found that another error message appears on the main backup server at the same time, concerning the snapshot being synced:
Code:
starting new backup reader datastore 'backup': "/backup/backup"
protocol upgrade done
TASK ERROR: connection error: connection reset

As far as I could see, there was no other job running that used this snapshot.

Version on both servers: 3.1-2

Is there a solution in sight? Maybe a retry of the read operation if it fails the first time?
 
Were two sync jobs running at that point in time?
 
That seems very strange. Could you check for other concurrent tasks on the PBS side?
 
Hello,
did anybody find out why/when this issue happens? I'm having the same problem.
 
@Rico29

please provide details:
- proxmox-backup-manager versions on both sides
- sync job settings
- task log of the sync job
- system log (journalctl --since .. --until ...) on both ends covering the sync job
- task logs of any PBS tasks running at the same time as the sync job

thanks!
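A rough sketch of the commands that gather this information on each server; the time window in the journalctl call is just an example value and needs to be adjusted to the actual sync run:

Code:
proxmox-backup-manager versions
proxmox-backup-manager sync-job list
proxmox-backup-manager task list --all --output-format json-pretty
journalctl --since "2024-03-19 01:00" --until "2024-03-19 08:00"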
 
Hello,
the sync ran from node pbs-hits, to sync the datastore on pbs-pa2.
Both nodes are on version:
Code:
proxmox-backup-server 3.1.4-1 running version: 3.1.4


Code:
root@pbs-hits:~# proxmox-backup-manager sync-job list
┌─────────────────┬─────────┬─────────┬──────────────┬──────────┬──────────────┬─────────┬─────────┐
│ id              │ store   │ remote  │ remote-store │ schedule │ group-filter │ rate-in │ comment │
╞═════════════════╪═════════╪═════════╪══════════════╪══════════╪══════════════╪═════════╪═════════╡
│ s-b3d24967-393e │ pbs-pa2 │ pbs-pa2 │ pbs-pa2      │ 02:00    │ all          │ 80 MiB  │         │
└─────────────────┴─────────┴─────────┴──────────────┴──────────┴──────────────┴─────────┴─────────┘

No other tasks were running on either node.
The only thing in the journal is:

Code:
Mar 19 07:50:33 pbs-hits.priv.celya.fr proxmox-backup-[5801]: pbs-hits.priv.celya.fr proxmox-backup-proxy[5801]: write rrd data back to disk
Mar 19 07:50:33 pbs-hits.priv.celya.fr proxmox-backup-[5801]: pbs-hits.priv.celya.fr proxmox-backup-proxy[5801]: starting rrd data sync
Mar 19 07:50:33 pbs-hits.priv.celya.fr proxmox-backup-[5801]: pbs-hits.priv.celya.fr proxmox-backup-proxy[5801]: rrd journal successfully committed (41 files in 0.024 seconds)


Task log file attached; the error is on line 265.

Regards
Cédric
 

Attachments

  • task-pbs-hits-syncjob-2024-03-19T06_40_38Z.log
    52.1 KB
Could you then check for tasks running on both ends between "2024-03-18T21:22:12Z" and "2024-03-19T07:40:58+01:00" (note the different timezones, please)?
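For reference, the two timestamps expressed in the same timezone with GNU date (the second one is UTC+1, i.e. 06:40:58 UTC):

Code:
date -u -d '2024-03-18T21:22:12Z'        +%FT%TZ   # 2024-03-18T21:22:12Z
date -u -d '2024-03-19T07:40:58+01:00'   +%FT%TZ   # 2024-03-19T06:40:58Z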
 
Code:
...Mar 19 02:14:00 pbs-pa2 proxmox-backup-proxy[1224]: reader finished successfully
Mar 19 02:14:00 pbs-pa2 proxmox-backup-proxy[1224]: TASK OK
Mar 19 02:14:01 pbs-pa2 CRON[491905]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: starting new backup reader datastore 'pbs-pa2': "/mnt/BACKUP/pbs-pa2"
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: protocol upgrade done
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: TASK ERROR: connection error: connection reset
Mar 19 02:14:01 pbs-pa2 CRON[491905]: pam_unix(cron:session): session closed for user root
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: starting new backup reader datastore 'pbs-pa2': "/mnt/BACKUP/pbs-pa2"
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: protocol upgrade done
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: GET /download
Mar 19 02:14:01 pbs-pa2 proxmox-backup-proxy[1224]: download "/mnt/BACKUP/pbs-pa2/ns/pxmx-pa2/vm/126/2024-01-15T00:04:53Z/index.json.blob"
Mar 19 02:14:06 pbs-pa2 proxmox-backup-proxy[1224]: starting new backup reader datastore 'pbs-pa2': "/mnt/BACKUP/pbs-pa2"
...


What can cause this "TASK ERROR: connection error: connection reset"?
 
Please check for running tasks in that time period (it's not the hosts but the timestamps that have different timezones!).
 
There are no other PBS tasks running in this period (I've scheduled all tasks not to overlap, because I'm using spinning disks and the IOPS are not very good).
 
There must be: no task would mean nobody could hold the lock, and some tasks are not scheduled at all (restores, for example). Please check the task list, not the schedules.
 
You check the task list and its start and end times; there is no filter for that, but you can likely script it if you want to:

Code:
proxmox-backup-manager task list --all --output-format json-pretty
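For example, a small sketch that narrows the JSON output down to tasks overlapping a given window; the starttime/endtime fields (epoch seconds) and the use of jq are assumptions, and a missing endtime is treated as a still-running task:

Code:
# window boundaries converted to epoch seconds; adjust to the actual sync run
START=$(date -d '2024-03-18T21:22:12Z' +%s)
END=$(date -d '2024-03-19T07:40:58+01:00' +%s)

# keep only tasks whose runtime overlaps [START, END]
proxmox-backup-manager task list --all --output-format json-pretty \
  | jq --argjson s "$START" --argjson e "$END" \
       '[ .[] | select(.starttime <= $e and ((.endtime // now) >= $s)) ]'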
 
