[SOLVED] Sync Job fails for single VM (out of a few)

lug-pm

Jul 31, 2023
I have the following setup: two PBS servers. One is connected to my Proxmox cluster; the other serves as a sync target (backup copy).

My backup job is also relatively simple: every day, all VMs. The same goes for the sync job: every day, all VMs. On both PBS instances, everything takes place in the root namespace.
On the main PBS the backups are owned by “backup@pbs”, on the sync target by root@pam. For the sync itself I use the user “sync@pbs”, to which I have given the permissions “DatastoreBackup” and “DatastoreReader” on the main PBS. I don't think this is the cause (since it works for all other VMs), but I thought I'd mention it anyway.

The backup works without errors, but the sync job fails for a single VM; it has not worked for this VM from the start.

The sync log contains the following error:

Code:
sync group vm/172 failed - group lock failed: parsing owner for vm/172 failed: not a valid user id
 
Check the content of the owner file located in the corresponding group's folder on the datastore; it seems like an incorrect user accidentally got written in there. According to your explanation, I guess it should be owned by root@pam? You will have to change the content to the correct user for the sync job to work for this group as well.
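To check every group at once instead of one owner file at a time, something like the following Python sketch could be run on the sync target. It is only an illustration: it assumes the root-namespace layout described in this thread (<datastore>/vm/<id>/owner, and likewise for ct and host), and the AUTH_ID_RE pattern is a simplified, hypothetical approximation of a valid user or token id, not the exact rules PBS applies.

```python
import re
from pathlib import Path

# Simplified, hypothetical approximation of a PBS auth id:
# "user@realm" or "user@realm!tokenname". Not PBS's exact validation rules.
AUTH_ID_RE = re.compile(
    r"^[^\s:/@!]+@[A-Za-z0-9_][A-Za-z0-9._-]*(![A-Za-z0-9_][A-Za-z0-9._-]*)?$"
)

# Backup group types in the root namespace: <datastore>/<type>/<id>/owner
GROUP_TYPES = ("vm", "ct", "host")


def find_bad_owner_files(datastore: Path) -> list[tuple[str, str]]:
    """Return (group, problem) pairs for groups whose owner file is
    missing, empty, or does not look like a valid auth id."""
    problems = []
    for group_type in GROUP_TYPES:
        type_dir = datastore / group_type
        if not type_dir.is_dir():
            continue
        for group_dir in sorted(type_dir.iterdir()):
            if not group_dir.is_dir():
                continue
            group = f"{group_type}/{group_dir.name}"
            owner_file = group_dir / "owner"
            if not owner_file.exists():
                problems.append((group, "missing owner file"))
                continue
            # Ignore surrounding whitespace such as a trailing newline.
            content = owner_file.read_text().strip()
            if not content:
                problems.append((group, "empty owner file"))
            elif not AUTH_ID_RE.match(content):
                problems.append((group, f"invalid owner {content!r}"))
    return problems
```

For example, find_bad_owner_files(Path("/mnt/datastore/RAID6-10TB")) would be expected to list vm/172 with an empty owner file in the situation that turns up later in this thread.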
 
I never edited any permissions or users in the backup. All users had been created before the first backup was taken.

This is how it looks on the main/source PBS:

[screenshot]

and this is how it looks on the sync target PBS:
[screenshot]
 
Well, something or someone did ;). Check the output of cat <path-to-your-datastore>/vm/172/owner on the sync target PBS instance. I guess that it does not contain the plain root@pam as expected.
 
In fact, the 172/owner file does not contain root@pam; it is completely empty. It was created on 07.09.2024, and when I look at the log from that day, I see the following:

2024-09-07T00:00:15+02:00: sync group vm/172 failed - group lock failed: unable to write owner file "/mnt/datastore/RAID6-10TB/vm/172/owner" - No space left on device (os error 28)

Back then I had no garbage collection on the sync target, so the storage was full. What is of course annoying is that the file was created but not deleted after the error. It's also a pretty unlikely error, so I don't know how useful it would be to adjust the PBS source code for this.

I have now manually entered root@pam in the owner file, and the sync is running.
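If you'd rather script this kind of fix than edit the file by hand, here is a minimal Python sketch. It is not a PBS tool, and set_owner is a hypothetical helper. It writes the new owner to a temporary file in the group directory and renames it over owner, so a write that fails (for example with "No space left on device") leaves the old file in place instead of an empty one. It copies the old file's mode and, when run as root, its uid/gid, on the assumption that PBS needs the file to stay readable by its own service user; the trailing newline is also an assumption.

```python
import os
import stat
import tempfile
from pathlib import Path


def set_owner(group_dir: Path, owner: str) -> None:
    """Atomically replace <group_dir>/owner with `owner` (hypothetical helper)."""
    owner_path = group_dir / "owner"
    # Reuse permissions of the current owner file if there is one; otherwise
    # fall back to the group directory's uid/gid and mode 0644 (assumption).
    try:
        ref = os.stat(owner_path)
        mode = stat.S_IMODE(ref.st_mode)
    except FileNotFoundError:
        ref = os.stat(group_dir)
        mode = 0o644

    fd, tmp_name = tempfile.mkstemp(dir=group_dir, prefix=".owner.")
    try:
        with os.fdopen(fd, "w") as tmp:
            # Assumption: PBS accepts the auth id followed by a newline,
            # as a text editor would usually leave it.
            tmp.write(owner + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_name, mode)
        if os.geteuid() == 0:
            # Keep the file owned by the same user/group as before.
            os.chown(tmp_name, ref.st_uid, ref.st_gid)
        # A rename within one directory is atomic: readers see either the
        # old owner file or the complete new one, never a partial write.
        os.replace(tmp_name, owner_path)
    except BaseException:
        # Only the temporary file is removed; the real owner file is untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

For this thread's case, the call would be set_owner(Path("/mnt/datastore/RAID6-10TB/vm/172"), "root@pam").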

Thank you very much for your support; I didn't know about the owner file until now :)
 
Okay, that does indeed explain what happened.

Cleaning up the owner file when the disk runs out of space might not be the best approach; after all, it might already contain a valid user, and only the update to the file fails (e.g. when changing ownership). That said, I do not know the exact code paths involved off the top of my head. Also, it might not even work, as copy-on-write filesystems also need free space in order to free up space.
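The distinction drawn here, between a file the failing operation just created and one that may already hold a valid owner, can be illustrated with a small Python sketch. This is not how PBS implements it; create_owner_file is a hypothetical helper that opens with O_CREAT | O_EXCL so it knows the file is its own, and only in that case removes it again if the write fails. Updating an existing owner (e.g. when changing ownership) would need a different path, such as write-to-temp-and-rename. As noted above, on a copy-on-write filesystem the cleanup itself may fail when the disk is full.

```python
import os
from pathlib import Path


def create_owner_file(group_dir: Path, owner: str) -> None:
    """Create <group_dir>/owner for a brand-new group (hypothetical helper).

    Raises FileExistsError if an owner file already exists, so an existing
    (possibly valid) owner is never modified or deleted here.
    """
    path = group_dir / "owner"
    # O_EXCL: fail instead of opening a file that already exists, so any file
    # cleaned up below is guaranteed to be one this call created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    completed = False
    try:
        data = (owner + "\n").encode()
        if os.write(fd, data) != len(data):
            raise OSError(f"short write to {path}")
        os.fsync(fd)
        completed = True
    finally:
        os.close(fd)
        if not completed:
            # Remove the file we created so no empty owner file is left
            # behind. On copy-on-write filesystems even this unlink can
            # fail with ENOSPC if the disk is completely full.
            path.unlink(missing_ok=True)
```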

But feel free to open an issue about this in our bug tracker, referencing this forum thread; then we can evaluate whether this could be improved. Thanks!
 
