PBS Backup Client - General syntax/usage issues

boomam

Jan 5, 2023
Hi,
Is there some special-sauce to getting the PBS Client working?
I've been following the docs (which need updating, btw...) and a few YouTube videos to try and work out what is going on, but it just seems... intermittent.

What works one minute doesn't work a few minutes later, and either complains about permission errors or the "$XDG_RUNTIME_DIR" error.

For the setup I'm attempting -

PBS Server
I have set up the following on PBS -
  1. A backup user
  2. Added an API key
Set the permissions on PBS for both the user and the API key, applied to '/', with the role 'Datastore Admin' - for the sole reason that that role includes the backup-related permissions plus a few others. The docs on what the client needs to function appear to be missing, so it's kind of guesswork, working down from full admin.
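For reference, the rough CLI equivalent of that server-side setup would be something like this (the user and token names here are placeholders, not my actual ones):
Bash:
# create a dedicated backup user and an API token for it
proxmox-backup-manager user create backup@pbs
proxmox-backup-manager user generate-token backup@pbs client1
# grant the role on '/' to both the user and the token
proxmox-backup-manager acl update / DatastoreAdmin --auth-id backup@pbs
proxmox-backup-manager acl update / DatastoreAdmin --auth-id 'backup@pbs!client1'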

Client
The client is a fresh Ubuntu Server VM, 23.04, hosted on Unraid, passing through a backup share via Virtio-9P.
Variables for PBS_REPOSITORY, PBS_FINGERPRINT & PBS_PASSWORD have been set in /etc/environment, and verified via set | grep PBS.
Settings used in the variables (roughly as sketched below) -
  1. PBS_REPOSITORY = server name (DNS not IP)
  2. PBS_FINGERPRINT = Fingerprint of server
  3. PBS_PASSWORD = API key of backup user.
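In /etc/environment that looks roughly like this (all values are placeholders here; note that /etc/environment takes plain KEY="value" lines, no export):
Bash:
# /etc/environment (placeholder values)
PBS_REPOSITORY="pbs.example.com"
PBS_FINGERPRINT="AA:BB:CC:...:FF"
PBS_PASSWORD="<api-token-secret>"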

For the scripts I'm using to back up the different data sets, I have them following this format -
Bash:
proxmox-backup-client backup <name_of_archive>.pxar:<location_of_data_to_backup> --skip-lost-and-found true
The first time I ran this it worked, but it wanted the root@pbs password.

I've attempted further changes to the syntax, and tried both the API key and the user password, with similar results.

What obvious thing am I missing here?

Thanks!
 
Hi,
did you follow the repository scheme (including username, token name, hostname and datastore) as given in https://pbs.proxmox.com/docs/backup-client.html#backup-repository-locations ?

An example would be:
Bash:
export PBS_REPOSITORY="root@pam!<apitokenname>@<domainname>:<datastorename>"
export PBS_PASSWORD="<api-token-secret>"

proxmox-backup-client backup root.pxar:/
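You can also verify the repository and credentials separately from an actual backup run, e.g. by listing the existing backup groups:
Bash:
# uses PBS_REPOSITORY/PBS_PASSWORD from the environment; lists backup groups if auth and permissions are OK
proxmox-backup-client list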
Hi,
Yes, I attempted that syntax too.
Same issue - it'll work some of the time; other times I get permission errors or XDG errors.
I can retest again in about an hour though (just out right now).

Do you know what permissions/roles need to be applied to the user/API key, and to what path?
Let's get that 100% resolved too, so we can take it out of the equation.

Cheers
 
Well, that depends on what the user/token should be allowed to do; the roles and permissions are documented here: https://pbs.proxmox.com/docs/user-management.html#privileges
 
Thanks - but I was wondering what exactly the backup client needs from a user to be able to perform all its functions. We can take an educated guess and pick one of the roles with 'backup' in the name, but I was hoping there would be a doc somewhere that said something akin to 'to use the backup client with a non-root user, the minimum set of roles is X'.
I'll just go with the educated guess for now.

I'll re-test the syntax suggested above, shortly.
 
OK, retested, this time rolling up the exports and the backup command into its own *.sh -
I get the error: Error: authentication failed - user account or token disabled or expired.
The token and user account are both enabled and not expired, and I have tested with the role permissions set as DatastoreAdmin on the root path (/), and as full Admin too.
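The script is roughly along these lines (values are placeholders, not the real ones):
Bash:
#!/bin/bash
# placeholder repository, token secret, fingerprint and paths
export PBS_REPOSITORY='backup@pbs!client1@pbs.example.com:store1'
export PBS_PASSWORD='<api-token-secret>'
export PBS_FINGERPRINT='<server-fingerprint>'

proxmox-backup-client backup data1.pxar:/mnt/backup/data1 --skip-lost-and-found true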
 
Some further testing: power-cycling the PBS server itself seems to let the backup jobs I have configured work.
However, some of them fail with Error downloading .didx from previous manifest: Unable to open dynamic index.
Wiping the backup out in PBS, purging and re-running doesn't appear to solve the error, BUT it appears to transfer data regardless... just not enough of it.
As an example, one dataset is 17GB. It completed in less than 5 minutes - way too fast considering the speed of the connection...

To ask the question - I'm setting up different jobs for different data sets on the same system (on different schedules); would we expect any issues with this methodology?
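For context, each job is just one of those scripts run from cron on its own schedule, something like this (paths and times are placeholders):
Bash:
# crontab entries (placeholder script paths and schedules)
0 1 * * * /opt/backup/pbs-data1.sh
0 3 * * * /opt/backup/pbs-data2.sh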
 
Some further testing, this time using --backup-id -
Fresh backup of a 14GB set of tar.gz files -
Identified for Backup 10.523 GiB
Transferred 3.357 GiB
Compressed 3.557 GiB
Re-used 4.026 GiB
Took 123.66 seconds.
That's either some crazy compression of already-compressed files, or something is amiss.
 
Have you looked at the roles [0]? DatastorePowerUser or DatastoreBackup is probably what you want, depending on your use case.
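For example, to give a token only backup rights on a single datastore, something along these lines (datastore name and auth-id are placeholders):
Bash:
# restrict the token to backup operations on one datastore
proxmox-backup-manager acl update /datastore/store1 DatastoreBackup --auth-id 'backup@pbs!client1'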

Did you give the user/token the right permissions on the datastore as well? I assume this is already fixed, as in your later posts it seems you can connect and back up as required. Please mention your progress if you would like others to be able to follow along and help.

So the missing dynamic index is probably a leftover from a failed job? I cannot really tell from the information at hand.

Regarding speed: Please note that PBS uses a chunk store referenced by index files. During a file-level backup, the client reads and serializes the whole filesystem tree to be backed up and splits this serialized stream into chunks of variable size, referenced by their hash values (hence the dynamic chunk index). Only those chunks not already present on the server have to be uploaded; the others are already known and can be skipped during upload. This allows for reduced network I/O and efficient deduplication.

Regarding your question: No, it is totally fine to run different backup jobs for different datasets to the same datastore. If not encrypted, this will also improve your deduplication if the systems share similarities, as the chunks can be reused.

I assume the tar archive has already been backed up before and only parts of it had to be uploaded, as the remaining ~4 GiB were already present as chunks in the chunk store. That is the deduplication and network I/O advantage I was talking about.

[0] https://pbs.proxmox.com/docs/user-management.html#access-roles
 
Morning,
RE: Token/Rights - Yes. The only thing that changed between it not working and working was a reboot of the PBS server.

RE: Dynamic Index - Possibly, but with no data existing in the store, I assume it is being cached elsewhere?

RE: Speed (& tar Archive) - Yup, most backup/transfer products use chunking nowadays, but the concern is that no data of a similar type has ever been uploaded to this PBS at any point, so where does it think it's pulling pre-existing data from?

I do want to do a re-compression test directly on the backup client, just to see if its compression of the already-compressed tar.gz files roughly matches what PBS reported, as that would account for a few GB of the test; it's the 're-used' data that's confusing me.

RE: Backup jobs
So different jobs, on different sets of files, from the same host, won't cause issues?
The reason I am doing this is that, unless you know of something I've missed, I don't see an 'include' option, only an 'exclude' option, and I don't want a root backup, just specific directories.

One advantage here is that I have two PBS installs, one on-site and one remote (the one I'm testing with here), so I can likely run a similar test against the on-site one too.
 
Note that pruning will not delete chunks, so your chunks were probably still in the datastore before the upload; see [0].

You can also include multiple archives in one backup job, e.g.
Bash:
proxmox-backup-client backup etc.pxar:/etc home.pxar:/home ...

[0] https://pbs.proxmox.com/docs/maintenance.html#garbage-collection
 
That's interesting about multiple archives in one backup - I may try that out.

RE: Garbage Collection - yup, I read that; I was running GC between tests/after deletions to be sure.

I'll do some more testing this morning on sizes/compression/transfers and report back.

I'd really like to get the PBS client working fully, as it's near flawless (touch wood...) for my VMs/LXCs from PVE - very useful.
But I have some non-PVE clients with data I'd like to centralize, so if I can get PBS working for those too, it'll save me from having a hodgepodge of different approaches, with a nice GUI to go along with it. :-)

Thanks for your continued input btw, much appreciated.
 
Note that only chunks older than a time barrier will actually get deleted after being marked as unused; from the docs:

"Phase two: Sweep The task iterates over all chunks, checks their file access time, and if it is older than the cutoff time (i.e., the time when GC started, plus some headroom for safety and Linux file system behavior), the task knows that the chunk was neither referred to in any backup index nor part of any currently running backup that has no index to scan for. As such, the chunk can be safely deleted."

So maybe their atime was not yet old enough for the chunks to be cleaned up. Checking the storage usage might give some insights.
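For example, the client can report the datastore usage directly (using the repository configured via PBS_REPOSITORY or --repository):
Bash:
# show total/used/available space of the datastore behind the configured repository
proxmox-backup-client status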

You're welcome
 
I've not noticed the storage increasing by much over the existing VM/LXC data that is there.
I did a quick test on the client (tar zcvf 00_TEST_data1.tar.gz ./data1) and, interestingly, the size of the compressed file was actually in the 3.6GB range, so in line with PBS, BUT it took a good few minutes, compared to <2 minutes with the PBS client. I'm guessing the difference there is perhaps the compression type.
I'll do some more testing on the other data, comparing the actual sizes with local compression and then with what's in the PBS logs themselves, but it's starting to look more positive...
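If it is the compressor, a closer local comparison would be zstd (which, as I understand it, is what PBS uses for its chunks) - something like this (GNU tar 1.31+):
Bash:
# same test data, compressed with zstd instead of gzip for a closer comparison
tar --zstd -cvf 00_TEST_data1.tar.zst ./data1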
 
OK.
Further testing - most of the data I'm using for testing is within 100MB of what the backup client is reporting as transferred.
Which is a good sign.

I'm still at a loss on the re-used aspect, but for the moment I'm going to guess it's as you mentioned (@Chris) and perhaps some parts have chunks similar to other pre-existing data.
I'll keep an eye on it for a few days, see how the data sizes change, etc., and report back accordingly.
 
You can also do a restore test and compare the backed-up and restored data with rsync.
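A rough sketch (snapshot name and paths are placeholders):
Bash:
# find the snapshot, restore one archive to a temporary location, then compare
proxmox-backup-client snapshot list
proxmox-backup-client restore host/myclient/2023-01-05T10:00:00Z data1.pxar /tmp/restore-test
# dry-run with checksums: prints any files that differ between source and restore
rsync -rvnc --delete /mnt/backup/data1/ /tmp/restore-test/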
 
