Backup PBS data to cloud provider using rclone

eribob

New Member
May 11, 2025
I am using PBS to back up my VMs and containers from two Proxmox nodes and I am very happy with it (deduplication factor 33!).

I know it is not best practice, but I have set up PBS as a VM on one of my PVE nodes and I am using an NFS share from my NAS as the datastore.

The NAS runs TrueNAS SCALE, which has a cloud sync feature that uses rclone to back up data to the cloud. To get an offsite backup I set up a cloud sync to a Storj bucket.

While reading up on best practices I came across a few threads on this forum suggesting that my PBS datastore may become corrupted if I back it up using this method.
For example: https://forum.proxmox.com/threads/datastore-synced-with-rclone-broken.154709/

But I did not quite understand why this could happen. Should the data not be an exact copy over in the Storj bucket?
I have selected the "use snapshot" option in TrueNAS, which I believe takes a snapshot of the data, syncs that snapshot to the cloud, and removes it after the sync completes. I have also taken care to make sure that no backups from my PVE nodes to PBS run while the cloud sync is in progress.
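As far as I understand it, on a ZFS system that "use snapshot" behaviour amounts to something like the commands below. The dataset name, mountpoint and the "storj:" remote are made-up placeholders, not my actual setup:

```shell
# Snapshot the dataset so rclone reads one frozen point in time,
# not a datastore that may change mid-transfer.
zfs snapshot tank/pbs@cloudsync

# ZFS exposes snapshots read-only under .zfs/snapshot/
rclone sync /mnt/tank/pbs/.zfs/snapshot/cloudsync storj:my-bucket/pbs-datastore

# Drop the snapshot once the sync has finished.
zfs destroy tank/pbs@cloudsync
```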

Are there still obvious pitfalls with this method?
I know that a second PBS instance is the optimal way to back up the data, but I do not have access to one, and I cannot afford a cloud VPS with >2 TB of storage.

Thank you in advance for any assistance!
 
But I did not quite understand why this could happen.
Well... I am not sure, but:

The chance of getting an inconsistent .chunks folder is not zero. You need to make sure that all of those hundreds of thousands to millions of chunks are from the same moment in time. It also means you need to pull all of them back to restore even a small file from PBS.

To me this concept seems really fragile.

If you want to "rsync"/"copy" to an external destination (which really is a good idea!), use a temporary local destination to create a classic "vzdump" backup.

Of course the much better way is to set up an isolated, external PBS and let that one pull from your local primary one.

Good luck :-)
 
Thank you for that answer!

If you want to "rsync"/"copy" to an external destination (which really is a good idea!), use a temporary local destination to create a classic "vzdump" backup.

Do you mean that I should create another, parallel backup job in PVE using the classic method and sync that data to Storj using rclone?

Of course the much better way is to set up an isolated, external PBS and let that one pull from your local primary one.
Yes. More servers are always better :)
I need to find a nice person to host it...
If anyone can suggest a reasonably priced VPS provider that I could use for this, I am interested, though. I have never used these providers, so to be honest I do not know much about them.
 
Do you mean that I should create another, parallel backup job in PVE using the classic method and sync that data to Storj using rclone?

Yes. The downside is clear: it needs space.

At least temporarily, and possibly just for one VM at a time, if you are able to write a wrapper script like "vzdump ... copy/sync ... delete".
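Sketched very roughly, such a wrapper could look like the function below. The VMID, dump directory and the "storj:" remote name are placeholders, and I have not battle-tested this:

```shell
#!/bin/sh
# Hypothetical "vzdump ... copy ... delete" wrapper.
# VMID, DUMPDIR and the "storj:" remote are placeholders.
set -e

VMID=100
DUMPDIR=/tmp/vzdump-offsite

backup_and_sync() {
    mkdir -p "$DUMPDIR"
    # classic vzdump archive of one guest: snapshot mode, zstd-compressed
    vzdump "$VMID" --dumpdir "$DUMPDIR" --mode snapshot --compress zstd
    # push the archive to the bucket, then drop the local copy
    rclone copy "$DUMPDIR" storj:offsite-bucket/vzdump
    rm -rf "$DUMPDIR"
}
```

You would then call `backup_and_sync` from a cron job, once per guest you want offsite.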

Please note that I do not recommend this; I just wanted to mention an alternative approach, as syncing ".chunks" looks like a bad idea to me.
 
Using rclone is a bad idea; rclone is known to break backups stored in a PBS datastore.
https://forum.proxmox.com/threads/datastore-synced-with-rclone-broken.154709/
https://forum.proxmox.com/threads/pbs-appears-not-to-write-to-disk.157751/

The reason is that PBS splits the data into a lot of small files for its deduplication magic (the space savings are insane). To ensure that they are complete and consistent, PBS expects them to be synced in a certain manner (at least according to the explanations given by Proxmox staff here in the forum and in the manual; so far I am neither motivated nor skilled enough to check the source code to determine how everything works). Since the existing options were not sufficient for this use case, the Proxmox developers built their own mechanism for syncing between different PBS instances.

Your best bet would be syncing to another PBS, e.g. in your home network, or a service like the managed PBS offerings from inett or tuxis.
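For reference, that PBS-to-PBS pull sync is configured on the secondary (offsite) instance with something like the commands below. All names, hosts and credentials here are placeholders, and the exact flags may differ between PBS versions, so check the `proxmox-backup-manager` documentation:

```shell
# On the offsite PBS: register the primary instance as a remote...
proxmox-backup-manager remote create primary-pbs \
    --host 192.168.1.10 \
    --auth-id 'sync@pbs' \
    --password 'secret' \
    --fingerprint 'aa:bb:cc:...'

# ...then create a pull sync job that fetches its datastore on a schedule.
proxmox-backup-manager sync-job create offsite-pull \
    --remote primary-pbs \
    --remote-store datastore1 \
    --store localstore \
    --schedule daily
```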
 
To ensure that they are complete and consistent, PBS expects them to be synced in a certain manner
Thank you for the reply. I have also seen the links you refer to. I would be very happy if a dev over here explained under which circumstances rsync could cause PBS data corruption.

I still cannot really understand why an exact replica of the data (which is what rclone is supposed to create?) would not work as a backup.

The reason I am asking is that it is of course expensive and more complicated to buy an additional server, find somewhere to put it, and pay for electricity and an internet connection.
 
Thank you for the reply. I have also seen the links you refer to. I would be very happy if a dev over here explained under which circumstances rsync could cause PBS data corruption.

I never bothered trying to actually understand the source code. For me, reports like the one linked, plus posts by staff members saying that such phenomena are the exact reason why they bothered to implement their own sync mechanism, are enough. I mean, if it had been possible to implement support for using rclone, they would have native support for cloud storage. Now I might be naive, but if I had been a developer I would have loved to kill two birds (cloud backups and sync between PBS servers) with one stone :) So I guess they had their reasons not to go down that road.

To be honest: I'm quite conservative in that regard. I prefer a boring but known-stable solution over hacks which might work but are known to be problematic, when they might impact my backups. Other people (especially in a homelab environment) might be more adventurous, and that's absolutely fine. To each their own :)

I still cannot really understand why an exact replica of the data (which is what rclone is supposed to create?) would not work as a backup.

One reason I would expect (but as I said, I never bothered to read the source code) is that during the rsync run PBS might continue to write to the datastore, so you have no guarantee of a consistent state of the datastore.

The reason I am asking is that it is of course expensive and more complicated to buy an additional server, find somewhere to put it, and pay for electricity and an internet connection.


Your budget might vary, but I think the prices for a cheap vserver like netcup, or a managed PBS cloud service like inett, are quite affordable (I pay around 13 EUR for my netcup vserver with 350 GB of additional storage; inett charges around 0.02 EUR per GB). If I didn't use my vserver for other stuff too, I would probably switch to an inett PBS storage, since then I wouldn't have to deal with maintaining the vserver anymore.
For comparison: a Hetzner storage box with 5 TB of storage space shouldn't be used as a PBS datastore (the performance is just awful; anybody who does tutorials on that on YouTube should be ashamed of themselves), but it will be fine for the native backup feature of PVE (aka vzdump vma archives) together with rclone, and costs around 12 EUR. Now this is obviously a lot cheaper, but you won't get deduplication, automatic verification of your backups, or the other PBS features that way. So in my book it is comparing apples to oranges. In the end I don't care whether I pay 12 EUR for 5 TB or for 350 GB, as long as the storage is enough for my data. But as I said: for your use case this might look different, and that's all right.
Just to be sure: are you in a (small) business environment, or is this just your home network? Another cheap alternative might be to buy some external disks and use them as a removable datastore. You would store one of them outside your place (e.g. at your office or at a friend's or family member's place) and exchange them on a regular schedule (weekly, monthly, etc.).
 
Many thanks for an extensive answer!
I have around 1 TB of data, which would mean about 20 EUR per month with inett, and that will increase as my data grows.
This is all my own private data (home network). However, I do not want to be adventurous with backups; there are more fun things to play around with, imo...
I will also consider buying/building another cheap server from spare parts to use for this job.

One reason I would expect (but as I said, I never bothered to read the source code) is that during the rsync run PBS might continue to write to the datastore, so you have no guarantee of a consistent state of the datastore.
This is what I suspect too: concurrent backups, or possibly other jobs on the datastore (prune? garbage collection? verify?) while I am cloning to the cloud, may cause problems. I tried to make sure this would not happen by timing the backups and other jobs, and by having my TrueNAS server create a snapshot that it sends to the cloud. However, the PBS devs likely cannot recommend this strategy, since people's hardware and backup sizes vary, so there is a big risk of collisions.
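If that theory holds, a guard before the cloud sync starts might reduce the risk. Something like the sketch below; note that I have not verified that `proxmox-backup-manager task list` reports running tasks exactly this way, so treat the grep as an assumption to check on your version:

```shell
# Hypothetical pre-sync guard: abort if PBS still reports active tasks
# (backup, GC, verify). Whether "running" appears in the task list
# output is an assumption to verify on your PBS version.
if proxmox-backup-manager task list | grep -q running; then
    echo "PBS tasks still running, skipping cloud sync" >&2
    exit 1
fi
```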

It would still be nice if a dev could confirm this theory =)
 