S3 endpoint uses local space

Cocolin

Member
Apr 4, 2023
Hi everyone,

Could you help me understand something on my Proxmox backup.

I updated it to 4.0 to connect to an S3 endpoint and I made my first backup on it, but I saw that the space used on my local storage has increased.


I can correctly see the backups on the bucket so I assume that I correctly set up my datastore.

Thanks for your help.
 
When you add an S3 datastore you have to define a cache directory, and that has to be local. That's why the used space increased.
It is recommended to use a different partition or drive for the S3 cache, otherwise your root disk could run out of space.

Code:
the datastore requires nevertheless a local persistent cache, used to increase performance and reduce the number of requests to the backend. For this, a local filesystem path has to be provided during datastore creation, just like for regular datastore setup. However, unlike for regular datastores the size of the local cache can be limited, 64 GiB to 128 GiB are recommended given that cached datastore contents include also data chunks. Best is to use a dedicated disk, partition or ZFS dataset with quota as local cache.

https://pbs.proxmox.com/docs/storage.html
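If you want to sanity-check the cache location before running backups, a small sketch along these lines (the cache path is a made-up example, adjust it to your setup) reports whether a directory is its own mount point and how much space is free, using only the Python standard library:

```python
import os
import shutil

# Hypothetical cache path - replace with your actual cache directory.
CACHE_DIR = "/mnt/s3cache"

def cache_report(path):
    """Report whether `path` is its own mount point and how much space is free."""
    st = os.stat(path)
    parent = os.stat(os.path.join(path, ".."))
    # A different device ID than the parent means the directory is a mount
    # point, so a full cache cannot fill up the root filesystem.
    own_fs = st.st_dev != parent.st_dev
    usage = shutil.disk_usage(path)
    return {"own_filesystem": own_fs, "free_gib": usage.free / 1024**3}

if os.path.isdir(CACHE_DIR):
    print(cache_report(CACHE_DIR))
```

If `own_filesystem` comes back `False`, the cache shares a filesystem with whatever else lives there - often the root disk, which is exactly the situation the recommendation above tries to avoid.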
 
Yes, it will use as much space as possible - that's why it is recommended to use a dedicated partition / disk for the S3 cache.
If your S3 storage is local the cache does not have to be big; if you use a hosted one it should be bigger, because then you won't have to download as many data chunks for verify (egress costs / transaction costs ...) - but that depends on your provider.
 
My S3 storage (Backblaze) is configured on the PBS server I normally back up to. I want to back up a VM that's 800 GB and another that's 1.1 TB - I definitely don't have that total available on the PBS server.
If I did one at a time and deleted the cache afterwards, would that work?
This is due to a one-off situation where I must have an offsite copy of these 2 VMs.
 
As far as I understand it, the cache is not that important for the upload of the backup, but for verifying backups afterwards.
When you start a verify run, PBS downloads *all* chunks referenced by the backup you are verifying from S3 and verifies the checksums of the chunks. That can get expensive because providers charge for egress traffic and API calls - but if a chunk is already in the local cache it is not downloaded -> less cost.
And obviously your internet connection should be fast, otherwise that would take very long.
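To get a feeling for what "downloads all chunks" means in practice, here is a rough back-of-the-envelope sketch; the egress price and link speed in the example are made-up placeholders, not real Backblaze numbers - check your provider's actual pricing:

```python
def verify_download_estimate(backup_gib, egress_usd_per_gib, link_mbit_s):
    """Return (cost in USD, duration in hours) to download `backup_gib` GiB."""
    cost_usd = backup_gib * egress_usd_per_gib
    # GiB -> bits, divided by the link speed in bits/s, converted to hours.
    hours = backup_gib * 8 * 1024**3 / (link_mbit_s * 1_000_000) / 3600
    return cost_usd, hours

# Example: a 1.1 TB backup at a hypothetical $0.01/GiB egress over 1 Gbit/s:
cost, hours = verify_download_estimate(1100, 0.01, 1000)
print(f"~${cost:.2f}, ~{hours:.1f} hours")  # ~$11.00, ~2.6 hours
```

Transaction/API-call fees come on top of that, so the real bill depends on the provider's pricing model.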

Technically you could have a local 128 GB cache partition mounted on your PBS and back up those 2 VMs to S3.
But be aware that S3 is not intended as your single primary backup storage. And it is in tech preview anyway, so I would not recommend relying on the S3 datastore feature (yet).

Alternatively you could set up another PBS somewhere or a VM in the cloud and pull all backups from your primary PBS. But that does not solve your storage space issue.

You may not need that much space for the local PBS store: if your 2 VMs deduplicate well, you might be able to get away with less PBS backup storage - but that's speculation on my side.
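For background on why deduplication can shrink the required space: PBS stores VM disk images as fixed-size, content-addressed chunks (4 MiB, addressed by their SHA-256 digest), so identical chunks are only stored once. A toy illustration of that idea - the chunk size is a parameter here so the example stays tiny:

```python
import hashlib

def unique_chunk_ratio(data: bytes, chunk_size: int) -> float:
    """Fraction of chunks that are unique; lower means better deduplication."""
    seen = set()
    total = 0
    for i in range(0, len(data), chunk_size):
        total += 1
        seen.add(hashlib.sha256(data[i:i + chunk_size]).hexdigest())
    return len(seen) / total if total else 0.0

# A disk image full of repeated blocks deduplicates very well:
print(unique_chunk_ratio(b"\x00" * 64, 4))  # 1 unique chunk out of 16 -> 0.0625
```

How well two real VM disks deduplicate against each other depends entirely on their content, which is why the statement above is speculation.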

You could also explore the PBS removable datastore feature -> back up to a USB hard disk.
Or you could mount a USB hard drive on your PVE host and do a vzdump backup.
But that depends on how often you want / have to back up those 2 VMs ...
 
When you add an S3 datastore you have to define a cache directory, and that has to be local. That's why the used space increased.
It is recommended to use a different partition or drive for the S3 cache, otherwise your root disk could run out of space.

Code:
the datastore requires nevertheless a local persistent cache, used to increase performance and reduce the number of requests to the backend. For this, a local filesystem path has to be provided during datastore creation, just like for regular datastore setup. However, unlike for regular datastores the size of the local cache can be limited, 64 GiB to 128 GiB are recommended given that cached datastore contents include also data chunks. Best is to use a dedicated disk, partition or ZFS dataset with quota as local cache.

https://pbs.proxmox.com/docs/storage.html
Hi,

Thanks for your answer!

So I made a new directory on a new disk to use it as cache, but now when I make a backup it runs indefinitely after finishing the first VM's backup, and I can't access the web UI anymore; I have to reboot it... Did I make a bad configuration?


Thanks again for help
 
Yes, it will use as much space as possible - that's why it is recommended to use a dedicated partition / disk for the S3 cache.
If your S3 storage is local the cache does not have to be big; if you use a hosted one it should be bigger, because then you won't have to download as many data chunks for verify (egress costs / transaction costs ...) - but that depends on your provider.

As far as I understand it, the cache is not that important for the upload of the backup, but for verifying backups afterwards.
When you start a verify run, PBS downloads *all* chunks referenced by the backup you are verifying from S3 and verifies the checksums of the chunks. That can get expensive because providers charge for egress traffic and API calls - but if a chunk is already in the local cache it is not downloaded -> less cost.
And obviously your internet connection should be fast, otherwise that would take very long.

These statements stand to be corrected: actually, verification is currently the one operation which always bypasses the cache and fetches the contents from the backend for verification. All other operations try to use cached metadata and chunks whenever possible, especially avoiding re-uploads and re-downloads of chunk and metadata files. There are ideas on how to make verification for the S3 backend more lightweight, but that is not implemented yet at the time of writing.
 
Hi,

Thanks for your answer!

So I made a new directory on a new disk to use it as cache, but now when I make a backup it runs indefinitely after finishing the first VM's backup, and I can't access the web UI anymore; I have to reboot it... Did I make a bad configuration?


Thanks again for help
Make sure you run at least version 4.0.18-1 of proxmox-backup-server, there were some bugfixes with respect to possible deadlocks.
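For reference, the installed package version can be checked on the PBS host with `dpkg -s proxmox-backup-server`. If you want to compare such version strings in a script, a naive sketch looks like this (real dpkg version ordering has more rules, e.g. for non-numeric components, so treat this as an approximation):

```python
def ver_key(version: str):
    """Turn '4.0.18-1' into a comparable tuple (naive; dpkg's rules are richer)."""
    upstream, _, revision = version.partition("-")
    return tuple(int(p) for p in upstream.split(".")) + (int(revision or 0),)

MINIMUM = "4.0.18-1"  # version with the deadlock fixes mentioned above

def is_recent_enough(installed: str) -> bool:
    return ver_key(installed) >= ver_key(MINIMUM)

print(is_recent_enough("4.0.19-1"))  # True
print(is_recent_enough("4.0.9-2"))   # False
```

Note that this correctly treats `4.0.9` as older than `4.0.18`, which a plain string comparison would get wrong.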