Tapespeed

FelixJ

I'm coming from vSphere + Veeam B&R.
PVE is working like a charm, as does PBS.
So far I see two general disadvantages which I want to address in the hope that they can be fixed in the near future:

  1. As I migrated from vSphere to PVE 1:1 and use the dirty-bitmap feature plus the deduplication of PBS, I would have expected a total backup disk size close to what I had with Veeam B&R. There I kept 30 daily snapshots, and it took 2.4TB of disk space. Now it currently takes 3.8TB (and not all VMs have been migrated for 30 days yet, so it will grow a little more as the snapshots add up). Are there any parameters I can tune to reduce the consumed space?
  2. How fast will the tape-writing be? As the tape backup is currently only considered a tech preview, I still use a self-written script which invokes tar to write everything underneath my backup repository path to tape, which takes 30 hours and more, because reading lots of those tiny chunk files from the filesystem takes so much time. Veeam B&R creates a big archive-type container file containing all snapshots from one backup job. I did some read testing to compare the speed of .chunks and my VBK files (prior to each test I cleared the caches with echo 3 > /proc/sys/vm/drop_caches to ensure an accurate measurement; a small sketch of the test follows below the list):
    1. Tarring a 1.1 TB VBK file reaches around 280MB/s (tar status=progress), while
    2. tarring the .chunks folder runs at around 65 MB/s.
So not only do my backups consume a third more space on disk, which is generally not my problem since I have plenty of room ;-), but due to the increased size as well as the slow read speed caused by those small chunks, my backups now take 30 hours (without validating the content with tar -tvf) instead of an average of 14 hours (including validation).
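For reference, the read test can be reproduced roughly like this (paths are only examples, and pv is assumed to be installed; adapt to your setup):
Bash:
# drop the page cache so the test reads from disk, not from RAM
echo 3 > /proc/sys/vm/drop_caches
# stream the chunk store through tar and discard it, measuring throughput with pv
tar -cf - /var/lib/vz/backup/pbs/.chunks | pv > /dev/null
# repeat the same against a single large file (e.g. a Veeam VBK) for comparison
echo 3 > /proc/sys/vm/drop_caches
tar -cf - /path/to/large-file.vbk | pv > /dev/null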
Can those problems be addressed in the near future?
regards,
Felix
 
Hi,

As I migrated from vSphere to PVE 1:1 and use the dirty-bitmap feature plus the deduplication of PBS, I would have expected a total backup disk size close to what I had with Veeam B&R. There I kept 30 daily snapshots, and it took 2.4TB of disk space. Now it currently takes 3.8TB (and not all VMs have been migrated for 30 days yet, so it will grow a little more as the snapshots add up). Are there any parameters I can tune to reduce the consumed space?
There's no "save 30% space" knob; if there was, we'd turn it on by default. The compression level and the actual data could explain some of that difference; that is not exposed as a setting.

How fast will the tape-writing be?

What LTO version do you use?

As the tape backup is currently only considered a tech preview, I still use a self-written script which invokes tar to write everything underneath my backup repository path to tape
That will normally always be at least slightly slower than what the integrated tape solution can do.

Depending on your backup strategy you can optimize this even further with PBS. One can, for example, specify on the PBS media pool that the tape should only be switched every week; then only newer backup indexes and newly referenced chunks actually need to be written out after the first job has gone through. See the docs for some more details:
https://pbs.proxmox.com/docs/tape-backup.html#media-pools
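Roughly along these lines (pool, datastore and drive names are only examples; please verify the exact option names against the linked docs rather than taking them from here):
Bash:
# create a media pool whose allocation policy only starts a new media set once a week
proxmox-tape pool create weekly --drive mydrive --allocation "sat 00:00"
# back up a datastore into that pool; within the week only new indexes/chunks get appended
proxmox-tape backup mydatastore weekly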

Can those problems be addressed in the near future?

What are the problems now, exactly?
We cannot "fix" how tar works (although you could try telling it to iterate over files sorted by inode), from what I can tell you did not actually try the PBS tape backup yet, and we will naturally continue to evaluate performance and space optimizations.
 
Hi Thomas!

Okay, what are the problems:
First, I think that swapping a well-working and fast HDD RAID for SSDs is not a solution but a workaround.
Secondly, I think that to mitigate the seek operations during I/O, one could simply create a file system inside a top-level container of whatever sort (Linux offers a variety of them), which, to make it easier and keep the current software design, would be mounted into the datastore directory for I/O operations. For tape backups the container would then be unmounted, or at least remounted read-only. This way one could tar it away without worrying about seek times caused by the underlying hardware.
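Just to roughly illustrate the idea (paths are only examples; this is of course not an existing PBS feature, purely a sketch):
Bash:
# create a large sparse image file and put a filesystem into it
truncate -s 4T /var/lib/vz/backup/pbs-container.img
mkfs.ext4 -F /var/lib/vz/backup/pbs-container.img
# mount it on the datastore directory for normal PBS I/O
mount -o loop /var/lib/vz/backup/pbs-container.img /var/lib/vz/backup/pbs
# for the tape run: unmount (or remount read-only) and stream the single big image sequentially
umount /var/lib/vz/backup/pbs
tar -S -cf /dev/st0 /var/lib/vz/backup/pbs-container.img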

Regarding the consumed disk space, my question is simple: if I have 1TB of production VM disks in vSphere, which after migration is also 1TB of production data in PVE, why does the 30-day snapshot backup require 2.4 TB in one case and 3.8TB in the other? This makes little sense to me, unless the dirty-bitmap function, the deduplication or the compression algorithms are not (yet) working as efficiently as hoped.

Please note, this is not criticism; on the contrary, I try to improve the product by raising questions and bringing in my experiences and ideas.

I hope you understand me well,
Felix
 
First, I think that swapping a well-working and fast HDD RAID for SSDs is not a solution but a workaround.

That's more something for new setups. See here for some price calculations and the rationale why an SSD-only setup makes sense nowadays, especially in the enterprise area and at least for up to 30 - 40 TiB of storage needs (more works too, but may need other considerations; IMO 100+ TiB SSDs are closer than 100+ TiB HDDs):
https://forum.proxmox.com/threads/garbage-collector-too-slow.86726/#post-381027

SSDs are not a workaround, they provide a fundamentally better experience due to random IOPS having a constant cost (mostly independent of access order), lower power usage and higher reliability (no moving parts is always better).

Secondly, I think that to mitigate the seek operations during I/O, one could simply create a file system inside a top-level container of whatever sort (Linux offers a variety of them), which, to make it easier and keep the current software design, would be mounted into the datastore directory for I/O operations. For tape backups the container would then be unmounted, or at least remounted read-only. This way one could tar it away without worrying about seek times caused by the underlying hardware.
The .chunks store already is on a file system, so I don't understand how that design could magically do away with the underlying physical limitations of spinners, which with 10 - 20 ms seek time will never really improve.
It sounds rather like that would add another layer of indirection which is further away from the actual data layout on disk, so more random IO is done, which is what "breaks" speed on spinners. If one wanted to do away with layers, one could create a special CAS (content-addressable storage) FS directly in the kernel layer, but in the end even there you'll have those problems (and possibly many more; kernel/FS programming isn't exactly a piece of cake, and even experienced FS shops got it wrong a lot, so...)

Regarding the consumed disk space, my question is simple: if I have 1TB of production VM disks in vSphere, which after migration is also 1TB of production data in PVE, why does the 30-day snapshot backup require 2.4 TB in one case and 3.8TB in the other? This makes little sense to me, unless the dirty-bitmap function, the deduplication or the compression algorithms are not (yet) working as efficiently as hoped.

I hope for more efficiency too, and as said we're continuously evaluating that point.
Without checking the actual setup, both VM- and Veeam-wise, it's hard to determine a reason for this specific issue, so I cannot give you the specific answer you hope for here, I'm afraid.
 
Regarding the consumed disk space, my question is simple: if I have 1TB of production VM disks in vSphere, which after migration is also 1TB of production data in PVE, why does the 30-day snapshot backup require 2.4 TB in one case and 3.8TB in the other? This makes little sense to me, unless the dirty-bitmap function, the deduplication or the compression algorithms are not (yet) working as efficiently as hoped.
Are you testing with the most recent version?
 
That's more something for new setups. See here for some price calculations and the rationale why an SSD-only setup makes sense nowadays, especially in the enterprise area and at least for up to 30 - 40 TiB of storage needs (more works too, but may need other considerations; IMO 100+ TiB SSDs are closer than 100+ TiB HDDs):
https://forum.proxmox.com/threads/garbage-collector-too-slow.86726/#post-381027

SSDs are not a workaround, they provide a fundamentally better experience due to random IOPS having a constant cost (mostly independent of access order), lower power usage and higher reliability (no moving parts is always better).
Ok, I take that as it is - I'm used to working with what I have, and that's an HDD RAID that reads with dd at up to 300MB/s, which is, from my point of view, good enough at this point.
Though I can think about SSDs.
The .chunks store already is on a file system, so I don't understand how that design could magically do away with the underlying physical limitations of spinners, which with 10 - 20 ms seek time will never really improve.
It sounds rather like that would add another layer of indirection which is further away from the actual data layout on disk, so more random IO is done, which is what "breaks" speed on spinners. If one wanted to do away with layers, one could create a special CAS (content-addressable storage) FS directly in the kernel layer, but in the end even there you'll have those problems (and possibly many more; kernel/FS programming isn't exactly a piece of cake, and even experienced FS shops got it wrong a lot, so...)
My thinking is like this: if I can write a big (1TB) file with 280MB/s to the tape from the same filesystem that hosts the .chunks folder (this might be a piece of information I had not given you yet), then, if the .chunks folder were in "reality" a mounted filesystem container, I could simply take that container and write it with 280MB/s to the tape.
There might be a slight disadvantage while the actual VM backup runs, I give you credit for that, as it could have a negative impact on performance, but that also depends on what the filesystem container would be. For example, maybe a dedicated LVM volume would be sufficient; then one could simply dd the LV to the tape. Of course, the recovery process is less "nice" than having real files on the tape, but on the other hand, a tape is by design, due to its streaming nature, a slow medium, and its primary purpose is cold backups.
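Roughly what I have in mind, as a sketch (volume group, volume and device names are only examples):
Bash:
# take a read-only snapshot of the (example) datastore LV so its content cannot change mid-write
lvcreate -s -n pbs-snap -L 100G /dev/vg0/pbs-datastore
# stream the snapshot block device sequentially to the tape drive
dd if=/dev/vg0/pbs-snap of=/dev/st0 bs=1M status=progress
# drop the snapshot again afterwards
lvremove -y /dev/vg0/pbs-snap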


I hope for more efficiency too, and as said we're continuously evaluating that point.
Without checking the actual setup, both VM- and Veeam-wise, it's hard to determine a reason for this specific issue, so I cannot give you the specific answer you hope for here, I'm afraid.
However, I think there is still some room for improvement!
Maybe you can consider my thoughts in one of your development meetings...
 
if the .chunks folder were in "reality" a mounted filesystem container, I could simply take that container and write it with 280MB/s to the tape.
Our tape backup is fully deduplicated, so we normally only back up a few chunks (not the whole chunk store) to a tape. Also, we want to be able to select single VMs for backup...
However, I think there is still some room for improvement!
Maybe you can consider my thoughts in one of your development meetings...
What chunk size do you use for the Veeam backup? Can you please try using 4MB and compare the space usage?
 
Where do you want me to change the chunk size? In Veeam or in PBS?
I have no idea where to change it in either product.
regards,
Felix
 
Hi Thomas,

There's no "save 30% space" knob, if there was we'd turn it on by default. Compression and level of the actual data could explain some of that difference, that is not exposed.

I have a 1-Bit-Compression Algorithm in place here which compresses ANY kind of data into ONE Bit.

Works like a charm. The only tiny little thing is: I haven't figured out a way to actually restore the data yet :-)

SCNR

Tobias
 
Hi Thomas,



I have a 1-Bit-Compression Algorithm in place here which compresses ANY kind of data into ONE Bit.

Works like a charm. The only tiny little thing is: I haven't figured out a way to actually restore the data yet :)

SCNR

Tobias
This is rather cynical, isn't it? And it will not help to address the actual problem nor improve the product, but well... sometimes cynicism is the only way to cope with things you cannot change anyway!
 
I think it's a joke, and surely not meant in any harmful way - at least I took it that way. :)

Ok, I take that as it is - I'm used to working with what I have, and that's an HDD RAID that reads with dd at up to 300MB/s, which is, from my point of view, good enough at this point.
Though I can think about SSDs.

Sure, I can totally understand that using existing HW is cheaper and also less wasteful than just throwing it out, so this was mostly meant for any new setups. And yeah, bandwidth-wise those spinners aren't that bad, but random IO, which is what seek time limits so much, is just way worse than on any flash storage. That's why I'd prefer QLC-based SSDs over 10k spinners even if the latter had slightly better bandwidth, as long-running storage with higher fragmentation, or things like content-addressable storage, makes sequential IO almost impossible after a while, so those systems profit a lot from fast seek times.

My thinking is like this: if I can write a big (1TB) file with 280MB/s to the tape from the same filesystem that hosts the .chunks folder (this might be a piece of information I had not given you yet), then, if the .chunks folder were in "reality" a mounted filesystem container, I could simply take that container and write it with 280MB/s to the tape.
There might be a slight disadvantage while the actual VM backup runs, I give you credit for that, as it could have a negative impact on performance, but that also depends on what the filesystem container would be. For example, maybe a dedicated LVM volume would be sufficient; then one could simply dd the LV to the tape. Of course, the recovery process is less "nice" than having real files on the tape, but on the other hand, a tape is by design, due to its streaming nature, a slow medium, and its primary purpose is cold backups.

Hmm, OK, then I got you slightly wrong - sorry for that. But the issue with your approach is that it basically undoes the deduplication efforts, or at least makes it necessary to read everything and diff between those images. Things like the aforementioned smarter media-set tape allocation scheduling cannot be done any more. So it's really a trade-off, and we would like to keep the deduplication semantics for the tape too; it makes the design more consistent and makes features possible which otherwise would not be doable (at least not without doing more IO).

However, I think there is still some room for improvement!
Maybe you can consider my thoughts in one of your development meetings...
Sure we do; we actively discuss those things. Not everything reaches the forum as-is, but we surely don't just brush off any input we get. We have one idea which is independent of the layout and which I already suggested to you for the tar command: reading files sorted by inode. That gives better performance, as inode order and the physical data location on spinning disks correlate.
 
I think it's a joke, and surely not meant in any harmful way - at least I took it that way. :)
It's all good! I understood! I've been working long enough in this business!
Sure, I can totally understand that using existing HW is cheaper and also less wasteful than just throwing it out, so this was mostly meant for any new setups. And yeah, bandwidth-wise those spinners aren't that bad, but random IO, which is what seek time limits so much, is just way worse than on any flash storage. That's why I'd prefer QLC-based SSDs over 10k spinners even if the latter had slightly better bandwidth, as long-running storage with higher fragmentation, or things like content-addressable storage, makes sequential IO almost impossible after a while, so those systems profit a lot from fast seek times.
I have some super-duper Samsung enterprise-class 4TB SSDs around as spares for my Ceph, which I want to "abuse" to test the difference, but that will take some time! When I migrated from vSphere + Veeam I thought about a lot of issues that might happen, but I never thought that the tape runtime could be more than 24 hrs... which, for right now anyway, breaks the neck of my backup concept.
So I am also busy finalizing the migration!

Then I saw that the built-in tape backup might be possible anyway in the near future - unfortunately, as you can read in my other post, my tape drive is not properly recognized - a patch is pending and I am waiting for it to land in the stable branch, so I can use the deduplicated chunk store and hopefully improve the backup time!
For now I cannot back up the chunk store; it's nearly 4TB big and will keep growing for 14 more days, until all VMs have 31 restore points on disk. The further growth through user data creation is minimal.

Another question related to that: will it be possible with PBS tape backup to write content outside of the chunk store to the tape - like other local or remotely mounted filesystems?

Hmm, OK, then I got you slightly wrong - sorry for that. But the issue with your approach is that it basically undoes the deduplication efforts, or at least makes it necessary to read everything and diff between those images. Things like the aforementioned smarter media-set tape allocation scheduling cannot be done any more. So it's really a trade-off, and we would like to keep the deduplication semantics for the tape too; it makes the design more consistent and makes features possible which otherwise would not be doable (at least not without doing more IO).
Yeah, it would break the PBS concept totally; however, it doesn't have to be either one or the other, it could be both. The top-level container would simply ensure a fast read on "spinners" outside of PBS - for example, transferring the whole chunk store to another set of disks would be more efficient.
Sure we do; we actively discuss those things. Not everything reaches the forum as-is, but we surely don't just brush off any input we get. We have one idea which is independent of the layout and which I already suggested to you for the tar command: reading files sorted by inode. That gives better performance, as inode order and the physical data location on spinning disks correlate.
I tried that. It gives slightly better performance. Without sorting I got a read performance of about 65 MB/s.
With either of these methods:
Bash:
cd /var/lib/vz/backup/pbs/.chunks/ && ls -U -i . | sort -n | awk '{print $2}' > ~/clist; tar -cf /dev/st0 -T ~/clist
or
Bash:
tar --sort=inode -cf /dev/st0  /var/lib/vz/backup/pbs/.chunks/
I get around 80 MB/s, which is definitely an improvement but far from what the tape drive could write.
 
