I'm in the middle of re-evaluating my current backup setup, and I'm likely going to put PBS in place to replace rsync, manual backups, and borg/borgmatic so I get nicer support and handling for most things. I've been using Bacula to back up all of that data, plus some bare files that I just don't have the storage to duplicate, sending them direct to tape for cold storage.
I saw that PBS supports sending from a dataset to tapes but have some questions about what I've been reading and want to confirm some assumptions and see if maybe there's something I'm missing.
1. PBS can only write to tape from an existing datastore, not back up straight to tape (rough sketch of what I mean just after this list). For most of my things this'll be fine.
2. PBS doesn't make it easy to recycle old tapes that hold a mix of jobs with different retention policies, i.e. there's no automated way to say: this tape has had more than 50% of its chunks expire, so write a new tape with the unexpired data and get the whole tape back into use.
3. There's no temporary cache/spooling support like I had in Bacula, where I could collect the data and then push it out to the tape drive at full speed. I think this is less of an issue with the way PBS works compared to Bacula, since you sort of get this from the datastores already, but essentially I can't use my 15TB SSD as a spool to write out a whole tape in one 9-hour session at 300+MB/s without stalling.
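To make assumption 1 concrete, this is the flow as I understand it: everything has to land in a datastore first, and pushing it onto a tape media pool is a separate step. A minimal sketch of that step, just shelling out to proxmox-tape — the datastore/pool/drive names are placeholders I made up, and I'm going from memory on the exact invocation, so treat the arguments as assumptions and check them against the docs:

```python
#!/usr/bin/env python3
# Minimal sketch of "datastore -> tape media pool" (assumption 1 above).
# "archive", "cold-pool" and "lto" are placeholder names, and I'm assuming
# the `proxmox-tape backup <store> <pool> --drive <drive>` form is roughly
# right for your PBS version.
import subprocess

DATASTORE = "archive"     # the PBS datastore the data already lives in
MEDIA_POOL = "cold-pool"  # the tape media pool to write it to
DRIVE = "lto"             # the configured tape drive

def tape_backup(store: str, pool: str, drive: str) -> None:
    """Push an existing datastore to tape; as far as I can tell PBS can't
    stream a client backup straight to the drive, so this is always a
    second step after the data is in a datastore."""
    subprocess.run(
        ["proxmox-tape", "backup", store, pool, "--drive", drive],
        check=True,
    )

if __name__ == "__main__":
    tape_backup(DATASTORE, MEDIA_POOL, DRIVE)
```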
Main questions I've got:
1. For the stuff I just don't have the physical live storage to duplicate (basically everything I've ever done digitally for the last 25+ years), is there a way to, say, temporarily stage it in chunks on PBS, write it to tape, and then expire it from the live datastore while still keeping track of what's on the tapes?
2. I see there's the ability to sync data between datastores, but I couldn't figure out if there's an automated way to keep only recent backups on, say, the SSD pool and essentially garbage-collect them as needed once they're on the larger HDD pool — I've put a rough sketch of what I'm picturing right after this list.
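For question 2, this is roughly what I'd expect to have to script myself if there's no built-in way: forget snapshots on the SSD datastore once they're older than N days *and* already synced to the HDD datastore, then let garbage collection on the SSD store reclaim the chunks. The repository strings, the `snapshot list` / `snapshot forget` subcommands, `--output-format json`, and the JSON field names are all assumptions from memory, so verify them before trusting any of this:

```python
#!/usr/bin/env python3
# Sketch of question 2: keep only recent snapshots on the SSD datastore,
# forgetting anything that's old enough and already on the HDD datastore.
# Repository strings, subcommand names, and JSON field names are assumed.
import json
import subprocess
from datetime import datetime, timedelta, timezone

SSD_REPO = "root@pam@pbs.local:ssd-store"   # placeholder repository strings
HDD_REPO = "root@pam@pbs.local:hdd-store"
KEEP_DAYS = 14                              # how long to keep the SSD copy

def list_snapshots(repo: str) -> list[dict]:
    """Return snapshot metadata for a datastore as a list of dicts."""
    out = subprocess.run(
        ["proxmox-backup-client", "snapshot", "list",
         "--repository", repo, "--output-format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

def snap_key(s: dict) -> tuple:
    # Identity of a snapshot: type (vm/ct/host), backup ID, and timestamp.
    return (s["backup-type"], s["backup-id"], s["backup-time"])

def snap_path(s: dict) -> str:
    # Rebuild the "type/id/RFC3339-time" path that forget expects.
    ts = datetime.fromtimestamp(s["backup-time"], tz=timezone.utc)
    return f'{s["backup-type"]}/{s["backup-id"]}/{ts.strftime("%Y-%m-%dT%H:%M:%SZ")}'

def main() -> None:
    on_hdd = {snap_key(s) for s in list_snapshots(HDD_REPO)}
    cutoff = datetime.now(tz=timezone.utc) - timedelta(days=KEEP_DAYS)
    for s in list_snapshots(SSD_REPO):
        old = datetime.fromtimestamp(s["backup-time"], tz=timezone.utc) < cutoff
        if old and snap_key(s) in on_hdd:
            # Drop the SSD copy; GC on that datastore reclaims the chunks later.
            subprocess.run(
                ["proxmox-backup-client", "snapshot", "forget", snap_path(s),
                 "--repository", SSD_REPO],
                check=True,
            )

if __name__ == "__main__":
    main()
```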
I'm completely fine with building some scripts myself to manage this, but I couldn't figure out from the docs how feasible some of it is, or whether I've got any of my assumptions wrong.
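For question 1, the fallback I have in mind if PBS doesn't do it natively is something like the following: back the files up into a smallish staging datastore, run the tape step from the earlier sketch against that datastore, then forget the staging snapshot so the live chunks get garbage-collected, hopefully with the tape media catalog remaining as the record of what's on which tape. Again, the commands, repository string, and snapshot path format are assumptions on my part:

```python
#!/usr/bin/env python3
# Sketch for question 1: stage -> tape -> expire from the staging datastore.
# Assumes a "staging" datastore exists, that the tape backup from the earlier
# sketch has already been run against it, and that the proxmox-backup-client
# "backup" / "snapshot forget" invocations below are roughly correct.
import subprocess

STAGING_REPO = "root@pam@pbs.local:staging"  # placeholder repository string

def stage_to_pbs(archive_name: str, path: str) -> None:
    """Back a directory up into the staging datastore as a pxar archive."""
    subprocess.run(
        ["proxmox-backup-client", "backup", f"{archive_name}.pxar:{path}",
         "--repository", STAGING_REPO],
        check=True,
    )

def forget_snapshot(snapshot: str) -> None:
    """Drop the staging copy once it's on tape; the chunks get reclaimed by
    the next garbage collection, and (I hope) the tape catalog keeps the
    record of where the data went."""
    subprocess.run(
        ["proxmox-backup-client", "snapshot", "forget", snapshot,
         "--repository", STAGING_REPO],
        check=True,
    )

if __name__ == "__main__":
    stage_to_pbs("old-projects", "/mnt/old-projects")
    # ... run the tape backup of the staging datastore here ...
    # forget_snapshot("host/myhost/2024-01-01T00:00:00Z")  # example path
```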
Is anyone with experience able to give any advice about any of that?
Ideally I *would* like to have a separate live storage setup for those other files, but I don't have the physical space or power available, not to mention the cost of the storage itself, to build another machine to hold another 200+TB, which is why I went to cold storage on tape for it in the first place.