[SOLVED] PBS on PVE with two external HDDs as zfs mirror?

danman

Active Member
Jun 5, 2021
Hi

I would like to change my current backup procedure a bit.
I have two identical external hard drives, 8TB each, connected via USB 3. My data is stored at home, but from time to time I travel for several weeks and want to keep an encrypted backup with me. Both hard drives currently receive the same backup, each through its own separate backup run.
I can't follow the recommended way of installing PBS on separate hardware. Maybe I can buy 2 SSDs for the backup, but for now I would like to try something else first, if it makes sense.

The main problem I am facing at the moment is garbage collection. It takes days, and I can't keep up with my daily backups because the storage has reached its limit.

I am currently running PBS in a VM.
I have also installed PBS on the same server as PVE (https://pbs.proxmox.com/docs/installation.html#install-proxmox-backup-server-on-proxmox-ve). Maybe this will increase the speed a little (?).
Both HDDs are encrypted via LUKS.

Before I go any further, I would like to ask you what are the possibilities to speed up the garbage collector with the current configuration. Is it even possible or should I "just" buy 2 SSDs?

Would a ZFS mirror make sense for both disks via USB 3? And can I easily disconnect and reconnect one hard drive this way? Or should I synchronize both hard disks somehow differently?

Thanks
Dan
 
PBS needs IOPS performance because everything is stored as small chunk files (max 4MB each; here the average is 1.7MB). Let's say you have 8TB of data with an average chunk size of 2MB. That means a GC task needs to read and write the metadata of 4 million files. HDDs are terrible at random IO and might handle about 100 operations per second. Reading and writing the metadata of 4 million files means 8 million IO operations, and at just 100 IOPS that's 80,000 seconds = 22 hours.

The only way to speed up the GC is to store your metadata on storage with good IOPS performance, like an SSD. If you don't have the money to buy two 8TB SSDs (QLC SSDs are bad, and consumer SSDs aren't recommended, so we are talking about something like 2000€), you could try a ZFS pool with 2x 8TB HDDs in a mirror for your data plus 2x 64GB SSDs in a mirror as special devices to store the metadata. That way the GC should finish in a few minutes instead of taking hours, because all the metadata sits on the SSDs with their great IOPS performance while the data stays on the slow HDDs.
This should fix your problem with the GC, but verify tasks would still take an eternity. To fix that, replacing the HDDs with SSDs is the only option.
Also keep in mind that a ZFS pool should always have 20% free space, so only 6.4TB of an 8TB mirror should be used.
And yes, USB will work, but it isn't that reliable.
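
For anyone who wants to plug in their own numbers, the back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only a rough model, not how PBS actually schedules its GC: it assumes one metadata read plus one metadata write per chunk, decimal units (1TB = 1,000,000MB), and the function names are made up for illustration. The 10,000 IOPS figure for an SSD is purely an assumed placeholder, not a measured value.

```python
def gc_metadata_hours(data_tb, avg_chunk_mb=2.0, iops=100.0, ops_per_chunk=2):
    """Rough GC duration in hours, assuming each chunk costs
    `ops_per_chunk` random metadata operations (one read + one write)."""
    chunks = data_tb * 1_000_000 / avg_chunk_mb  # 1 TB = 1,000,000 MB
    total_ops = chunks * ops_per_chunk
    seconds = total_ops / iops
    return seconds / 3600


def zfs_usable_tb(pool_tb, free_fraction=0.20):
    """Capacity that should be used if `free_fraction` of the pool stays free."""
    return pool_tb * (1 - free_fraction)


# HDD case from the post: 8 TB, 2 MB chunks, ~100 IOPS
print(f"HDD GC estimate: {gc_metadata_hours(8):.1f} hours")  # 22.2 hours

# Assumed SSD figure (10,000 IOPS is a placeholder, not a benchmark)
print(f"SSD GC estimate: {gc_metadata_hours(8, iops=10_000) * 60:.1f} minutes")  # 13.3 minutes

# 20% free-space rule on an 8 TB mirror
print(f"Usable on 8 TB mirror: {zfs_usable_tb(8):.1f} TB")  # 6.4 TB
```

The point of the sketch is only that the GC time scales inversely with the IOPS of wherever the metadata lives, which is why moving just the metadata to SSDs helps so much.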
 
Thanks for the very detailed explanation.

At the moment I am running some tests with PBS installed directly on PVE. It seems to be a bit faster, but the backup process is currently not fully activated.

I have an SSD with 1TB lying around. Does each HDD need its own cache drive or would one be enough?

And can I easily disconnect and reconnect one hard drive this way? Or should I synchronize both hard disks somehow differently?
ZFS isn't really an option: the data is already at 6TB and will probably be closer to 6.5TB after the next backups. But would it even be possible to disconnect one drive for the trip and keep the other running?

Or is there another way to sync both instead of running 2 backups on the same day at different times, to minimize reading from the VMs?
 
I have an SSD with 1TB lying around. Does each HDD need its own cache drive or would one be enough?
I was talking about two SSDs as a "special device" in a mirror, which is not caching. If you use only one special device SSD and lose it, all data on all HDDs is lost. That's why you should mirror it. If you only have one SSD and don't want to buy a second one, you could try a single SSD as an L2ARC with the ZFS option "secondarycache=metadata" for the ZFS pool. Such an L2ARC SSD is a read cache and can be lost without a problem. But with a special device your writes would also be faster; with an L2ARC they would not.
ZFS isn't really an option: the data is already at 6TB and will probably be closer to 6.5TB after the next backups. But would it even be possible to disconnect one drive for the trip and keep the other running?
Without ZFS there is no special device or L2ARC for speeding up those HDDs with SSDs.
Or is there another way to sync both instead of running 2 backups on the same day at different times, to minimize reading from the VMs?
You can create two local datastores: datastore1 on HDD1 and datastore2 on HDD2. You then only back up to datastore1 and create a local sync job so that datastore2 pulls all backup snapshots once per day from datastore1. That way you get a copy of datastore1 on datastore2 without PVE backing up everything twice (which is a lot of work, as compression/encryption/hashing is done client-side).
 
Hey

I finally have time to look at this issue again. And again, thanks for your help, I really appreciate it!

At the moment I am running some tests with PBS installed directly on PVE
This has actually been working for a month or so now. I don't have any problems anymore, and it seems fast enough for now, but only for one drive. I only run a backup to the second drive at irregular intervals because I couldn't set up what you suggested:
create a local sync job so that datastore2 pulls all backup snapshots once per day from datastore1
I tried that, but I can't create a sync job. There is nothing under "Source Remote". Did you mean some other way, not through PBS?
 
Yup, like _gabriel said, you need to define your remote first, and that remote should point to the PBS server itself, for example via the 127.0.0.1 loopback address. That way you can do local syncs between two local datastores.

It's really not that intuitive. It would be far more user-friendly if a default "local" remote already existed after installing PBS.
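
If you prefer the command line over the GUI, the two steps (loopback remote first, then the sync job) could look roughly like the sketch below. It only builds and prints the commands, it does not run anything. The helper name, the remote name "local-loopback", the auth ID "sync@pbs" and the "daily" schedule are all hypothetical examples, and the `proxmox-backup-manager` option names are written from memory as an assumption, so please check them against `proxmox-backup-manager remote create --help` and `proxmox-backup-manager sync-job create --help` on your version. The password and the certificate fingerprint for the remote are deliberately left out here.

```python
import shlex


def local_sync_commands(remote_name, auth_id, source_store, target_store,
                        schedule="daily"):
    """Build (but do not run) the two commands for a loopback sync.

    Option names are assumed from the PBS CLI and should be verified;
    password and fingerprint for the remote are intentionally omitted.
    """
    # Step 1: a remote that points back at this PBS host itself
    remote_cmd = [
        "proxmox-backup-manager", "remote", "create", remote_name,
        "--host", "127.0.0.1",
        "--auth-id", auth_id,
    ]
    # Step 2: a sync job where the target datastore pulls from the source
    sync_cmd = [
        "proxmox-backup-manager", "sync-job", "create",
        f"{source_store}-to-{target_store}",
        "--remote", remote_name,
        "--remote-store", source_store,
        "--store", target_store,
        "--schedule", schedule,
    ]
    return remote_cmd, sync_cmd


remote_cmd, sync_cmd = local_sync_commands(
    "local-loopback", "sync@pbs", "datastore1", "datastore2")
print(shlex.join(remote_cmd))
print(shlex.join(sync_cmd))
```

Note the direction: the job is created on the target (datastore2) and pulls from the source (datastore1) through the remote, which is why the GUI insists on a "Source Remote" even for a purely local copy.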
 
Oh yeah, I would never have found this solution! Thank you both, and thank you again for the detailed answers before, @Dunuin.

This seems to be sorted. Sync should work. If not, I'll be here again soon ;)