25-30 TB, HDD too slow

einsibjani

Member
Feb 5, 2020
I have two servers running PBS. Three clusters back up to the first PBS server, and the other PBS server syncs backups from the first one.
PVE does hourly backups of 10-20 VMs, staggered between clusters so they're not backing up at the same time.

Both servers have 6x10 TB HDDs in ZFS "RAID10" and a zpool special device on mirrored NVMe disks. Prune runs once a day, garbage collection and verify once a week.
After adding the special device, garbage collection takes a manageable 1-2 hours to run, but obviously verify still has to read every bit and takes 24+ hours to run.
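
For reference, the layout is roughly the following (a sketch only; the pool and device names are placeholders, not the actual disks):

    # 3 mirrored pairs striped together ("RAID10") plus a mirrored NVMe special vdev
    zpool create backup \
      mirror /dev/sda /dev/sdb \
      mirror /dev/sdc /dev/sdd \
      mirror /dev/sde /dev/sdf \
      special mirror /dev/nvme0n1 /dev/nvme1n1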

Listing backups in PVE also takes a long time.

I know Proxmox strongly suggests using SSDs for backup storage, but 30 TB of mirrored SSD storage costs $$$.
I've contemplated buying a tape drive, but I'm not sure it would help a lot. Anybody have experience with PBS and tape? Do you use SSD for backups, then long-term storage on tape? How much SSD space would be needed?

Anybody else using all-SSD for this much space? What kind of disks?
 
To back up to tape, you must back up to disk first, then to tape. You can't back up directly to tape from PVE, so it won't help mitigate any of the symptoms you describe, which btw are more or less to be expected.
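
In case it helps, the flow on the PBS side is: backups land in a datastore on disk, and a tape backup job then writes that datastore to a media pool. Roughly like this (datastore and pool names are made up; double-check the exact arguments against the PBS tape docs):

    # Write the contents of datastore 'store1' to media pool 'weekly' (sketch only)
    proxmox-tape backup store1 weekly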

> Listing backups in PVE also takes a long time.

If you are using a fast enough special device this shouldn't happen. Maybe you added the special device after you created the datastore? Maybe even after having some backups already? In that case, the whole directory tree of the datastore is stored on your HDDs, defeating part of its advantages for PBS.
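
A quick way to see whether metadata is actually landing on the special device is to look at the per-vdev allocation (the pool name here is just an example):

    # ALLOC for the 'special' mirror should grow as new metadata is written
    zpool list -v backup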

I would also check the ZFS ARC size (zfs_arc_max); maybe you can give more RAM to it so it can keep more metadata cached.
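
On a ZFS-on-Linux host that would look roughly like this (the 64 GiB figure is only an illustration, size it to your RAM):

    # Current ARC limit in bytes (0 means the built-in default)
    cat /sys/module/zfs/parameters/zfs_arc_max
    # Raise it at runtime to 64 GiB (as root); make it persistent with
    # "options zfs zfs_arc_max=..." in /etc/modprobe.d/zfs.conf
    echo $((64 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max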
 
> To back up to tape, you must back up to disk first, then to tape. You can't back up directly to tape from PVE, so it won't help mitigate any of the symptoms you describe, which btw are more or less to be expected.

I was thinking I could get away with smaller SSD storage for the latest backups, and tape for longer-term storage. But with de-duplication and compression, the SSD space needed for that might not be much less than keeping all incremental snapshots on SSD.

> If you are using a fast enough special device this shouldn't happen. Maybe you added the special device after you created the datastore? Maybe even after having some backups already? In that case, the whole directory tree of the datastore is stored on your HDDs, defeating part of its advantages for PBS.

You're right, I added the special dev to an already existing datastore. I monitored the space used on the special dev, thinking that with enough time all of the metadata would reach the special device eventually. I guess I should create a new datastore to see the real benefits of the special dev.
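
If I go that route, the plan would be something like this (a sketch; the dataset, datastore name and paths are placeholders, and it assumes the pool is mounted at /backup). The point is that rewriting the data is what actually pushes the metadata onto the special vdev:

    # New dataset on the pool that now has the special vdev, and a fresh datastore on it
    zfs create backup/pbs-new
    proxmox-backup-manager datastore create store-new /backup/pbs-new
    # Then repopulate it, e.g. via a sync/pull from the old datastore or the other PBS server,
    # so every chunk and directory entry is written again with the special vdev in place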

> I would also check the ZFS ARC size (zfs_arc_max); maybe you can give more RAM to it so it can keep more metadata cached.

I will.
 
I wonder if there are any experiences with tiering and storing the index parts on a separate (SSD-backed) ZFS pool.

PBS is working really great; however, we have huge lags in the PVE GUI, which reports timeouts, particularly when trying to list the backups (restoring or backing up itself is not a problem).

Also, slowly writing older chunks through to the HDDs would give much lower latency, and we have enough time during the day to do the offloading.
 
> I wonder if there are any experiences with tiering and storing the index parts on a separate (SSD-backed) ZFS pool.
>
> PBS is working really great; however, we have huge lags in the PVE GUI, which reports timeouts, particularly when trying to list the backups (restoring or backing up itself is not a problem).
>
> Also, slowly writing older chunks through to the HDDs would give much lower latency, and we have enough time during the day to do the offloading.

Yes, that would be very interesting. We could do something today with two datastores, one on SSDs and one on HDDs, and a sync job. One downside is that to restore from the tier-2 datastore, you would have to add both datastores to PVE. Not a big problem; we already do something similar where we have two servers: backups go to server1, server2 syncs backups from server1, and both server1 and server2 are added as datastores in PVE so we can choose to restore from server2.
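
The sync between the two servers is just a remote definition plus a pull sync job on server2, roughly like this (host, auth-id, store names and schedule are examples, not our real config):

    # On server2: define server1 as a remote, then pull its datastore on a schedule
    proxmox-backup-manager remote create pbs1 --host 192.0.2.10 \
      --auth-id sync@pbs --password 'xxx' --fingerprint '<server1 cert fingerprint>'
    proxmox-backup-manager sync-job create pull-from-pbs1 \
      --store store2 --remote pbs1 --remote-store store1 --schedule hourly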

But I'm not sure the space savings would be great, since you would still store at least one snapshot for each image in tier-1, and because of de-duplication in PBS, storing more than one snapshot is pretty cheap. So if I'm storing 20 TB of backups now, with 24 hourly and 30 daily snapshots, it's not like I would drop down to 1 TB of storage by only keeping the latest snapshot.
 
> Anybody else using all-SSD for this much space? What kind of disks?

We don't want SSD, we want a faster/optimized verify and optimized access to the datastore first ;)

https://bugzilla.proxmox.com/show_bug.cgi?id=5035

https://bugzilla.proxmox.com/show_bug.cgi?id=3752

Yes, that would solve most of our problems :)

I did re-create our datastores, this time adding the special vdevs when creating the datastores, and first impressions are good. Listing backups is faster than before, and no random sync errors like before. The first verify & garbage collect jobs haven't run yet, so I won't dance the happy dance just yet.
 
