really slow restore from HDD pool

carl0s

Active Member
Jul 19, 2017
I have a RAIDZ on PBS with 4x 16 TB SATA HDDs.

I am watching a KVM restore over a 10G LAN running at 12 MiB/s. The target pool is 5x 1.92 TB enterprise SAS 12G drives in ZFS through an HBA330 controller.

How can it be so bad? I understand it's many small files, but 12 MiB/s? There must be a way to improve this. I read that SSDs are recommended, but who would use high-wear, high-cost SSDs for bulk backup storage?

I initiated the restore from PVE rather than from PBS, if that matters. It is restoring to a different node (non-clustered).
 
That was a restore to an older node (48 x Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (2 Sockets)), target datastore 5x 1.92 TB RAIDZ SAS 12G ZFS via HBA330, no controller RAID.

I am now testing a restore to the original node, a modern PowerEdge R7615 with the fastest single-core performance I could find at the time (32 x AMD EPYC 9174F 16-Core Processor (1 Socket), 4x Intel 3.84 TB NVMe on the latest PERC12 controller (H965i), controller RAID5, no ZFS, LVM-Thin).

It seems that the restore performance is double (half as bad) on the newer server, which makes me think this is not IO limited, and certainly not caused by the spinning HDDs on PBS.

It is still not good, but it is half as bad. Any ideas?

Attached are the identical restore jobs - one to the older node, one to the newer node. Both are PVE 8.2 (8.2.7 and 8.2.4).

Older node (48 x Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (2 Sockets), target datastore 5x 1.92 TB RAIDZ SAS 12G):
- manages ~12 MiB/s on the latter part, the second disk especially
- took 1 hour
- very poor performance restoring the 2nd disk


The newer node is still awfully slow, but only half as bad as the other one. Both are on the same 10G LAN.
I can see with btop on the PVE node that the newer one is chugging along at ~30 MiB/s over the network, while the older one was at 12 MiB/s.
The strange thing is that it's really slow to make progress even when nothing is coming over the network and the PBS box shows 100% idle in top (at around ~36-39% of the restore of the second, larger disk).
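For what it's worth, a rough way to see where it stalls would be to watch per-disk activity on both ends while the restore runs (the pool name below is only a placeholder for whatever the PBS datastore pool is actually called):

Code:
# on the PBS box: per-vdev throughput of the RAIDZ datastore pool, every second
zpool iostat -v backup 1

# on the target PVE node: per-device utilisation and latency (iostat is in the sysstat package)
iostat -x 1

If the HDDs on the PBS side sit near 100% busy during those idle-looking phases, the restore is seek-bound on the datastore; if everything looks idle on both ends, the bottleneck is somewhere else (network, CPU, or the restore process itself).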

Restore to newer node:
- started 23:35, done at 00:04
- about 30 minutes
 

Attachments

  • task-pve-SLOWER-NODE-qmrestore-2024-10-16T21_25_26Z.log (20 KB)
  • task-pve2-SLIGHTLY-LESS-SLOW-NODE-qmrestore-2024-10-16T22_35_03Z.log (20.2 KB)
Even for the newer node, the data to restore lies on the RAIDZ HDD datastore, right? Then my guess is that it behaves as expected, given the number of small files PBS produces. I also recall that in a lot of discussions here, RAIDZ was called out as being slow compared to a ZFS mirror setup.
If you can install some SSDs in your backup host, it might be worth setting up ZFS with a special device for the metadata to speed up these operations.
But this needs serious consideration and planning, since losing the special device means you can't access your data anymore.
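A special vdev can also be added to an existing pool later on. A minimal sketch, assuming the datastore pool is called "backup" (pool and device names are placeholders, and note that only newly written metadata lands on the SSDs; existing chunks stay on the HDDs until they are rewritten):

Code:
# add a mirrored special vdev for metadata (device names are examples only)
zpool add backup special mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2

# optionally also store small blocks (e.g. everything <= 64K) on the SSDs
zfs set special_small_blocks=64K backup

Keep in mind that a special vdev cannot be removed again from a pool that contains RAIDZ vdevs, so this is a one-way decision.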

I recall that in the German subforum @Falk R. and other consultants explained the setup they use for their customers' PBS servers at Hetzner:
- Two relatively small (and thus affordable) SSDs
- Two 10 TB HDDs

They would be configured like this (a rough sketch follows below):
- The HDDs are configured as a ZFS mirror for the datastore
- The SSDs are split into two partitions, which are ZFS mirrors as well: a smaller one for the PBS operating system and a larger one for use as the special device of the datastore
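Roughly, creating such a datastore pool might look like this (all disk and partition names are purely illustrative):

Code:
# mirrored HDDs for the data, mirrored SSD partitions as special vdev for the metadata
zpool create backup \
    mirror /dev/disk/by-id/ata-HDD1 /dev/disk/by-id/ata-HDD2 \
    special mirror /dev/disk/by-id/ata-SSD1-part2 /dev/disk/by-id/ata-SSD2-part2

The smaller SSD partitions would hold the PBS operating system, e.g. as a ZFS mirror set up by the installer.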

A similar setup should also work with your local PBS; the mirrors make sure that one broken disk doesn't lead to complete data loss.

If you understand German (or use Google Translate/DeepL), here is the thread: https://forum.proxmox.com/threads/ist-es-möglich-pbs-mit-aws-s3-blockspeicher-zu-betreiben.152790/

Maybe Falk or one of the other participants can explain it better than I can.

One problem in your case could of course be the existing datastore, which you may have to move somewhere else before rebuilding everything with a more performance-oriented layout.

Hope this helps, best regards, Johannes.
 
That is correct in terms of where the data to restore lies.

Thank you for your pointers and links. I have to go out now but I will do some study over the following week while I am away on holiday.
 
I understand it's many small files, but 12 MiB/s?
If "atime" is active on the datastore then each read of a single chunk would require (at least) one additional write. All in all you may need several head movements, (~ 10 ms each) for each and every chunk. And with RaidZ you only have the IOPS of a single drive. Rotating rust is really slow in this usecase.

The hint regarding the "special device" from @Johannes S is really important and speeds things up drastically.
 
