Slow restore speeds - Incorrect Configuration?

May 13, 2025
I am new to Proxmox and trying to configure a test setup on some spare servers we have, to get familiar before we order 2 machines specifically for it.

We have 2 Dell R740xd servers with
  • Xeon Gold 6240 CPU
  • 192 GB RAM
  • 2 x 1 GB SAS SSD in RAID 1 for the OS
  • 10 x Dell SAS 4 TB drives in RAID 5
  • PERC H740P RAID controller (RAID mode)

I have one machine configured as the VE and this is formatted as EXT4.

The other is configured as the Backup Server and it is also formatted as EXT4.

We have a 10 GbE connection between the 2 servers, and using iperf I can confirm this is working and transmitting at 10 Gbps.

When we back up a 450 GB VM it takes around 45 minutes; the data transfer is nowhere near the speeds we would expect from the network, and the read and write speeds average around 150-200 MB/s.

Then when trying to restore it takes around 1.5 hrs. While the restore was processing I loaded atop and saw that pbs-restore is nearly maxing out a CPU core (I'm assuming it only uses a single core) and the LVM is at 100%.

So I guess my questions are:

What could be causing the bottleneck in this setup?
  • The SAS spinning disks?
  • The way the partitions are formatted, i.e. EXT4 vs ZFS?
  • Is having the virtual drives configured in RAID by the RAID controller an issue?
Is there a way of benchmarking the process to see what is causing the issue?

Any help or suggestions would be appreciated
 

Attachments

My guess is that your biggest bottleneck is your spinning disks, and their RAID setup doesn't help either. The deduplication in PBS splits the data into a lot of small files (typical chunk sizes are around 2-4 MB) which are then referenced in the snapshots. So if a new backup is done and most of its data is already present as chunks, the backup job doesn't need to re-upload and store that data but just needs to add a new reference, which is obviously a lot faster than a restore, where you actually need to read all the chunks belonging to the snapshot you want to restore.

It's also the reason why PBS has its fantastic space efficiency (the deduplication ratio in your datastore information and the output of a garbage collection job log give a good idea; the GC log also contains the number of chunks currently saved in your datastore).

In other words: PBS really needs high IOPS (the ability to do a lot of parallel reads and writes) on the storage, which is the reason why the manual recommends fast local storage (aka enterprise SSDs): https://pbs.proxmox.com/docs/installation.html#recommended-server-system-requirements
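To make the chunk/deduplication idea concrete, here is a minimal toy sketch in Python (my own illustration, not PBS's actual implementation; the 4 MiB chunk size and the class/function names are assumptions). A second backup of mostly unchanged data only adds references, while a restore still has to read back every chunk:

```python
import hashlib
import os


class ChunkStore:
    """Toy content-addressed chunk store: each chunk is saved once, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes (on a real datastore: one small file per chunk)

    def backup(self, data, chunk_size=4 * 1024 * 1024):
        """Split data into fixed-size chunks; only chunks not seen before are stored."""
        index, new_chunks = [], 0
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # already known -> just reference it again
                self.chunks[digest] = chunk
                new_chunks += 1
            index.append(digest)
        return index, new_chunks            # the "snapshot" is just a list of chunk digests

    def restore(self, index):
        """A restore has to fetch *every* referenced chunk, however well deduplicated."""
        return b"".join(self.chunks[d] for d in index)


store = ChunkStore()
vm_disk = os.urandom(32 * 1024 * 1024)      # stand-in for a VM disk image

_, new1 = store.backup(vm_disk)
index2, new2 = store.backup(vm_disk)        # second backup of unchanged data
print(f"first backup stored {new1} new chunks, second stored {new2}")  # second stores 0
assert store.restore(index2) == vm_disk     # but the restore still reads every chunk
```

On a real datastore those chunks end up as a large number of small files scattered over the disk, which is why a restore is bound by IOPS rather than by sequential throughput or network bandwidth.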

Do you actually need to use the RAID mode of your HW RAID? A ZFS based setup would benefit you in multiple ways:

  • You could set up a pool with striped mirrors (aka RAID10) instead of your RAID5 setup. RAID10 should be way faster for the restore since the reads are distributed across the mirrors which together build the striped pool.
  • You can add some smaller SSDs (rule of thumb: around 2% of the capacity of your HDD-based pool) as a special device. Then every new file will be split: the raw data will still be stored on the HDDs, but the metadata will be stored on the SSDs. So anything which needs to read the metadata (file access times, creation dates, permissions etc.) can use the SSDs with their higher IOPS. Garbage collection jobs in particular profit a lot from them; restore and verify not so much (since they need to actually read the bulk data on the HDDs), but the effect is still there. One caveat though: if your special device fails, all data in the pool is lost. For that reason the special device's redundancy should reflect the redundancy of your HDD pool (so if you have two mirrored HDDs, you will need two SSDs; for a three-disk setup, three SSDs; for a RAID10-like pool, four, etc.). See the manual (https://pbs.proxmox.com/docs/sysadmin.html#local-zfs-special-device) and wiki (https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_special_device) for more information.
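As a back-of-the-envelope illustration of that ~2% rule of thumb (the ratio is just the rule of thumb from above, and the layout assumes your 10 x 4 TB drives rearranged as striped mirrors):

```python
# Rough special-device sizing, assuming the ~2% rule of thumb mentioned above.
hdd_count, hdd_size_tb = 10, 4                 # the OP's 10 x 4 TB SAS drives
usable_tb = (hdd_count // 2) * hdd_size_tb     # striped mirrors (RAID10-like): half the raw capacity
special_tb = 0.02 * usable_tb                  # ~2% of the pool capacity for metadata
print(f"usable pool ~{usable_tb} TB -> special device ~{special_tb * 1000:.0f} GB (use mirrored SSDs)")
# -> usable pool ~20 TB -> special device ~400 GB (use mirrored SSDs)
```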
Maybe you can do a proof of concept with some disks not connected via the RAID controller? ZFS and HW RAID don't play nicely together, so you can't use ZFS on the disks attached to your RAID controller unless you change its mode to "IT mode" (or whatever disabling RAID is called on it).

Regarding benchmarking, I remember threads where people used fio for benchmarking their storage (like here: https://forum.proxmox.com/threads/ssd-iops-benchmarking.123646/ ).
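If you just want a quick feel for how much random I/O hurts before setting up proper fio jobs, here is a very crude random-read probe in Python (my own illustration, not fio and not a PBS tool; the file name and sizes are arbitrary). fio with a randread job against the datastore disk is still the right tool for real numbers:

```python
import os
import random
import time

# Crude random-read probe (illustration only; use fio for real benchmarking).
PATH = "testfile.bin"            # put this on the datastore you want to probe
FILE_SIZE = 1 * 1024**3          # 1 GiB test file
BLOCK = 4096                     # 4 KiB reads, similar in spirit to scattered chunk access
READS = 5000

if not os.path.exists(PATH):
    with open(PATH, "wb") as f:  # write the test file in 16 MiB pieces
        for _ in range(FILE_SIZE // (16 * 1024**2)):
            f.write(os.urandom(16 * 1024**2))

fd = os.open(PATH, os.O_RDONLY)
start = time.perf_counter()
for _ in range(READS):
    offset = random.randrange(0, FILE_SIZE // BLOCK) * BLOCK   # block-aligned offset
    os.pread(fd, BLOCK, offset)
os.close(fd)
elapsed = time.perf_counter() - start

# Note: the Linux page cache will inflate this; fio can bypass it with --direct=1.
print(f"~{READS / elapsed:.0f} random {BLOCK // 1024} KiB reads per second")
```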
Not directly related to your use case (since it's mostly about different ways to connect network storage to PBS), but maybe still of interest: somebody also wrote a datastore performance tester for PBS which tries to simulate the small-file chunk mechanism of PBS, although it's mostly a comparison of different filesystems and ways to attach network storage.

Please note that there was quite a lively debate between its developer and the Proxmox developers on the validity of his assumptions, in https://forum.proxmox.com/threads/developer-question-about-chunk-folder.148167

But even then they agreed with his main result, that network storage isn't good for performance (as said, not much help for your use case, I fear).
 
  • Like
Reactions: dale1606 and UdoB
Thanks for that,

Since my last post I have:

  • Swapped the RAID controller for an HBA in IT mode (HBA330)
  • Changed to a ZFS pool with striped mirrors (see below); seeing your comment reinforces that it was the better choice for the storage.

[Attached screenshot 1747380232829.png: ZFS pool layout]

The 2 VMs I am backing up to test are a Windows 10 VM (250 GB) and a Red Hat Linux VM (452 GB) that I imported from an ESXi host.

I did the initial backups on both, with the following results:

Windows VM - Initial backup
INFO: backup was done incrementally, reused 168.31 GiB (67%)
INFO: transferred 250.00 GiB in 663 seconds (386.1 MiB/s)

Redhat Linux VM - Initial Backup
INFO: backup was done incrementally, reused 77.01 GiB (17%)
INFO: transferred 452.00 GiB in 2251 seconds (205.6 MiB/s)

I then did a second backup with no major changes in the VMs, hoping that it wouldn't take long as there are no changes to write.

Windows VM
INFO: backup was done incrementally, reused 250.00 GiB (100%)
INFO: transferred 250.00 GiB in 280 seconds (914.3 MiB/s)

Redhat Linux VM
INFO: backup was done incrementally, reused 452.00 GiB (100%)
INFO: transferred 452.00 GiB in 1147 seconds (403.5 MiB/s)

Is this normal performance for spinning disks, and am I better off looking towards enterprise SSDs?
 
  • Like
Reactions: Johannes S
Sorry, I was restoring the 2 VMs I have as I was writing the previous post.

See attached restore logs

VM 102 is the 452 GB Red Hat Linux VM
VM 103 is the Windows 10 VM
 

Attachments

  • Like
Reactions: _gabriel
PBS is, by design, really fast at doing regular/multiple/daily backups, because data that already exists on the PBS datastore disk isn't backed up twice. The data physically exists as a single copy in the PBS datastore. Backup data is physically stored as many tiny chunk files scattered across the PBS datastore disk, so a restore can't write "sequentially" from them, which makes it slower, even more so from spinning disks.
EDIT: There is "Live restore", which allows you to boot and use the VM during the restoration, but the VM runs slower during the process.

IMO, your numbers are expected for a spinning-disk PBS datastore. For your 450 GB VM:
1st restore is 3330 seconds (~56 minutes)
2nd restore is 6066 seconds (~1h40)
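For what it's worth, the average throughput implied by those times (simple arithmetic on the figures quoted above, taking the VM as the 452 GiB Red Hat guest from the earlier post):

```python
# Average restore throughput implied by the quoted restore times.
size_mib = 452 * 1024                                   # the ~450 GB (452 GiB) VM
for label, seconds in [("1st restore", 3330), ("2nd restore", 6066)]:
    print(f"{label}: {size_mib / seconds:.0f} MiB/s on average")
# -> roughly 139 MiB/s and 76 MiB/s, i.e. far below what the 10 GbE link could deliver
```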
 
  • Like
Reactions: Johannes S