Proxmox VM Restore Speed

lail8291

New Member
May 23, 2023
Hi

I know this subject has been brought up a couple of times already, but even so I've never managed to find a solution in any of the different discussions.

First, I'll explain the setup.

I have a Ceph cluster consisting of three nodes, plus one PBS node.
  • The Ceph cluster consists of fairly new hardware with fast NVMe SSDs.
  • The PBS node is a single older-generation server with 12x 12 TB SAS disks in a ZFS RAID10 layout (mirrored vdevs), plus special and cache devices.

I've done some extensive performance testing from the Ceph cluster to the PBS node and vice versa. What I noticed were slow restore speeds: while I can reach up to 8 Gbit/s during backups from the Ceph cluster (provided I back up from three nodes at the same time), during restores I reach only around 2.3 Gbit/s at most. Restoring a single VM to the cluster, I only reach around 800 Mbit/s.
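As a sanity check it can help to measure the raw network path on its own, since a restore can only ever be as fast as PBS-to-node throughput. A minimal sketch, assuming `iperf3` is installed on both ends (the hostname is a placeholder):

```shell
# On the PBS node: start an iperf3 server
iperf3 -s

# On a Ceph cluster node: 30 s throughput test towards the PBS
# (pbs.example.lan is a placeholder for the real PBS address)
iperf3 -c pbs.example.lan -t 30

# Reverse mode (-R): the PBS sends to the node, i.e. the restore direction
iperf3 -c pbs.example.lan -t 30 -R
```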

Currently I'm running PVE8 and PBS3, but the versions don't really matter; I've experienced this same behaviour with PVE7 and PBS3 and all the different sub-versions. I'm also aware that SSDs are better and are the recommended disks for PBS, but even with SSDs I get unsatisfactory results.

I hope there are some ways to improve restore speeds to sensible levels.

Thanks for the help and kind regards

Lail
 
Here is some additional information:
## proxmox-backup-client benchmark on a Ceph Cluster Node
Code:
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 644.16 MB/s (52%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1826.81 MB/s (90%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 659.95 MB/s (88%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 914.51 MB/s (76%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 607.37 MB/s (80%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2044.35 MB/s (56%) │
└───────────────────────────────────┴────────────────────┘
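For reference, a table like the one above is produced by the client's built-in benchmark; the repository string below is a placeholder for the real user/host/datastore:

```shell
# Run on a Ceph cluster node; measures TLS upload, hashing, compression
# and crypto throughput against the given PBS datastore.
proxmox-backup-client benchmark --repository root@pam@pbs.example.lan:backup-hdd
```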

## zpool status on the PBS
Code:
  pool: backup-hdd
 state: ONLINE
config:

    NAME                              STATE     READ WRITE CKSUM
    backup-hdd                        ONLINE       0     0     0
      mirror-0                        ONLINE       0     0     0
        scsi-35000cca27816f040        ONLINE       0     0     0
        scsi-35000cca278146218        ONLINE       0     0     0
      mirror-1                        ONLINE       0     0     0
        scsi-35000cca278100c90        ONLINE       0     0     0
        scsi-35000cca278076f8c        ONLINE       0     0     0
      mirror-2                        ONLINE       0     0     0
        scsi-35000cca27815ae74        ONLINE       0     0     0
        scsi-35000cca27816f27c        ONLINE       0     0     0
      mirror-3                        ONLINE       0     0     0
        scsi-35000cca27815ed50        ONLINE       0     0     0
        scsi-35000cca278146760        ONLINE       0     0     0
      mirror-4                        ONLINE       0     0     0
        scsi-35000cca278153d7c        ONLINE       0     0     0
        scsi-35000cca27816ab04        ONLINE       0     0     0
      mirror-5                        ONLINE       0     0     0
        scsi-35000cca27816e634        ONLINE       0     0     0
        scsi-35000cca278054580        ONLINE       0     0     0
    special  
      mirror-6                        ONLINE       0     0     0
        scsi-35000cca0a60045c0-part1  ONLINE       0     0     0
        scsi-35000cca0a6004bd0-part1  ONLINE       0     0     0
      mirror-7                        ONLINE       0     0     0
        scsi-35000cca0a600cbcc-part1  ONLINE       0     0     0
        scsi-35000cca0a600d5cc-part1  ONLINE       0     0     0
    cache
      scsi-35000cca0a60045c0-part3    ONLINE       0     0     0
      scsi-35000cca0a6004bd0-part3    ONLINE       0     0     0
      scsi-35000cca0a600cbcc-part3    ONLINE       0     0     0
      scsi-35000cca0a600d5cc-part3    ONLINE       0     0     0


errors: No known data errors

While the SAS HDDs have aged a bit, they still pack quite a punch performance-wise. I haven't yet checked the I/O wait times of the disks, but I highly doubt it's high, considering I have only 6 backups and only 3 systems simultaneously accessing that zpool.
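Should it become relevant, I/O wait and per-vdev latency during a restore could be sampled with something like the following (a sketch, assuming the `sysstat` package is installed; the pool name matches the zpool status above):

```shell
# Extended per-device stats every 5 s; watch %util and r_await
# on the SAS disks while a restore is running
iostat -x 5

# ZFS's own per-vdev view, with latency columns (-l), for pool backup-hdd
zpool iostat -v -l backup-hdd 5
```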


In total I've used 6 VMs to test backup and restore performance. After each backup I ran a small script to generate 300 G of random data, so that Ceph PG reuse (on the Ceph cluster side) and ZFS cache hits are mitigated when I restore the same backup twice.
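The cache-busting script is essentially a sketch like this (the size and target path are assumptions; I used roughly 300 G, while the default below is deliberately tiny so it can be tried safely):

```shell
#!/bin/sh
# Fill a file with fresh random data inside the test VM so the next
# backup produces new chunks and a repeated restore cannot be served
# from the ZFS ARC or from already-warm Ceph PGs.
SIZE_MB="${1:-8}"                   # e.g. 307200 for ~300 G
TARGET="${2:-/tmp/randomfill.bin}"  # path inside the VM
dd if=/dev/urandom of="$TARGET" bs=1M count="$SIZE_MB" status=none
sync
ls -lh "$TARGET"
```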

## Restore Performance Raid10 12x 12TB SAS HDD

### Restore Performance of 3 VMs

1695190550956.png
(Reads per second of the disks during restore of three VMs on three nodes)

1695190642300.png
(Transfer speed on the interfaces of the three nodes)

1695190667213.png
(Transfer speed on the interface of the PBS)


### Restore Performance of 1 VM
1695190759772.png
(Reads per second of the disks during restore of one VM)

1695190769666.png
(Transfer speed on the interfaces of the three nodes)
1695190775358.png
(Transfer speed on the interface of the PBS)


# Further testing
As mentioned, I've also done some testing with SSDs. I created a RAID10 zpool and a RAIDZ2 zpool with 6 plain SATA SSDs. They do provide better transfer speeds than the HDDs, but that's beside the point: even these SATA SSDs yield disappointing results:

## Restore Performance Raidz2 6x Sata SSD

### Restore performance of 3 VMs
1695190955519.png
(Reads per second of the disks during restore of three VMs on three nodes)
1695190962335.png
(Transfer speed on the interfaces of the three nodes)
1695190966851.png
(Transfer speed on the interface of the PBS)
 

Addendum

### Restore performance 1 VM
1695191058643.png
(Reads per second of the disks during restore of one VM)
1695191063884.png
(Transfer speed on the interfaces of the three nodes)
1695191068386.png
(Transfer speed on the interface of the PBS)

Even approximately 270 MB/s of transfer speed is fairly bad, considering I have bandwidth to spare.

Any suggestions on what can be done here or what I can improve? My main goal is to have a working PBS with my SAS HDDs. I know they aren't ideal, but they still pack a punch; in terms of backup speed they haven't disappointed. While not really central to my issue, here are some backup performance values for my RAID10 with 12 SAS HDDs:

## Backup performance Raid10 12x 12TB SAS HDD
### Backup performance 3 VMs
1695191302182.png
(Reads per second of the disks during backup on the PBS)
1695191306648.png
(Transfer speed on the interfaces of the three nodes)
1695191311030.png
(Transfer speed on the interface of the PBS)
 
> While the SAS HDDs have aged a bit, they still pack quite a punch performance-wise. I haven't yet checked the I/O wait times of the disks, but I highly doubt it's high, considering I have only 6 backups and only 3 systems simultaneously accessing that zpool.
Yes, SAS/enterprise disks have more consistent read times and will fail and report errors faster than consumer disks if a read doesn't complete in time.

Have you compared your restore times with full-clone times of the same VM? It would be interesting to see the comparison.
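For anyone wanting to run that comparison, a rough sketch (the VMIDs, storage names, and snapshot timestamp are placeholders; the timestamp can be read from the PVE storage content view):

```shell
# Full clone of VM 100 within the Ceph pool: all reads and writes stay on Ceph
time qm clone 100 9100 --full --storage ceph-pool

# Restore the same VM from the PBS datastore to a new VMID
time qmrestore pbs-store:backup/vm/100/2023-05-23T10:00:00Z 9101 --storage ceph-pool
```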
 
