PVE Backup Speed Optimization

@LnxBil - Well, I would disagree that 100MB/s is fast - with my 10G network I would expect at least double that, maybe better...

But maybe I'm looking at the wrong thing. It's not at all clear why I would see this asymmetry. For one, the images are stored on the same SAN network that the backups are going to, and secondly, I would've expected (maybe wrongly) that the running image was cached in RAM and wouldn't be read from the local host disk or the image store (on NFS) before being backed up (to NFS). These are all enterprise-class (12th-gen Dell T620/720) servers with lots of RAM (192G) and some local storage, but all of the VM data (images and backups) is on FreeNAS for HA functionality.

Also, the migrate function reports MUCH higher throughputs (see above - it's often over 800MB/s), which I would expect to be pulling from and writing to the FreeNAS Images store for both Nodes if that's the way VZDUMP works. And, as I show above, the GUEST VM can copy at over 200MB/s, and that's reading and writing to the FreeNAS shares simultaneously for sure.

So, as a test, I put the backup on a different FreeNAS store so that the reads (images are on FreeNAS #2) and writes (backups are going to FreeNAS #1) go to different NFS shares on the 10G SAN (101) network.

Incredibly, at some points I'm seeing 0 KB/s read speeds, which I assume means it didn't need to read much, because I can't imagine such a low number otherwise, or why it wouldn't just fail. Clearly there's a fair amount about this I don't understand.

FN1 = FreeNAS host 192.168.101.101
FN2 = FreeNAS host 192.168.101.102
Code:
INFO: starting new backup job: vzdump 505 --storage FN1_Backup --node svr-04 --compress 0 --remove 0 --mode snapshot
INFO: Starting Backup of VM 505 (qemu)
INFO: Backup started at 2020-03-10 18:35:50
INFO: status = running
INFO: update VM 505: -lock backup
INFO: VM Name: guacamole
INFO: include disk 'scsi0' 'FN2_IMAGES:505/vm-505-disk-0.qcow2' 100G
INFO: backup mode: snapshot
INFO: bandwidth limit: 9,000,000 KB/s
INFO: ionice priority: 0
INFO: creating archive '/mnt/pve/FN1_Backup/dump/vzdump-qemu-505-2020_03_10-18_35_50.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '50eea5ca-9f0d-438c-97d2-b172d306aafd'
INFO: status: 0% (279248896/107374182400), sparse 0% (106946560), duration 3, read/write 93/57 MB/s
INFO: status: 1% (1114177536/107374182400), sparse 0% (257196032), duration 14, read/write 75/62 MB/s
INFO: status: 2% (2214133760/107374182400), sparse 0% (1038684160), duration 26, read/write 91/26 MB/s
INFO: status: 3% (3261464576/107374182400), sparse 1% (2061148160), duration 38, read/write 87/2 MB/s
INFO: status: 4% (4299751424/107374182400), sparse 2% (3094855680), duration 48, read/write 103/0 MB/s
INFO: status: 5% (5441126400/107374182400), sparse 3% (4225875968), duration 59, read/write 103/0 MB/s
INFO: status: 6% (6483542016/107374182400), sparse 4% (4895895552), duration 70, read/write 94/33 MB/s
INFO: status: 7% (7590445056/107374182400), sparse 4% (5077061632), duration 87, read/write 65/54 MB/s
INFO: status: 8% (8676573184/107374182400), sparse 4% (5221580800), duration 100, read/write 83/72 MB/s
INFO: status: 9% (9677307904/107374182400), sparse 4% (5326868480), duration 114, read/write 71/63 MB/s
INFO: status: 10% (10803150848/107374182400), sparse 5% (5545205760), duration 130, read/write 70/56 MB/s
INFO: status: 11% (11881283584/107374182400), sparse 5% (5668978688), duration 145, read/write 71/63 MB/s
INFO: status: 12% (12910067712/107374182400), sparse 5% (5974421504), duration 159, read/write 73/51 MB/s
INFO: status: 13% (14004125696/107374182400), sparse 6% (6759571456), duration 172, read/write 84/23 MB/s
INFO: status: 14% (15057420288/107374182400), sparse 7% (7810551808), duration 182, read/write 105/0 MB/s
INFO: status: 15% (16203579392/107374182400), sparse 8% (8914354176), duration 193, read/write 104/3 MB/s
INFO: status: 16% (17265065984/107374182400), sparse 9% (9967771648), duration 203, read/write 106/0 MB/s
INFO: status: 17% (18291687424/107374182400), sparse 10% (10939265024), duration 213, read/write 102/5 MB/s
INFO: status: 18% (19344392192/107374182400), sparse 11% (11989024768), duration 223, read/write 105/0 MB/s
 
@LnxBil - Well, I would disagree that 100MB/s is fast - with my 10G network I would expect at least double that, maybe better...

I'm not sure if you got my intended point: if you only read at a max of 100 MB/s, then writing at a max of 100 MB/s is fast. The problem is the reading; why is that so slow?

But maybe I'm looking at the wrong thing. It's not at all clear why I would see this asymmetry.

Asymmetry is normal. vzdump omits empty blocks, so the write stats are always lower than the read stats. The gap only increases if you use compression.
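For a rough idea of how much of the 100G vzdump actually has to write, you can look at the allocation of the qcow2 itself; assuming the standard /mnt/pve/<storage>/images/<vmid>/ layout for your FN2_IMAGES storage, something like:
Code:
# virtual size vs. actually allocated data of the guest disk
qemu-img info /mnt/pve/FN2_IMAGES/images/505/vm-505-disk-0.qcow2
The difference between "virtual size" and "disk size" is roughly what vzdump can skip as sparse/zero blocks.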

For one, the images are stored on the same SAN network that the backups are going to, and secondly, I would've expected (maybe wrongly) that the running image was cached in RAM and wouldn't be read from the local host disk or the image store (on NFS) before being backed up (to NFS).

Caching is OK, but normally you would not cache a whole disk; you simply do not have the RAM for that. Generally, storage is orders of magnitude larger than the amount of RAM you have. Also, only the recently read data is (maybe) in the cache, not the whole disk image, so you still have to read a lot.

I'm unsure, are you reading from one storage and writing to another?

These are all enterprise-class (12th-gen Dell T620/720) servers with lots of RAM (192G) and some local storage, but all of the VM data (images and backups) is on FreeNAS for HA functionality.

So do you run GlusterFS with CARP on two FreeNAS boxes, or how is your HA set up on the storage side?


Also, the migrate function reports MUCH higher throughputs (see above - it's often over 800MB/s), which I would expect to be pulling from and writing to the FreeNAS Images store for both Nodes if that's the way VZDUMP works.

Theoretical network throughput and write throughput are two different things. If you do a random write test with 4K blocks, you will see a significantly lower throughput. Have you already described your SAN environment? What filesystem, what RAID or ZFS layout, how many disks? All of this determines the maximum disk throughput in a 10 GbE setup, not the 10 GbE itself.
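If you want to see that difference yourself, a quick 4K random-write test with fio against the backup share (package fio; the path is just an example) is much closer to worst-case behaviour than a big sequential dd:
Code:
# 4K random writes, direct I/O so the client cache doesn't inflate the numbers
fio --name=randwrite-4k --rw=randwrite --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=16 --filename=/mnt/pve/FN1_Backup/fio-test.img
# remove the test file afterwards
rm /mnt/pve/FN1_Backup/fio-test.img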

Incredibly, at some points I'm seeing 0 KB/s read speeds, which I assume means it didn't need to read much, because I can't imagine such a low number otherwise, or why it wouldn't just fail.

[...]
Code:
INFO: starting new backup job: vzdump 505 --storage FN1_Backup --node svr-04 --compress 0 --remove 0 --mode snapshot
INFO: Starting Backup of VM 505 (qemu)
INFO: Backup started at 2020-03-10 18:35:50
INFO: status = running
INFO: update VM 505: -lock backup
INFO: VM Name: guacamole
INFO: include disk 'scsi0' 'FN2_IMAGES:505/vm-505-disk-0.qcow2' 100G
INFO: backup mode: snapshot
INFO: bandwidth limit: 9,000,000 KB/s
INFO: ionice priority: 0
INFO: creating archive '/mnt/pve/FN1_Backup/dump/vzdump-qemu-505-2020_03_10-18_35_50.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '50eea5ca-9f0d-438c-97d2-b172d306aafd'
INFO: status: 0% (279248896/107374182400), sparse 0% (106946560), duration 3, read/write 93/57 MB/s
INFO: status: 1% (1114177536/107374182400), sparse 0% (257196032), duration 14, read/write 75/62 MB/s
INFO: status: 2% (2214133760/107374182400), sparse 0% (1038684160), duration 26, read/write 91/26 MB/s
INFO: status: 3% (3261464576/107374182400), sparse 1% (2061148160), duration 38, read/write 87/2 MB/s
INFO: status: 4% (4299751424/107374182400), sparse 2% (3094855680), duration 48, read/write 103/0 MB/s
INFO: status: 5% (5441126400/107374182400), sparse 3% (4225875968), duration 59, read/write 103/0 MB/s
INFO: status: 6% (6483542016/107374182400), sparse 4% (4895895552), duration 70, read/write 94/33 MB/s
INFO: status: 7% (7590445056/107374182400), sparse 4% (5077061632), duration 87, read/write 65/54 MB/s
INFO: status: 8% (8676573184/107374182400), sparse 4% (5221580800), duration 100, read/write 83/72 MB/s
INFO: status: 9% (9677307904/107374182400), sparse 4% (5326868480), duration 114, read/write 71/63 MB/s
INFO: status: 10% (10803150848/107374182400), sparse 5% (5545205760), duration 130, read/write 70/56 MB/s
INFO: status: 11% (11881283584/107374182400), sparse 5% (5668978688), duration 145, read/write 71/63 MB/s
INFO: status: 12% (12910067712/107374182400), sparse 5% (5974421504), duration 159, read/write 73/51 MB/s
INFO: status: 13% (14004125696/107374182400), sparse 6% (6759571456), duration 172, read/write 84/23 MB/s
INFO: status: 14% (15057420288/107374182400), sparse 7% (7810551808), duration 182, read/write 105/0 MB/s
INFO: status: 15% (16203579392/107374182400), sparse 8% (8914354176), duration 193, read/write 104/3 MB/s
INFO: status: 16% (17265065984/107374182400), sparse 9% (9967771648), duration 203, read/write 106/0 MB/s
INFO: status: 17% (18291687424/107374182400), sparse 10% (10939265024), duration 213, read/write 102/5 MB/s
INFO: status: 18% (19344392192/107374182400), sparse 11% (11989024768), duration 223, read/write 105/0 MB/s

Where is the 0 MB/s read speed in this output? You have 0 MB/s write speed, and that is normal if there is no data to write (as the sparse % indicator suggests).
 
Yeah, I misspoke when I said read; the 0 MB/s is on the write side. Nonetheless, I find the 0 baffling, as I have no indication that the system cannot do much better than this, so if you can propose a test or troubleshooting step, that would be appreciated. I try dd below with good results.

But to be clear: all storage for the VMs is on 3 independent FreeNAS boxes (192.168.101.101/.102/.104 - FN1 = FreeNAS 1, FN2 = FreeNAS 2, etc.). All cluster communication is on 192.168.100.0/24.

Those boxes are set up with ZFS pools of 4 vdevs of 3 disks each, with NFS shares for images, ISOs, and backup/dump files exported to Proxmox.

In the last backup test I verified that the running VM images are on a different physical box from the backups. Since all reads and writes go to/from these FreeNAS pools on the storage network (192.168.101.0/24), I thought the screenshot of my VM reading and writing to them at 200 MB/s indicated that the storage can do these speeds.

So, again, I'm not sure this (dd shown below) is the proper test, but I think it shows the drives can do nearly 400 MB/s at the OS level. Since these shares are on different servers and are not saturating the 10G network, I would expect to be able to use the full speed of both. But maybe there's a better test to run? Based on this I would expect 3-4x better backup performance than I'm getting.

EDIT: I also ran the transfers the other way, just to be sure, which gave much better results. I probably did something wrong, but here's the data.

Code:
root@svr-04:~# dd if=/dev/zero of=/mnt/pve/FN2_IMAGES/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.69206 s, 399 MB/s
root@svr-04:~# dd if=/dev/zero of=/mnt/pve/FN1_Backup/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.93287 s, 366 MB/s
root@svr-04:~#

root@svr-04:~# dd if=/mnt/pve/FN1_Backup/test1.img of=/dev/zero bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.799644 s, 1.3 GB/s
root@svr-04:~# dd if=/mnt/pve/FN1_Backup/test1.img of=/dev/zero bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.818886 s, 1.3 GB/s
root@svr-04:~#
 
But to be clear: all storage for the VMs is on 3 independent FreeNAS boxes (192.168.101.101/.102/.104 - FN1 = FreeNAS 1, FN2 = FreeNAS 2, etc.). All cluster communication is on 192.168.100.0/24.

Is there some HA between the boxes? If not, you have a single-point-of-failure (SPOF), which is the total opposite of HA.

Those boxes are set up with ZFS pools of 4 vdevs of 3 disks each, with NFS shares for images, ISOs, and backup/dump files exported to Proxmox.

Okay, so sequential throughput should be able to handle it. I assume raidz1 in each vdev? The problem is that you will not get purely sequential reads from ZFS, because the data does not have to lie near each other. There will be sequentially readable data, but overall, with snapshots etc., the data is scattered, so you will not read fast from ZFS due to fragmentation. What is the fragmentation of your pool?
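(As a quick check on the FreeNAS side; "tank" is just a placeholder for your pool name:)
Code:
zpool status tank    # shows the vdev layout (4x raidz1?)
zpool list tank      # the FRAG column is the free-space fragmentation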

In the last backup test I verified that the running VM images are on a different physical box from the backups. Since all reads and writes go to/from these FreeNAS pools on the storage network (192.168.101.0/24), I thought the screenshot of my VM reading and writing to them at 200 MB/s indicated that the storage can do these speeds.

Throughput alone is not what you have to look at. It only applies to sequential data; random data will be terribly, terribly slow in comparison.

So, again, I'm not sure this (dd shown below) is the proper test, but I think it shows the drives can do nearly 400 MB/s at the OS level. Since these shares are on different servers and are not saturating the 10G network, I would expect to be able to use the full speed of both. But maybe there's a better test to run? Based on this I would expect 3-4x better backup performance than I'm getting.

Code:
root@svr-04:~# dd if=/dev/zero of=/mnt/pve/FN2_IMAGES/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.69206 s, 399 MB/s
root@svr-04:~# dd if=/dev/zero of=/mnt/pve/FN1_Backup/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.93287 s, 366 MB/s
root@svr-04:~#

Unfortunately, those writes are not really hitting the disk. ZFS compresses (you have compression enabled, right?) and zeros compress extremely well.
Here is an example:
Code:
root@host ~ > dd if=/dev/zero of=/test/testfile bs=1G count=4 oflag=dsync
4+0 records in
4+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.89601 s, 877 MB/s

root@host ~ > zfs list zpool/test
NAME         USED  AVAIL  REFER  MOUNTPOINT
zpool/test    96K   155G    96K  /test

root@host ~ > ls -lh /test
total 512
-rw-r--r-- 1 root root 4.0G Mar 11 16:23 testfile

That is just one crappy consumer SATA 6 Gbit SSD, which has a theoretical limit of 750 MB/s (practically approx. 500 MB/s), and yet I write faster than the drive. The file shows up as 4 GB, but the dataset itself only takes up 96K of space.

Also, you read and write with a block size of 1G, while vzdump reads 4K, because you are backing up a guest that normally has a 4K block size. Only that way can it detect zeros and omit them.
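If you want a dd test that LZ4 cannot cheat on, use incompressible data and a small block size; a rough sketch (file names are just examples):
Code:
# 1 GiB of incompressible data, generated locally first
dd if=/dev/urandom of=/tmp/rand.img bs=1M count=1024
# write it to the backup share in 4K blocks with direct I/O (no client-side caching)
dd if=/tmp/rand.img of=/mnt/pve/FN1_Backup/rand-test.img bs=4k oflag=direct status=progress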
 
Silly question, but have you used iftop while the backup is running to make sure it's using the correct NIC and you're not going over a 1 Gbps link somewhere?

(Sorry, I did try to read your post, but it's quite a wall of text.)
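For example, something along these lines (the interface name is just a placeholder):
Code:
# watch live traffic per connection while the backup runs
iftop -nN -i ens1f0   # replace ens1f0 with your 10G storage interface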
 
Is there some HA between the boxes? If not, you have a single-point-of-failure (SPOF), which is the total opposite of HA.

Okay, so sequential throughput should be able to handle it. I assume raidz1 in each vdev? The problem is that you will not get purely sequential reads from ZFS, because the data does not have to lie near each other. There will be sequentially readable data, but overall, with snapshots etc., the data is scattered, so you will not read fast from ZFS due to fragmentation. What is the fragmentation of your pool?

Throughput alone is not what you have to look at. It only applies to sequential data; random data will be terribly, terribly slow in comparison.

Unfortunately, those writes are not really hitting the disk. ZFS compresses (you have compression enabled, right?) and zeros compress extremely well.

That is just one crappy consumer SATA 6 Gbit SSD, which has a theoretical limit of 750 MB/s (practically approx. 500 MB/s), and yet I write faster than the drive. The file shows up as 4 GB, but the dataset itself only takes up 96K of space.

HA on the FreeNAS boxes - no, each FreeNAS box can fail, and yes, it is a SPOF; my HA setup is only for the VMs within Proxmox. It is still a homelab. I do rsync the images between boxes for backup, but I don't have enough storage to replicate all of my data between the boxes, and I'm not currently going to try to set up Ceph or something else that I'm not familiar with. But in my current config I could bring the VMs back from a secondary/backup images store and run everything but my Plex with minimal downtime.

Compression, ZFS, etc.: LZ4 compression is enabled on the FreeNAS box. I also showed that the VM can read/write "real" data over the network at ~200 MB/s, and my Mac can write at 500 MB/s to the FreeNAS box. So although this is interesting, it doesn't point to a path for improving things. If you have a better test to recommend, I'm all ears.

The VM read/write is real-world performance copying a large movie file from a share on FreeNAS to the Images NFS share on FreeNAS, and it outperforms the backup process. Here are 2 screenshots of reads from the FreeNAS share; the second is from a VM, so it's reading and writing at 180 MB/s throughput. That seems like it would be at least a good, achievable target for backups.

I'm also about to field a FreeNAS box with only SSD drives for the images, which should give better read/write for the Images share, and I would expect to be able to get closer to the write limit of the Backup share. But would you have a different recommendation to get my backup performance up? I'm generally pretty happy with my PVE performance, but backups are a little painful, especially for my Nextcloud instance.
 

Attachments

  • server_04_read.jpg
  • server_04_read_write.jpg
The VM read/write is real-world performance copying a large movie file from a share on FreeNAS to the Images NFS share on FreeNAS, and it outperforms the backup process. Here are 2 screenshots of reads from the FreeNAS share; the second is from a VM, so it's reading and writing at 180 MB/s throughput. That seems like it would be at least a good, achievable target for backups.

You can't compare with a big file copy, because backup copies in small blocks of 64k max. (So try to copy a lot of 64k files sequentially + sync (without buffering), and check the results.)
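A rough way to mimic that from the PVE node (file names are just examples):
Code:
# incompressible 1 GiB source file on the Images share
dd if=/dev/urandom of=/mnt/pve/FN2_IMAGES/seq-src.img bs=1M count=1024
# copy it share-to-share in 64k blocks, bypassing the client cache on both sides
dd if=/mnt/pve/FN2_IMAGES/seq-src.img of=/mnt/pve/FN1_Backup/seq-64k.img bs=64k iflag=direct oflag=direct status=progress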
 
OK, that's worth looking at. When I do that, I get much slower writes than reads (skewed the same way as the backup, but by a different magnitude - about 2x). The writes were 50-80 MB/s, but the reads were near 200 MB/s. This test was with ~1,000 small eBook files that made up about a 5 GB transfer.

So, is the consensus that 100-125 MB/s is the best one can expect when backing up to SATA spinning drives in a 'typical' SOHO NAS configuration?
 

Attachments

  • read.jpg
  • write.jpg
So, is the consensus that 100-125 MB/s is the best one can expect when backing up to SATA spinning drives in a 'typical' SOHO NAS configuration?

Again: writing a backup is sequential; reading, on the other hand, is sometimes sequential and sometimes random, so the input stream is the problem, not the writing.

What about the test of dd'ing an image from one NAS to the other from the PVE node? That'll give you an example of the "raw" copy performance, and that will give you the baseline.
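Roughly something like this from the node (use the disk of a stopped VM, or any large existing file on the Images share; the path is just an example):
Code:
# raw NAS-to-NAS copy through the PVE node, no VMA/compression overhead involved
dd if=/mnt/pve/FN2_IMAGES/images/505/vm-505-disk-0.qcow2 of=/mnt/pve/FN1_Backup/raw-copy.qcow2 bs=1M iflag=direct oflag=direct status=progress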
 
It's unfortunate there hasn't been more information on this. I just added a 10-gig NIC and would also like to use it for backups to my NFS share on Unraid. I'm only getting about 50-70/50-70 MB/s until the end, then it reads 200+ for the last few percent. I want to check my config database to ensure it's using the right NIC for backups. It should be, since it's on a separate subnet from eno0.
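For what it's worth, a quick way to confirm which interface the node actually uses to reach the backup target (the IP is just an example; use your Unraid/NFS server's address):
Code:
# shows the outgoing interface and source address for that destination
ip route get 192.168.101.101
# and list the NFS mounts with the server addresses they point at
findmnt -t nfs,nfs4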
 
>only getting about 50-70/50-70 MB/s

@lxsharpxl, you may also follow this one https://forum.proxmox.com/threads/may-zstd-rsyncable-be-a-handbrake-for-backup.124587/

I have a 10 GbE network and fast CIFS storage for backup, and I also often observe backup speeds mostly below 100 MB/s, where I would expect more.

Nevertheless, it's not a problem for me yet, because it's in a cluster and the average across all servers is well above 50 MB/s, so the backup window is large enough.
Anyhow, if we have data growth I'd like to be on the safe side. Backup can't be fast enough...
 
Posting for posterity.

Backup speed is determined by the SLOWEST of:

disk read speed -> stream compression -> available network bandwidth (sender) -> available network bandwidth (receiver) -> disk write speed.

Reading through the above, it seems like the only thing folks look at is the synthetic network bandwidth. You need to check ALL of the above.
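A minimal sketch for checking each stage separately (fio and iperf3; paths and the target host are just examples):
Code:
# 1) source read speed, on the PVE node against the image storage
fio --name=src-read --rw=read --bs=64k --size=1G --direct=1 --filename=/mnt/pve/FN2_IMAGES/fio-src.img
# 2) raw network path (run "iperf3 -s" on the storage box first)
iperf3 -c 192.168.101.101
# 3) destination write speed, against the backup storage
fio --name=dst-write --rw=write --bs=64k --size=1G --direct=1 --filename=/mnt/pve/FN1_Backup/fio-dst.img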
 
After looking at this in my installation (40 Gbit network), I'm under the impression that the I/O speed / max IOPS of the boot drive can limit the overall maximum performance. I max out at about 100 MBytes/s for small files and 350 for huge files to back up. The data source to back up can supply several GBytes/s.
Is ALL data first written to the local boot disk before it is written to our attached ZFS disk array? Currently I'm using an old spinning disk as the boot disk. If I replaced the boot disk with an NVMe drive, would this improve overall backup performance? I initially assumed that the boot drive is just used for booting up and would not influence the overall performance... but now I see a lot of IOPS in the graph in the web GUI under the "Administration" tab as soon as a backup is running...

Checking with the fatrace tool to monitor all filesystem activity on the root drive, I observed a ton of high-frequency I/O in this directory: /var/log/proxmox-backup/tasks/

There is a ton of I/O but only a little data - on my setup only a few dozen megabytes.
Maybe this could be put on a RAM disk, as tasks would die on a reboot anyway?
Or does this have to be stored persistently - probably on some small Optane disk?
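To put numbers on that suspicion, watching the boot disk while a backup runs should show whether it is IOPS-bound; a rough sketch (the device name is just an example):
Code:
# extended per-device stats every second (package "sysstat"); watch %util and w/s for the boot disk
iostat -x sda 1
# or trace which files on the root filesystem are being touched, as above
cd / && fatrace -t -c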
 
After looking at this in my installation (40 Gbit network), I'm under the impression that the I/O speed / max IOPS of the boot drive can limit the overall maximum performance. I max out at about 100 MBytes/s for small files and 350 for huge files to back up. The data source to back up can supply several GBytes/s.
Is ALL data first written to the local boot disk before it is written to our attached ZFS disk array? Currently I'm using an old spinning disk as the boot disk. If I replaced the boot disk with an NVMe drive, would this improve overall backup performance? I initially assumed that the boot drive is just used for booting up and would not influence the overall performance... but now I see a lot of IOPS in the graph in the web GUI under the "Administration" tab as soon as a backup is running...
No, normally not. What increase in IOPS do you observe there during backup?
When backing up, the root disk is fully saturated. See the attached screenshot. At the beginning of the graph a backup is still active; as soon as it finishes, the root disk is almost back to idle.
 

Attachments

  • Bildschirmfoto 2023-03-28 um 09.33.50.png
When backing up, the root disk is fully saturated. See the attached screenshot. At the beginning of the graph a backup is still active; as soon as it finishes, the root disk is almost back to idle.

Checking with the fatrace tool to monitor all filesystem activity on the root drive, I observed a ton of high-frequency I/O in this directory: /var/log/proxmox-backup/tasks/

There is a ton of I/O but only a little data - on my setup only a few dozen megabytes.
To me it seems every chunk is first uploaded there and, after temporary storage, then transferred away from the root disk.

Maybe this could be put on a RAM disk, as tasks would die on a reboot anyway?
Or does this have to be stored persistently - probably on some small Optane disk?
 
When backing up, the root disk is fully saturated. See the attached screenshot. At the beginning of the graph a backup is still active; as soon as it finishes, the root disk is almost back to idle.
Can you please cross-check this on your end? Click in the web GUI on the left side on "Administration" and then scroll down to the statistics of the root disk. While a backup is running, the root disk is fully occupied with I/O.
 
