Miserable backup speeds

Republicus

Well-Known Member
Aug 7, 2017
137
20
58
40
I am getting miserable speeds while doing backups.

My backup-storage is NFS over RDMA.
Speeds while writing to the NFS shared storage is much much better than VZDUMP.

I tested writing directly to backup-storage, twice system memory:

Code:
root@node02:/mnt/pve/backup-storage# time dd if=/dev/zero of=testfile bs=4096k count=49k status=progress
209769725952 bytes (210 GB, 195 GiB) copied, 199 s, 1.1 GB/s
50176+0 records in
50176+0 records out
210453397504 bytes (210 GB, 196 GiB) copied, 205.873 s, 1.0 GB/s

real    3m36.317s
user    0m0.168s
sys     3m29.176s

These results are consistent.

I tried increasing (adding) bwlimit: 409600 to /etc/vzdump.conf with no difference.
I have also tried different compression algorithms between ZSTD and LZ4.

Still, my backups are running very slow.

Code:
INFO: starting new backup job: vzdump 70001 --compress zstd --storage backup-storage --remove 0 --mode snapshot --node node02
INFO: Starting Backup of VM 70001 (qemu)
INFO: Backup started at 2020-05-24 14:41:57
INFO: status = stopped
INFO: backup mode: stop
INFO: bandwidth limit: 409600 KB/s
INFO: ionice priority: 7
INFO: VM Name: opnsense
INFO: include disk 'virtio0' 'nfs-storage:70001/vm-70001-disk-0.qcow2' 8G
INFO: creating archive '/mnt/pve/backup-storage/dump/vzdump-qemu-70001-2020_05_24-14_41_57.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task 'b0f1e2bc-1f1b-4031-84f2-9d821bff634d'
INFO: status: 3% (260702208/8589934592), sparse 2% (240578560), duration 3, read/write 86/6 MB/s
INFO: status: 4% (421920768/8589934592), sparse 2% (241004544), duration 6, read/write 53/53 MB/s
INFO: status: 6% (571342848/8589934592), sparse 2% (241512448), duration 9, read/write 49/49 MB/s
INFO: status: 9% (809566208/8589934592), sparse 5% (431808512), duration 12, read/write 79/15 MB/s
INFO: status: 11% (1016856576/8589934592), sparse 6% (524763136), duration 15, read/write 69/38 MB/s
INFO: status: 13% (1197342720/8589934592), sparse 6% (527282176), duration 18, read/write 60/59 MB/s
INFO: status: 16% (1394933760/8589934592), sparse 7% (622862336), duration 21, read/write 65/34 MB/s
INFO: status: 17% (1528299520/8589934592), sparse 7% (623972352), duration 24, read/write 44/44 MB/s
INFO: status: 19% (1680670720/8589934592), sparse 7% (625827840), duration 27, read/write 50/50 MB/s
INFO: status: 21% (1844445184/8589934592), sparse 7% (626061312), duration 30, read/write 54/54 MB/s
INFO: status: 23% (2052587520/8589934592), sparse 8% (720400384), duration 33, read/write 69/37 MB/s
INFO: status: 25% (2214461440/8589934592), sparse 8% (722677760), duration 36, read/write 53/53 MB/s
INFO: status: 28% (2405957632/8589934592), sparse 9% (817049600), duration 39, read/write 63/32 MB/s
INFO: status: 31% (2679767040/8589934592), sparse 12% (1078583296), duration 42, read/write 91/4 MB/s
INFO: status: 33% (2846621696/8589934592), sparse 12% (1079533568), duration 45, read/write 55/55 MB/s
INFO: status: 35% (3053518848/8589934592), sparse 14% (1204465664), duration 48, read/write 68/27 MB/s
INFO: status: 38% (3315466240/8589934592), sparse 16% (1457651712), duration 51, read/write 87/2 MB/s
INFO: status: 40% (3504603136/8589934592), sparse 17% (1473257472), duration 54, read/write 63/57 MB/s
INFO: status: 43% (3698458624/8589934592), sparse 17% (1509433344), duration 57, read/write 64/52 MB/s
INFO: status: 45% (3893428224/8589934592), sparse 18% (1550073856), duration 60, read/write 64/51 MB/s
INFO: status: 48% (4166582272/8589934592), sparse 21% (1822359552), duration 63, read/write 91/0 MB/s
INFO: status: 50% (4340842496/8589934592), sparse 21% (1826738176), duration 66, read/write 58/56 MB/s
INFO: status: 52% (4515692544/8589934592), sparse 21% (1827700736), duration 69, read/write 58/57 MB/s
INFO: status: 54% (4694343680/8589934592), sparse 21% (1830658048), duration 72, read/write 59/58 MB/s
INFO: status: 56% (4880596992/8589934592), sparse 21% (1855746048), duration 75, read/write 62/53 MB/s
INFO: status: 58% (5030281216/8589934592), sparse 21% (1858248704), duration 78, read/write 49/49 MB/s
INFO: status: 60% (5188026368/8589934592), sparse 21% (1867436032), duration 81, read/write 52/49 MB/s
INFO: status: 62% (5368643584/8589934592), sparse 22% (1936478208), duration 84, read/write 60/37 MB/s
INFO: status: 65% (5601427456/8589934592), sparse 24% (2076975104), duration 87, read/write 77/30 MB/s
INFO: status: 67% (5798887424/8589934592), sparse 24% (2097860608), duration 90, read/write 65/58 MB/s
INFO: status: 70% (6081216512/8589934592), sparse 27% (2369134592), duration 93, read/write 94/3 MB/s
INFO: status: 73% (6278348800/8589934592), sparse 28% (2450464768), duration 96, read/write 65/38 MB/s
INFO: status: 74% (6439043072/8589934592), sparse 28% (2451136512), duration 99, read/write 53/53 MB/s
INFO: status: 76% (6606290944/8589934592), sparse 28% (2453364736), duration 102, read/write 55/55 MB/s
INFO: status: 78% (6751715328/8589934592), sparse 28% (2454994944), duration 105, read/write 48/47 MB/s
INFO: status: 80% (6917980160/8589934592), sparse 28% (2485792768), duration 108, read/write 55/45 MB/s
INFO: status: 82% (7066222592/8589934592), sparse 28% (2487078912), duration 111, read/write 49/48 MB/s
INFO: status: 84% (7287668736/8589934592), sparse 30% (2624225280), duration 114, read/write 73/28 MB/s
INFO: status: 88% (7566655488/8589934592), sparse 33% (2903121920), duration 117, read/write 92/0 MB/s
INFO: status: 91% (7852654592/8589934592), sparse 37% (3189116928), duration 120, read/write 95/0 MB/s
INFO: status: 94% (8111390720/8589934592), sparse 39% (3427225600), duration 123, read/write 86/6 MB/s
INFO: status: 97% (8395816960/8589934592), sparse 43% (3711651840), duration 126, read/write 94/0 MB/s
INFO: status: 100% (8589934592/8589934592), sparse 45% (3905671168), duration 129, read/write 64/0 MB/s
INFO: transferred 8589 MB in 129 seconds (66 MB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 1.92GB
INFO: Finished Backup of VM 70001 (00:02:17)
INFO: Backup finished at 2020-05-24 14:44:14
INFO: Backup job finished successfully
TASK OK

Using RDMA on both storage and node CPU/IO are good numbers during backup.
 

Attachments

  • 1590345431918.png
    1590345431918.png
    16.3 KB · Views: 7
Using dd for performance benchmark isnt working out great especially not with /dev/zero due to possible compression or caching.

As you max out 1GB there is some cache or compression going on. But no data is really written on the disks.



What do you get by using "cat /dev/urandom | pv -a > testfile" (run "apt install pv" before if needed)

It should start fast end then go down once your storage cache is exhausted.

Let it run for 1 minute and then abort with ctrl+c

It will print you the average mb/s for a realistic use case. Thats the maxium you would get for the backup job.


You can repeat the test with /dev/zero to see better results when compression/caching is on.

remove the test file after "rm testfile"
 
Last edited:
Seems your VM storage is also on NFS. So you are transferring data twice over the same network?
 
Thanks! Seems everyone is right.

Since @H4R0 gave his suggestion, I am seeing numbers I didn't expect to see on my ZFS mirrored vdevs storage. I must be doing something wrong and will now have to dissect my storage systems because I would expect much better performance than a single disk in a raid similar to raid 10 or 50 in ZFS. My bubble has been burst :( Sadly, similar numbers were seen locally on the SAN/NAS

@dietmar is also right. Though currently on a 40Gbe storage connected network, it is my intention to setup a subnet on my other available 40Gbe interface ports for backup comms only.
 
Thanks! Seems everyone is right.

Since @H4R0 gave his suggestion, I am seeing numbers I didn't expect to see on my ZFS mirrored vdevs storage. I must be doing something wrong and will now have to dissect my storage systems because I would expect much better performance than a single disk in a raid similar to raid 10 or 50 in ZFS. My bubble has been burst :( Sadly, similar numbers were seen locally on the SAN/NAS

@dietmar is also right. Though currently on a 40Gbe storage connected network, it is my intention to setup a subnet on my other available 40Gbe interface ports for backup comms only.

Maybe try to remove the bandwith limit.

I also have a setup with zfs raid1 over nfs 10gb and i get good results.

105: 2020-05-25 06:45:19 INFO: status: 0% (901906432/274878038016), sparse 0% (765087744), duration 3, read/write 300/45 MB/s
105: 2020-05-25 06:45:25 INFO: status: 1% (3152150528/274878038016), sparse 0% (2228932608), duration 9, read/write 375/131 MB/s
105: 2020-05-25 06:45:33 INFO: status: 2% (5541920768/274878038016), sparse 1% (4324274176), duration 17, read/write 298/36 MB/s
105: 2020-05-25 06:45:38 INFO: status: 3% (8648065024/274878038016), sparse 2% (7430021120), duration 22, read/write 621/0 MB/s
105: 2020-05-25 06:45:42 INFO: status: 4% (11006181376/274878038016), sparse 3% (9786273792), duration 26, read/write 589/0 MB/s
105: 2020-05-25 06:45:47 INFO: status: 5% (14403895296/274878038016), sparse 4% (13183717376), duration 31, read/write 679/0 MB/s
105: 2020-05-25 06:45:50 INFO: status: 6% (16691888128/274878038016), sparse 5% (15471468544), duration 34, read/write 762/0 MB/s
...
 
It would seem I have not properly tuned ZFS. I tried that same command locally on my ZFS server and got similar poor results.
I'm a bit disappointed in that, as it would seem nearly everything I've done has been in vain from purchases to setup... sigh
 
It would seem I have not properly tuned ZFS. I tried that same command locally on my ZFS server and got similar poor results.
I'm a bit disappointed in that, as it would seem nearly everything I've done has been in vain from purchases to setup... sigh

Can you post the output of "zfs get all"
 
Hey ho,

don't be disappointed. We have the same result but our result is worse.

Just a short overview about our bandwith:


111: 2020-05-26 13:07:41 INFO: Starting Backup of VM 111 (qemu)
111: 2020-05-26 13:07:41 INFO: status = stopped
111: 2020-05-26 13:07:41 INFO: update VM 111: -lock backup
111: 2020-05-26 13:07:41 INFO: backup mode: stop
111: 2020-05-26 13:07:41 INFO: ionice priority: 7
111: 2020-05-26 13:07:41 INFO: VM Name: Nextcloud2
111: 2020-05-26 13:07:41 INFO: include disk 'virtio0' 'VMs:111/vm-111-disk-0.qcow2' 32G
111: 2020-05-26 13:07:41 INFO: include disk 'virtio1' 'VMs:111/vm-111-disk-0.raw' 32G
111: 2020-05-26 13:07:42 INFO: snapshots found (not included into backup)
111: 2020-05-26 13:07:42 INFO: creating archive '/mnt/pve/nfs-backup/dump/vzdump-qemu-111-2020_05_26-13_07_41.vma.lzo'
111: 2020-05-26 13:07:42 INFO: starting kvm to execute backup task
111: 2020-05-26 13:07:43 INFO: started backup task '05cb8030-9db8-4e1a-9ef4-d33bea47122d'
111: 2020-05-26 13:07:46 INFO: status: 0% (40763392/68719476736), sparse 0% (31531008), duration 3, read/write 13/3 MB/s
111: 2020-05-26 13:09:37 INFO: status: 1% (688783360/68719476736), sparse 0% (136421376), duration 114, read/write 5/4 MB/s
111: 2020-05-26 13:11:42 INFO: status: 2% (1376976896/68719476736), sparse 0% (167727104), duration 239, read/write 5/5 MB/s
111: 2020-05-26 13:13:53 INFO: status: 3% (2068250624/68719476736), sparse 0% (205672448), duration 370, read/write 5/4 MB/s
111: 2020-05-26 13:15:58 INFO: status: 4% (2749759488/68719476736), sparse 0% (259592192), duration 495, read/write 5/5 MB/s
111: 2020-05-26 13:17:54 INFO: status: 5% (3440050176/68719476736), sparse 0% (264638464), duration 611, read/write 5/5 MB/s
111: 2020-05-26 13:20:09 INFO: status: 6% (4124508160/68719476736), sparse 0% (271364096), duration 746, read/write 5/5 MB/s
111: 2020-05-26 13:21:52 INFO: status: 7% (4814995456/68719476736), sparse 0% (395083776), duration 849, read/write 6/5 MB/s
111: 2020-05-26 13:23:44 INFO: status: 8% (5498929152/68719476736), sparse 0% (409546752), duration 961, read/write 6/5 MB/s
111: 2020-05-26 13:25:51 INFO: status: 9% (6185156608/68719476736), sparse 0% (418648064), duration 1088, read/write 5/5 MB/s
111: 2020-05-26 13:28:10 INFO: status: 10% (6872498176/68719476736), sparse 0% (567967744), duration 1227, read/write 4/3 MB/s
111: 2020-05-26 13:30:12 INFO: status: 11% (7568883712/68719476736), sparse 0% (650645504), duration 1349, read/write 5/5 MB/s
111: 2020-05-26 13:33:27 INFO: status: 12% (8246460416/68719476736), sparse 1% (691208192), duration 1544, read/write 3/3 MB/s
111: 2020-05-26 13:35:35 INFO: status: 13% (8936423424/68719476736), sparse 1% (834506752), duration 1672, read/write 5/4 MB/s
111: 2020-05-26 13:37:36 INFO: status: 14% (9621209088/68719476736), sparse 1% (841027584), duration 1793, read/write 5/5 MB/s
111: 2020-05-26 13:39:35 INFO: status: 15% (10309009408/68719476736), sparse 1% (841027584), duration 1912, read/write 5/5 MB/s
111: 2020-05-26 13:41:14 INFO: status: 16% (11000479744/68719476736), sparse 1% (983502848), duration 2011, read/write 6/5 MB/s
111: 2020-05-26 13:41:41 INFO: status: 18% (12886212608/68719476736), sparse 3% (2728398848), duration 2038, read/write 69/5 MB/s
111: 2020-05-26 13:41:59 INFO: status: 19% (13058441216/68719476736), sparse 4% (2853007360), duration 2056, read/write 9/2 MB/s
111: 2020-05-26 13:44:06 INFO: status: 20% (13752401920/68719476736), sparse 4% (2952962048), duration 2183, read/write 5/4 MB/s
111: 2020-05-26 13:45:55 INFO: status: 21% (14434762752/68719476736), sparse 4% (3044278272), duration 2292, read/write 6/5 MB/s
111: 2020-05-26 13:47:48 INFO: status: 22% (15125905408/68719476736), sparse 4% (3147206656), duration 2405, read/write 6/5 MB/s
111: 2020-05-26 13:49:23 INFO: status: 23% (15812067328/68719476736), sparse 4% (3196895232), duration 2500, read/write 7/6 MB/s
111: 2020-05-26 13:51:07 INFO: status: 24% (16496132096/68719476736), sparse 4% (3205824512), duration 2604, read/write 6/6 MB/s
111: 2020-05-26 13:53:00 INFO: status: 25% (17180721152/68719476736), sparse 4% (3222745088), duration 2717, read/write 6/5 MB/s
111: 2020-05-26 13:54:47 INFO: status: 26% (17876058112/68719476736), sparse 4% (3344326656), duration 2824, read/write 6/5 MB/s
111: 2020-05-26 13:56:41 INFO: status: 27% (18556649472/68719476736), sparse 4% (3344326656), duration 2938, read/write 5/5 MB/s
111: 2020-05-26 13:58:32 INFO: status: 28% (19245891584/68719476736), sparse 4% (3344326656), duration 3049, read/write 6/6 MB/s
111: 2020-05-26 14:00:37 INFO: status: 29% (19934085120/68719476736), sparse 5% (3466743808), duration 3174, read/write 5/4 MB/s
111: 2020-05-26 14:02:30 INFO: status: 30% (20624113664/68719476736), sparse 5% (3489808384), duration 3287, read/write 6/5 MB/s
111: 2020-05-26 14:04:24 INFO: status: 31% (21303459840/68719476736), sparse 5% (3512410112), duration 3401, read/write 5/5 MB/s
111: 2020-05-26 14:06:44 INFO: status: 32% (21992308736/68719476736), sparse 5% (3659636736), duration 3541, read/write 4/3 MB/s
111: 2020-05-26 14:09:21 INFO: status: 33% (22684368896/68719476736), sparse 5% (3659952128), duration 3698, read/write 4/4 MB/s
111: 2020-05-26 14:11:54 INFO: status: 34% (23366926336/68719476736), sparse 5% (3660025856), duration 3851, read/write 4/4 MB/s
111: 2020-05-26 14:12:23 ERROR: interrupted by signal
111: 2020-05-26 14:12:23 INFO: aborting backup job
111: 2020-05-26 14:12:23 INFO: stopping kvm after backup task
111: 2020-05-26 14:12:26 ERROR: Backup of VM 111 failed - interrupted by signal

We have a external Storage via NFS. When we write with dd command we just get this result:
1590585303985.png


Like H4R0 we try this command in the next test.
For another test we made a connection via iSCSI (disadvantage -> just raw file possible, so no snapshot function).
On a NVEMe Storage it performs much better -> but with the missing functions als not the best way for us.


I will keep you updated if the next tests are comming :)
 
Just to throw in a working example (cluster setup 5y ago and still as fast as it was in the beginning):

Code:
NFO: Starting Backup of VM 1003 (qemu)
INFO: status = running
INFO: update VM 1003: -lock backup
INFO: VM Name: ldap
INFO: include disk 'scsi0' 'san-dx100:vm-1003-disk-1' 4G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup-zfs/dump/vzdump-qemu-1003-2020_05_25-18_30_02.vma.lzo'
INFO: started backup task '57b0d1a0-dc76-4951-9307-20ab50a24a6f'
INFO: status: 11% (489095168/4294967296), sparse 4% (200699904), duration 3, read/write 163/96 MB/s
INFO: status: 16% (722206720/4294967296), sparse 8% (362934272), duration 6, read/write 77/23 MB/s
INFO: status: 29% (1265762304/4294967296), sparse 14% (611831808), duration 9, read/write 181/98 MB/s
INFO: status: 40% (1718091776/4294967296), sparse 18% (775782400), duration 12, read/write 150/96 MB/s
INFO: status: 47% (2043805696/4294967296), sparse 19% (844013568), duration 15, read/write 108/85 MB/s
INFO: status: 56% (2444165120/4294967296), sparse 22% (951947264), duration 18, read/write 133/97 MB/s
INFO: status: 57% (2482110464/4294967296), sparse 22% (958488576), duration 21, read/write 12/10 MB/s
INFO: status: 59% (2563964928/4294967296), sparse 22% (984027136), duration 24, read/write 27/18 MB/s
INFO: status: 70% (3009019904/4294967296), sparse 27% (1188007936), duration 27, read/write 148/80 MB/s
INFO: status: 79% (3431333888/4294967296), sparse 31% (1371779072), duration 30, read/write 140/79 MB/s
INFO: status: 90% (3877109760/4294967296), sparse 35% (1539457024), duration 33, read/write 148/92 MB/s
INFO: status: 100% (4294967296/4294967296), sparse 40% (1724391424), duration 36, read/write 139/77 MB/s
INFO: transferred 4294 MB in 36 seconds (119 MB/s)
INFO: archive file size: 1016MB
INFO: Finished Backup of VM 1003 (00:00:49)
 
Just to throw in a working example (cluster setup 5y ago and still as fast as it was in the beginning):

Code:
NFO: Starting Backup of VM 1003 (qemu)
INFO: status = running
INFO: update VM 1003: -lock backup
INFO: VM Name: ldap
INFO: include disk 'scsi0' 'san-dx100:vm-1003-disk-1' 4G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup-zfs/dump/vzdump-qemu-1003-2020_05_25-18_30_02.vma.lzo'
INFO: started backup task '57b0d1a0-dc76-4951-9307-20ab50a24a6f'
INFO: status: 11% (489095168/4294967296), sparse 4% (200699904), duration 3, read/write 163/96 MB/s
INFO: status: 16% (722206720/4294967296), sparse 8% (362934272), duration 6, read/write 77/23 MB/s
INFO: status: 29% (1265762304/4294967296), sparse 14% (611831808), duration 9, read/write 181/98 MB/s
INFO: status: 40% (1718091776/4294967296), sparse 18% (775782400), duration 12, read/write 150/96 MB/s
INFO: status: 47% (2043805696/4294967296), sparse 19% (844013568), duration 15, read/write 108/85 MB/s
INFO: status: 56% (2444165120/4294967296), sparse 22% (951947264), duration 18, read/write 133/97 MB/s
INFO: status: 57% (2482110464/4294967296), sparse 22% (958488576), duration 21, read/write 12/10 MB/s
INFO: status: 59% (2563964928/4294967296), sparse 22% (984027136), duration 24, read/write 27/18 MB/s
INFO: status: 70% (3009019904/4294967296), sparse 27% (1188007936), duration 27, read/write 148/80 MB/s
INFO: status: 79% (3431333888/4294967296), sparse 31% (1371779072), duration 30, read/write 140/79 MB/s
INFO: status: 90% (3877109760/4294967296), sparse 35% (1539457024), duration 33, read/write 148/92 MB/s
INFO: status: 100% (4294967296/4294967296), sparse 40% (1724391424), duration 36, read/write 139/77 MB/s
INFO: transferred 4294 MB in 36 seconds (119 MB/s)
INFO: archive file size: 1016MB
INFO: Finished Backup of VM 1003 (00:00:49)


Hey LnxBil,
did you change anything for the NFS-Connection or did you just connect proxmox to a external Storage via NFS?
What is you Bandwith- 1GBit/s?
 
The OP send me his ZFS config in PM.

He has dedup (on a 70TB storage) and sync write enabled, which should be the reason of poor performance in his case.
 
The OP send me his ZFS config in PM.

He has dedup (on a 70TB storage) and sync write enabled, which should be the reason of poor performance in his case.


can you share this options? For example in Proxmox, where I can find and change this settings?
 
can you share this options? For example in Proxmox, where I can find and change this settings?

I dont know if the proxmox gui offers that.

Use zfs via terminal.

"zfs get all"

Paste the output into pastebin.com and link it here.
 
did you change anything for the NFS-Connection or did you just connect proxmox to a external Storage via NFS?
What is you Bandwith- 1GBit/s?

Just plain simple NFS, added via the GUI and "only" 1GBE. Nothing ordinary. I backup my whole cluster and saturate the incoming 1 GBE port my backup server.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!