Slow backups

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
Wouldn't that test be accelerated by zfs compression? (I run compression in the datasets).

I also assume you meant to add time to the beginning of that command? ;)

time + dd output:

Code:
root@REDACTED1:/mnt/pve/Backups/dump# time dd if=/dev/zero of=/mnt/pve/Backups/dump/testFile.bin bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 9.61966 s, 109 MB/s

real    0m9.623s
user    0m0.002s
sys    0m0.942s



Code:
dd if=/dev/zero of=/mnt/pve/Backups/dump/test.bin bs=1024k count=1000
 

sigxcpu

Active Member
May 4, 2012
433
9
38
Bucharest, Romania
No, I didn't need time because "dd" outputs this information :)

So:
- your NFS raw bandwidth is ~100MB/sec, as expected
- initial phase of the dump does not touch the disks because of the cache, as vmstat shows low activity

The CPU usage is very low, so the feed rate to the compressor is low.

Can you please post the contents of /etc/pve/qemu-server/100.conf ?\

Also, just for the record, try a backup with no compression.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
1) I assume you meant 201.conf, since that was the VM I just tested for backups? ;P

cat /etc/pve/qemue-server/201.conf output :

Code:
root@REDACTED1:/mnt/pve/Backups/dump# cat /etc/pve/qemu-server/201.conf 
bootdisk: virtio0
cores: 4
ide2: ISOs:iso/ubuntu-12.04.2-server-amd64.iso,media=cdrom
memory: 1024
name: REDACTED
net0: virtio=C6:B1:93:96:E2:A1,bridge=vmbr0
ostype: l26
sockets: 1
virtio0: Images:201/vm-201-disk-2.qcow2,format=qcow2,size=32G

2) I've tried several no-compression backups, no change. But, I can do it again. Any particular parameters you want me to do? Any particular stats you want me to collect while doing the no-compression backup?


No, I didn't need time because "dd" outputs this information :)

So:
- your NFS raw bandwidth is ~100MB/sec, as expected
- initial phase of the dump does not touch the disks because of the cache, as vmstat shows low activity

The CPU usage is very low, so the feed rate to the compressor is low.

Can you please post the contents of /etc/pve/qemu-server/100.conf ?\

Also, just for the record, try a backup with no compression.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
So after working with sigxcpu we've proven that it's not an NFS issue. It's vzdump not efficiently reading the VM. NFS reads and writes through other tests demonstrate that the connection operates at near-line speed. Other configurations show that writes are not the bottleneck, nor is the storage.

We ran out of ideas. And I'm hoping sigxcpu will chime in with his part of the findings.

But I am really fucking stumped here.
 

pascal

Member
Dec 21, 2015
15
4
23
41
Hopefully this is not off topic but I experience a problem which might be related with my homelab setup:

Proxmox 4.1 running on a r710 (2x E5640, 96 GB RAM, 6x 146 GB SAS 15k Raid10) and a seperate FreeNAS Box (Pentium G3450, 16GB RAM, 6x3TB WD Red in RaidZ2).
When I backup my VMs to my FreeNAS nfs share using compression I get 'network spikes' as seen below. This resulted in problems when I tried to backup my pfsense vm as the whole network wouldn't respond anymore since pfsense managed all my VLAN routing. I had to connect via iDRAC console and disconnect the nfs share. I have since then configured my L3 switch to do inter-vlan-routing (finally) which helped the network not being accessible during those spikes but they still happen. Overall speed is fine though.
Code:
INFO: starting new backup job: vzdump 202 --node pve --remove 0 --mode snapshot --storage freenas_backup_vm_daily --compress lzo
INFO: Starting Backup of VM 202 (qemu)
INFO: status = running
INFO: update VM 202: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/freenas_backup_vm_daily/dump/vzdump-qemu-202-2015_12_21-23_40_54.vma.lzo'
INFO: started backup task '4b44e030-46ca-4c7e-ad23-b73474e3e804'
INFO: status: 0% (413270016/68719476736), sparse 0% (708608), duration 3, 137/137 MB/s
INFO: status: 1% (805830656/68719476736), sparse 0% (61743104), duration 6, 130/110 MB/s
INFO: status: 2% (1398800384/68719476736), sparse 0% (70279168), duration 17, 53/53 MB/s
INFO: status: 3% (2318663680/68719476736), sparse 0% (308482048), duration 23, 153/113 MB/s
INFO: status: 4% (2809004032/68719476736), sparse 0% (309047296), duration 27, 122/122 MB/s
INFO: status: 5% (3480289280/68719476736), sparse 0% (309645312), duration 42, 44/44 MB/s
INFO: status: 6% (4131782656/68719476736), sparse 0% (310128640), duration 46, 162/162 MB/s
INFO: status: 7% (4812570624/68719476736), sparse 0% (310763520), duration 50, 170/170 MB/s
INFO: status: 8% (5576196096/68719476736), sparse 0% (311545856), duration 54, 190/190 MB/s
INFO: status: 9% (6203375616/68719476736), sparse 0% (313192448), duration 58, 156/156 MB/s
INFO: status: 10% (6898515968/68719476736), sparse 0% (317706240), duration 63, 139/138 MB/s
INFO: status: 11% (7684882432/68719476736), sparse 0% (318513152), duration 99, 21/21 MB/s
INFO: status: 12% (8301576192/68719476736), sparse 0% (321794048), duration 103, 154/153 MB/s
INFO: status: 14% (9841016832/68719476736), sparse 2% (1549201408), duration 106, 513/104 MB/s
INFO: status: 15% (10395975680/68719476736), sparse 2% (1551208448), duration 110, 138/138 MB/s
INFO: status: 16% (11144790016/68719476736), sparse 2% (1551994880), duration 128, 41/41 MB/s
INFO: status: 17% (11792154624/68719476736), sparse 2% (1552568320), duration 132, 161/161 MB/s
INFO: status: 18% (12536578048/68719476736), sparse 2% (1552986112), duration 136, 186/186 MB/s
INFO: status: 19% (13125156864/68719476736), sparse 2% (1553481728), duration 139, 196/196 MB/s
INFO: status: 20% (13862567936/68719476736), sparse 2% (1555570688), duration 144, 147/147 MB/s
INFO: status: 21% (14599979008/68719476736), sparse 2% (1556602880), duration 148, 184/184 MB/s
INFO: status: 22% (15123939328/68719476736), sparse 2% (1557090304), duration 151, 174/174 MB/s
INFO: status: 23% (15885271040/68719476736), sparse 2% (1557684224), duration 191, 19/19 MB/s
INFO: status: 24% (16579035136/68719476736), sparse 2% (1558003712), duration 195, 173/173 MB/s
INFO: status: 25% (17306353664/68719476736), sparse 2% (1558822912), duration 199, 181/181 MB/s
INFO: status: 26% (17914527744/68719476736), sparse 2% (1559314432), duration 203, 152/151 MB/s
INFO: status: 27% (18667143168/68719476736), sparse 2% (1559797760), duration 208, 150/150 MB/s
INFO: status: 28% (19288621056/68719476736), sparse 2% (1560141824), duration 234, 23/23 MB/s
INFO: status: 29% (19960758272/68719476736), sparse 2% (1560813568), duration 238, 168/167 MB/s
INFO: status: 30% (20638400512/68719476736), sparse 2% (1561931776), duration 242, 169/169 MB/s
INFO: status: 31% (21398552576/68719476736), sparse 2% (1562324992), duration 268, 29/29 MB/s
INFO: status: 32% (22008299520/68719476736), sparse 2% (1562468352), duration 273, 121/121 MB/s
INFO: status: 33% (22828417024/68719476736), sparse 2% (1563729920), duration 283, 82/81 MB/s
INFO: status: 34% (23518511104/68719476736), sparse 2% (1564409856), duration 294, 62/62 MB/s
INFO: status: 35% (24095817728/68719476736), sparse 2% (1564794880), duration 297, 192/192 MB/s
INFO: status: 36% (24813502464/68719476736), sparse 2% (1566007296), duration 301, 179/179 MB/s
INFO: status: 37% (25535447040/68719476736), sparse 2% (1566846976), duration 305, 180/180 MB/s
INFO: status: 38% (26126909440/68719476736), sparse 2% (1567068160), duration 308, 197/197 MB/s
INFO: status: 39% (26850885632/68719476736), sparse 2% (1567375360), duration 312, 180/180 MB/s
INFO: status: 40% (27538882560/68719476736), sparse 2% (1567719424), duration 317, 137/137 MB/s
INFO: status: 41% (28238872576/68719476736), sparse 2% (1568653312), duration 362, 15/15 MB/s
INFO: status: 42% (28912123904/68719476736), sparse 2% (1568964608), duration 366, 168/168 MB/s
INFO: status: 43% (29602480128/68719476736), sparse 2% (1569353728), duration 370, 172/172 MB/s
INFO: status: 44% (30275665920/68719476736), sparse 2% (1570668544), duration 375, 134/134 MB/s
INFO: status: 45% (31133794304/68719476736), sparse 2% (1572368384), duration 396, 40/40 MB/s
INFO: status: 46% (31735349248/68719476736), sparse 2% (1572614144), duration 399, 200/200 MB/s
INFO: status: 47% (32474660864/68719476736), sparse 2% (1573003264), duration 403, 184/184 MB/s
INFO: status: 50% (34407514112/68719476736), sparse 4% (3132018688), duration 406, 644/124 MB/s
INFO: status: 52% (35946889216/68719476736), sparse 6% (4246626304), duration 409, 513/141 MB/s
INFO: status: 57% (39706165248/68719476736), sparse 11% (7637299200), duration 412, 1253/122 MB/s
INFO: status: 59% (40792948736/68719476736), sparse 11% (8217366528), duration 415, 362/168 MB/s
INFO: status: 60% (41876586496/68719476736), sparse 13% (9185972224), duration 418, 361/38 MB/s
INFO: status: 61% (41946054656/68719476736), sparse 13% (9186091008), duration 443, 2/2 MB/s
INFO: status: 62% (43286790144/68719476736), sparse 14% (9762091008), duration 448, 268/152 MB/s
INFO: status: 63% (43840634880/68719476736), sparse 14% (9762942976), duration 451, 184/184 MB/s
INFO: status: 64% (44494159872/68719476736), sparse 14% (9763385344), duration 454, 217/217 MB/s
INFO: status: 65% (44978274304/68719476736), sparse 14% (9763774464), duration 457, 161/161 MB/s
INFO: status: 66% (45620068352/68719476736), sparse 14% (9764089856), duration 460, 213/213 MB/s
INFO: status: 67% (46091993088/68719476736), sparse 14% (9764462592), duration 490, 15/15 MB/s
INFO: status: 68% (46791393280/68719476736), sparse 14% (9764929536), duration 499, 77/77 MB/s
INFO: status: 69% (47423750144/68719476736), sparse 14% (9765257216), duration 503, 158/158 MB/s
INFO: status: 71% (49056382976/68719476736), sparse 15% (10704654336), duration 507, 408/173 MB/s
INFO: status: 72% (49748901888/68719476736), sparse 15% (10853896192), duration 510, 230/181 MB/s
INFO: status: 74% (50960007168/68719476736), sparse 16% (11633573888), duration 513, 403/143 MB/s
INFO: status: 75% (51545243648/68719476736), sparse 16% (11633729536), duration 516, 195/195 MB/s
INFO: status: 77% (53078392832/68719476736), sparse 18% (12755042304), duration 519, 511/137 MB/s
INFO: status: 78% (53663760384/68719476736), sparse 18% (12755099648), duration 522, 195/195 MB/s
INFO: status: 79% (54784950272/68719476736), sparse 19% (13435133952), duration 525, 373/147 MB/s
INFO: status: 80% (55140417536/68719476736), sparse 19% (13435232256), duration 528, 118/118 MB/s
INFO: status: 82% (56886951936/68719476736), sparse 21% (14837649408), duration 531, 582/114 MB/s
INFO: status: 85% (58468859904/68719476736), sparse 23% (16028667904), duration 534, 527/130 MB/s
INFO: status: 87% (60376481792/68719476736), sparse 25% (17301786624), duration 577, 44/14 MB/s
INFO: status: 90% (62174396416/68719476736), sparse 27% (18673573888), duration 580, 599/142 MB/s
INFO: status: 93% (64097746944/68719476736), sparse 29% (20170207232), duration 583, 641/142 MB/s
INFO: status: 95% (65580171264/68719476736), sparse 31% (21452337152), duration 591, 185/25 MB/s
INFO: status: 97% (66826076160/68719476736), sparse 32% (22302216192), duration 594, 415/132 MB/s
INFO: status: 99% (68244733952/68719476736), sparse 33% (23294017536), duration 598, 354/106 MB/s
INFO: status: 100% (68719476736/68719476736), sparse 34% (23727415296), duration 599, 474/41 MB/s
INFO: transferred 68719 MB in 599 seconds (114 MB/s)
INFO: archive file size: 37.29GB
INFO: Finished Backup of VM 202 (00:10:08)
INFO: Backup job finished successfully
TASK OK
I bolded some lines where there are large jumps in duration between two updates. edit: well, the bolding didn't work but you can easily tell from the status lines with slow read/write speeds
 
Last edited:

pascal

Member
Dec 21, 2015
15
4
23
41
When I turn off compression everything works fine.
Code:
INFO: starting new backup job: vzdump 254 --node pve --remove 0 --storage freenas_backup_vm_daily --mode stop
INFO: Starting Backup of VM 254 (qemu)
INFO: status = stopped
INFO: update VM 254: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/freenas_backup_vm_daily/dump/vzdump-qemu-254-2015_12_21-23_10_10.vma'
INFO: starting kvm to execute backup task
Running as unit 254.scope.
INFO: started backup task '59cff751-b743-40a1-9540-42cb4f5dfc90'
INFO: status: 0% (98238464/17179869184), sparse 0% (16384), duration 3, 32/32 MB/s
INFO: status: 1% (239927296/17179869184), sparse 0% (303104), duration 6, 47/47 MB/s
INFO: status: 2% (361562112/17179869184), sparse 0% (2142208), duration 9, 40/39 MB/s
INFO: status: 3% (534446080/17179869184), sparse 0% (5419008), duration 13, 43/42 MB/s
INFO: status: 4% (730267648/17179869184), sparse 0% (37949440), duration 17, 48/40 MB/s
INFO: status: 5% (897581056/17179869184), sparse 0% (40108032), duration 21, 41/41 MB/s
INFO: status: 6% (1072431104/17179869184), sparse 0% (45424640), duration 26, 34/33 MB/s
INFO: status: 7% (1205403648/17179869184), sparse 0% (45592576), duration 29, 44/44 MB/s
INFO: status: 8% (1384120320/17179869184), sparse 0% (46051328), duration 33, 44/44 MB/s
INFO: status: 9% (1562705920/17179869184), sparse 0% (50835456), duration 37, 44/43 MB/s
INFO: status: 10% (1726218240/17179869184), sparse 0% (51068928), duration 41, 40/40 MB/s
INFO: status: 11% (1897201664/17179869184), sparse 0% (51601408), duration 45, 42/42 MB/s
INFO: status: 12% (2117926912/17179869184), sparse 0% (154529792), duration 49, 55/29 MB/s
INFO: status: 13% (2352611328/17179869184), sparse 1% (278261760), duration 52, 78/36 MB/s
INFO: status: 14% (2478964736/17179869184), sparse 1% (278364160), duration 55, 42/42 MB/s
INFO: status: 15% (2611740672/17179869184), sparse 1% (278458368), duration 58, 44/44 MB/s
INFO: status: 18% (3257532416/17179869184), sparse 5% (864165888), duration 61, 215/20 MB/s
INFO: status: 23% (4003004416/17179869184), sparse 9% (1568133120), duration 64, 248/13 MB/s
INFO: status: 24% (4132175872/17179869184), sparse 9% (1568612352), duration 67, 43/42 MB/s
INFO: status: 25% (4303224832/17179869184), sparse 9% (1568751616), duration 71, 42/42 MB/s
INFO: status: 26% (4485808128/17179869184), sparse 9% (1568825344), duration 76, 36/36 MB/s
INFO: status: 27% (4668129280/17179869184), sparse 9% (1568833536), duration 80, 45/45 MB/s
INFO: status: 28% (4850581504/17179869184), sparse 9% (1568866304), duration 84, 45/45 MB/s
INFO: status: 29% (4991221760/17179869184), sparse 9% (1568874496), duration 87, 46/46 MB/s
INFO: status: 30% (5169938432/17179869184), sparse 9% (1568915456), duration 91, 44/44 MB/s
INFO: status: 31% (5356191744/17179869184), sparse 9% (1568919552), duration 95, 46/46 MB/s
INFO: status: 32% (5534711808/17179869184), sparse 9% (1569648640), duration 100, 35/35 MB/s
INFO: status: 33% (5705826304/17179869184), sparse 9% (1572769792), duration 104, 42/41 MB/s
INFO: status: 34% (5888344064/17179869184), sparse 9% (1573048320), duration 108, 45/45 MB/s
INFO: status: 35% (6025969664/17179869184), sparse 9% (1573806080), duration 111, 45/45 MB/s
INFO: status: 36% (6203703296/17179869184), sparse 9% (1575325696), duration 115, 44/44 MB/s
INFO: status: 37% (6382354432/17179869184), sparse 9% (1578405888), duration 120, 35/35 MB/s
INFO: status: 38% (6564872192/17179869184), sparse 9% (1579171840), duration 124, 45/45 MB/s
INFO: status: 39% (6720651264/17179869184), sparse 9% (1586622464), duration 127, 51/49 MB/s
INFO: status: 42% (7242055680/17179869184), sparse 11% (2025660416), duration 130, 173/27 MB/s
INFO: status: 48% (8283619328/17179869184), sparse 17% (3067224064), duration 133, 347/0 MB/s
INFO: status: 51% (8788508672/17179869184), sparse 20% (3500916736), duration 136, 168/23 MB/s
INFO: status: 54% (9365880832/17179869184), sparse 23% (4001386496), duration 139, 192/25 MB/s
INFO: status: 55% (9510846464/17179869184), sparse 23% (4002725888), duration 143, 36/35 MB/s
INFO: status: 56% (9659088896/17179869184), sparse 23% (4003201024), duration 146, 49/49 MB/s
INFO: status: 57% (9811066880/17179869184), sparse 23% (4004311040), duration 149, 50/50 MB/s
INFO: status: 58% (10012524544/17179869184), sparse 23% (4004978688), duration 153, 50/50 MB/s
INFO: status: 61% (10612178944/17179869184), sparse 26% (4535492608), duration 156, 199/23 MB/s
INFO: status: 64% (11027349504/17179869184), sparse 28% (4844523520), duration 159, 138/35 MB/s
INFO: status: 65% (11266818048/17179869184), sparse 28% (4944072704), duration 162, 79/46 MB/s
INFO: status: 67% (11611275264/17179869184), sparse 30% (5177548800), duration 165, 114/36 MB/s
INFO: status: 68% (11718885376/17179869184), sparse 30% (5177556992), duration 168, 35/35 MB/s
INFO: status: 69% (11867389952/17179869184), sparse 30% (5181681664), duration 171, 49/48 MB/s
INFO: status: 70% (12061376512/17179869184), sparse 30% (5181743104), duration 175, 48/48 MB/s
INFO: status: 71% (12208832512/17179869184), sparse 30% (5181755392), duration 178, 49/49 MB/s
INFO: status: 75% (13032816640/17179869184), sparse 34% (5973008384), duration 181, 274/10 MB/s
INFO: status: 77% (13330939904/17179869184), sparse 35% (6145458176), duration 184, 99/41 MB/s
INFO: status: 78% (13491437568/17179869184), sparse 35% (6147194880), duration 187, 53/52 MB/s
INFO: status: 79% (13634961408/17179869184), sparse 35% (6147477504), duration 190, 47/47 MB/s
INFO: status: 80% (13777305600/17179869184), sparse 35% (6148571136), duration 193, 47/47 MB/s
INFO: status: 81% (14087421952/17179869184), sparse 37% (6395338752), duration 196, 103/21 MB/s
INFO: status: 87% (15105523712/17179869184), sparse 43% (7405924352), duration 199, 339/2 MB/s
INFO: status: 89% (15325593600/17179869184), sparse 43% (7468085248), duration 202, 73/52 MB/s
INFO: status: 90% (15534653440/17179869184), sparse 44% (7562702848), duration 205, 69/38 MB/s
INFO: status: 92% (15969353728/17179869184), sparse 46% (7954952192), duration 208, 144/14 MB/s
INFO: status: 99% (17018388480/17179869184), sparse 52% (9003966464), duration 211, 349/0 MB/s
INFO: status: 100% (17179869184/17179869184), sparse 53% (9165434880), duration 212, 161/0 MB/s
INFO: transferred 17179 MB in 212 seconds (81 MB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 7.47GB
INFO: Finished Backup of VM 254 (00:03:36)
INFO: Backup job finished successfully
TASK OK
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
Pretty sure those aren't network spikes, the bit increase is because of compression. The spikes, I'm speculating here, are effective bandwidth, not actual bandwidth, used. If you compress something then send it over the line, you effectively are sending more data, and I'm pretty sure that's what that's about.
 

e100

Renowned Member
Nov 6, 2010
1,266
38
68
Columbus, Ohio
ulbuilder.wordpress.com
Is your VM doing IO during the backup?

Did you try backing up to local storage without the tmp folder option? It would be really helpful to know the results of that test.

Is it possible to log into the nas and monitor its IO during a backup? Like running "vmstat 1" in the nas itself. I'd be curious to see what it reports.

The vm.conf does not show any IO limits but so you know, the IO limit settings limit the backup speed too.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
1) The backup tasks back up multiple VMs (not all at once), and almost all of them are effectively idling when it happens. So no IO is observed or expected during the backups (except for the backups).

2) I tried backing up to local storage, it made no change, and with or without tmp it should not have been the bottleneck as the HDDs are not crummy old ones, they're recent tech. (the ones in the nodes)

3) I've monitored the IO on the NAS during backups and it's not even close to pushing the storage, be it from IOPS or throughput perspectives.


Is your VM doing IO during the backup?

Did you try backing up to local storage without the tmp folder option? It would be really helpful to know the results of that test.

Is it possible to log into the nas and monitor its IO during a backup? Like running "vmstat 1" in the nas itself. I'd be curious to see what it reports.

The vm.conf does not show any IO limits but so you know, the IO limit settings limit the backup speed too.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
Yeah the performance within the VMs are as I would expect. They start rapidly, perform disk operations quickly too, etc. This is really the only area where I'm getting lackluster performance :(

Even VM migrations are rather fast (had to turn off ssh for migrations though to get proper speeds).


Have you run any IO benchmarks inside the VMs?
Are they able to read/write at the speeds you expect?
 

e100

Renowned Member
Nov 6, 2010
1,266
38
68
Columbus, Ohio
ulbuilder.wordpress.com
You established that you can write to the nas at a reasonable speed and writing locally did not seem to make a difference. I'd conclude that its not an IO writing issue.

The proxmox backup reads the vm disk through the kvm process, if the guest OS can read quickly then vzdump should read quickly.

This problem has me scratching my head, I predict that the solution will end up being something stupid simple and we will all feel like idiots for not figuring it out quicker.

Have you tried storing the vm disk locally and then backup locally? If that performs well then we know the problem is nfs/network related, if you have same problem then we know nfs/network is not the problem.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
I know it's not NFS or network related though because of pvetest, as well as copying files over the NFS tunnel, and doing migrations and other network tasks. Those perform properly. Backups, however, do not. This isn't even a recent thing, I'm pretty sure it's been like this the whole time.


You established that you can write to the nas at a reasonable speed and writing locally did not seem to make a difference. I'd conclude that its not an IO writing issue.

The proxmox backup reads the vm disk through the kvm process, if the guest OS can read quickly then vzdump should read quickly.

This problem has me scratching my head, I predict that the solution will end up being something stupid simple and we will all feel like idiots for not figuring it out quicker.

Have you tried storing the vm disk locally and then backup locally? If that performs well then we know the problem is nfs/network related, if you have same problem then we know nfs/network is not the problem.
 

e100

Renowned Member
Nov 6, 2010
1,266
38
68
Columbus, Ohio
ulbuilder.wordpress.com
I know it's not NFS or network related though because of pvetest, as well as copying files over the NFS tunnel, and doing migrations and other network tasks. Those perform properly. Backups, however, do not.

Those other tests do not perform the exact same IO patterns that the backup process performs so while helpful they are not conclusive. The particular IO patterns created by the backup process might perform better by changing some NFS mount option. Or maybe the backup process would perform better if you disabled CPU power saving features in the BIOS. Without some conclusive test that demonstrates where the problem is or is not I cannot help guide you to a solution.

So the question remains, does the backup process perform the same or better when the network and NFS are not involved? ( backup from a local disk to a local disk )

The results of that test will allow us to focus on what might be responsible and eliminate what is not responsible for this problem.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
The testing I just mentioned is relevant because it demonstrates those elements are not inherently bottlenecked, and as such are not the root cause of the poor performance. I understand the backup behaves differently, but the observed speed isn't even close to the expected speed.

I'll try to replicate your proposed scenario shortly.

As for power-saving, none of the nodes have power saving elements turned on in the BIOS, nor the storage system. I know this causes problems so it was something I set before even installing Proxmox ;)


Those other tests do not perform the exact same IO patterns that the backup process performs so while helpful they are not conclusive. The particular IO patterns created by the backup process might perform better by changing some NFS mount option. Or maybe the backup process would perform better if you disabled CPU power saving features in the BIOS. Without some conclusive test that demonstrates where the problem is or is not I cannot help guide you to a solution.

So the question remains, does the backup process perform the same or better when the network and NFS are not involved? ( backup from a local disk to a local disk )

The results of that test will allow us to focus on what might be responsible and eliminate what is not responsible for this problem.
 

BloodyIron

Active Member
Jan 14, 2013
211
6
38
it.lanified.com
Okay so I took a small VM, 20GB image (with plenty of space unused on it), moved it local onto one node, setup a local backup space too, then backed it up to the local space too. Speeds are far more in-line with what I would want to see:

Code:
INFO: starting new backup job: vzdump 300 --remove 0 --mode snapshot --compress lzo --storage backupSpace --node REDACTED1
INFO: Starting Backup of VM 300 (qemu)
INFO: status = running
INFO: update VM 300: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/localTest/backupSpace/dump/vzdump-qemu-300-2016_01_07-12_20_02.vma.lzo'
INFO: started backup task '046f4b3f-d03c-48d4-b74e-598d1ac88fb5'
INFO: status: 1% (260767744/21474836480), sparse 0% (33009664), duration 3, 86/75 MB/s
INFO: status: 2% (565968896/21474836480), sparse 0% (37519360), duration 6, 101/100 MB/s
INFO: status: 3% (745013248/21474836480), sparse 0% (43683840), duration 9, 59/57 MB/s
INFO: status: 4% (1003945984/21474836480), sparse 0% (48054272), duration 12, 86/84 MB/s
INFO: status: 6% (1324023808/21474836480), sparse 0% (52379648), duration 15, 106/105 MB/s
INFO: status: 7% (1543241728/21474836480), sparse 0% (52383744), duration 18, 73/73 MB/s
INFO: status: 8% (1832124416/21474836480), sparse 0% (52633600), duration 21, 96/96 MB/s
INFO: status: 9% (1999372288/21474836480), sparse 0% (52752384), duration 24, 55/55 MB/s
INFO: status: 10% (2216558592/21474836480), sparse 0% (85221376), duration 27, 72/61 MB/s
INFO: status: 11% (2523922432/21474836480), sparse 0% (87252992), duration 30, 102/101 MB/s
INFO: status: 12% (2581331968/21474836480), sparse 0% (87339008), duration 33, 19/19 MB/s
INFO: status: 13% (2887057408/21474836480), sparse 0% (87638016), duration 36, 101/101 MB/s
INFO: status: 14% (3052273664/21474836480), sparse 0% (87744512), duration 39, 55/55 MB/s
INFO: status: 15% (3261661184/21474836480), sparse 0% (87941120), duration 42, 69/69 MB/s
INFO: status: 16% (3523280896/21474836480), sparse 0% (92348416), duration 45, 87/85 MB/s
INFO: status: 17% (3688497152/21474836480), sparse 0% (96661504), duration 48, 55/53 MB/s
INFO: status: 18% (3991142400/21474836480), sparse 0% (96931840), duration 51, 100/100 MB/s
INFO: status: 19% (4175036416/21474836480), sparse 0% (96993280), duration 54, 61/61 MB/s
INFO: status: 20% (4413390848/21474836480), sparse 0% (130736128), duration 57, 79/68 MB/s
INFO: status: 21% (4618321920/21474836480), sparse 0% (136884224), duration 60, 68/66 MB/s
INFO: status: 22% (4801495040/21474836480), sparse 0% (137060352), duration 63, 61/60 MB/s
INFO: status: 23% (5091885056/21474836480), sparse 0% (137355264), duration 66, 96/96 MB/s
INFO: status: 24% (5236129792/21474836480), sparse 0% (137486336), duration 70, 36/36 MB/s
INFO: status: 25% (5458362368/21474836480), sparse 0% (137728000), duration 73, 74/73 MB/s
INFO: status: 26% (5751046144/21474836480), sparse 0% (137957376), duration 76, 97/97 MB/s
INFO: status: 27% (5899288576/21474836480), sparse 0% (138133504), duration 79, 49/49 MB/s
INFO: status: 28% (6089932800/21474836480), sparse 0% (138289152), duration 82, 63/63 MB/s
INFO: status: 29% (6408634368/21474836480), sparse 0% (138289152), duration 85, 106/106 MB/s
INFO: status: 30% (6626279424/21474836480), sparse 1% (277172224), duration 88, 72/26 MB/s
INFO: status: 32% (6898974720/21474836480), sparse 1% (277327872), duration 91, 90/90 MB/s
INFO: status: 33% (7164002304/21474836480), sparse 1% (277606400), duration 95, 66/66 MB/s
INFO: status: 34% (7340556288/21474836480), sparse 1% (277790720), duration 98, 58/58 MB/s
INFO: status: 35% (7628783616/21474836480), sparse 1% (278056960), duration 101, 96/95 MB/s
INFO: status: 36% (7738753024/21474836480), sparse 1% (278183936), duration 104, 36/36 MB/s
INFO: status: 37% (7997489152/21474836480), sparse 1% (278401024), duration 108, 64/64 MB/s
INFO: status: 38% (8168734720/21474836480), sparse 1% (278581248), duration 111, 57/57 MB/s
INFO: status: 39% (8446017536/21474836480), sparse 1% (278888448), duration 114, 92/92 MB/s
INFO: status: 40% (8742895616/21474836480), sparse 1% (413384704), duration 117, 98/54 MB/s
INFO: status: 41% (8991604736/21474836480), sparse 1% (413507584), duration 120, 82/82 MB/s
INFO: status: 42% (9144500224/21474836480), sparse 1% (413720576), duration 123, 50/50 MB/s
INFO: status: 44% (9468575744/21474836480), sparse 1% (413954048), duration 126, 108/107 MB/s
INFO: status: 45% (9673113600/21474836480), sparse 1% (414167040), duration 130, 51/51 MB/s
INFO: status: 46% (9920839680/21474836480), sparse 1% (414375936), duration 134, 61/61 MB/s
INFO: status: 47% (10126491648/21474836480), sparse 1% (414392320), duration 137, 68/68 MB/s
INFO: status: 48% (10342760448/21474836480), sparse 1% (414556160), duration 142, 43/43 MB/s
INFO: status: 49% (10556342272/21474836480), sparse 1% (414584832), duration 145, 71/71 MB/s
INFO: status: 50% (10738991104/21474836480), sparse 1% (415121408), duration 149, 45/45 MB/s
INFO: status: 51% (11140530176/21474836480), sparse 2% (553906176), duration 152, 133/87 MB/s
INFO: status: 52% (11312627712/21474836480), sparse 2% (554008576), duration 155, 57/57 MB/s
INFO: status: 53% (11574312960/21474836480), sparse 2% (554131456), duration 158, 87/87 MB/s
INFO: status: 54% (11756765184/21474836480), sparse 2% (554303488), duration 161, 60/60 MB/s
INFO: status: 55% (12019236864/21474836480), sparse 2% (554553344), duration 164, 87/87 MB/s
INFO: status: 56% (12224299008/21474836480), sparse 2% (554696704), duration 167, 68/68 MB/s
INFO: status: 57% (12377849856/21474836480), sparse 2% (554864640), duration 170, 51/51 MB/s
INFO: status: 59% (12676628480/21474836480), sparse 2% (555053056), duration 173, 99/99 MB/s
INFO: status: 60% (13067223040/21474836480), sparse 3% (689479680), duration 178, 78/51 MB/s
INFO: status: 61% (13265797120/21474836480), sparse 3% (689614848), duration 181, 66/66 MB/s
INFO: status: 62% (13497925632/21474836480), sparse 3% (697733120), duration 184, 77/74 MB/s
INFO: status: 63% (13716488192/21474836480), sparse 3% (718184448), duration 187, 72/66 MB/s
INFO: status: 68% (14647492608/21474836480), sparse 7% (1523380224), duration 190, 310/41 MB/s
INFO: status: 87% (18683330560/21474836480), sparse 25% (5559193600), duration 193, 1345/0 MB/s
INFO: status: 100% (21474836480/21474836480), sparse 38% (8350687232), duration 195, 1395/0 MB/s
INFO: transferred 21474 MB in 195 seconds (110 MB/s)
INFO: archive file size: 7.67GB
INFO: Finished Backup of VM 300 (00:03:16)
INFO: Backup job finished successfully
TASK OK
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!