Appalling Backup Performance

blackpaw

Renowned Member
Nov 1, 2013
It's taking 2-3 hours per VM to back up on my setup, where each VM has roughly 15-20 GB of data (producing on average a 9-12 GB backup file).

I'm guessing it's because the disks are over-provisioned - 300GB and 512GB - but they are sparse qcow2 files, only occupying the actual data size.

What's worse, while the backup is running, the network is being hammered at 60M, presumably transferring lots of zeros.

I thought backup handled sparse files intelligently? This sort of behaviour renders it useless. If I migrate the rest of our VMs, backup time will blow out to over 24 hours.



The disks are so large because I used Windows bare-metal backup to transfer them. I can probably resize most of them, but that's tricky under Windows, and we still need a large safety factor for growth.

Can qcow2 files be converted to thinly provisioned?


NB: Gigabit Ethernet, dual NICs on each node and the NAS, not bonded. VMs are stored on an NFS share on the NAS.
 
The backup finished just as I was typing :) Here are the stats:

Code:
VMID   NAME     STATUS   TIME       SIZE      FILENAME
100    JPF3     OK       02:42:31   12.37GB   /mnt/backup/dump/vzdump-qemu-100-2013_12_02-00_15_01.vma.lzo
101    TBK      OK       02:43:22   9.66GB    /mnt/backup/dump/vzdump-qemu-101-2013_12_02-02_57_33.vma.lzo
102    Lenin2   OK       00:40:33   7.41GB    /mnt/backup/dump/vzdump-qemu-102-2013_12_02-05_40_55.vma.lzo
TOTAL                    06:06:27   29.44GB
 
Why then is a very similarly sized VM (27G vs 15G according to "du -h") taking 180 minutes to back up compared to 40 minutes? And the link sits at 60M the whole time (I'm presuming the 60M on the network graph means 60 MB/s?).
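For reference, here's one way to compare a sparse image's apparent size with the space it actually occupies (the filename below is just an example - point it at the real image on the storage mount):

Code:
# Example only - adjust the path to the actual qcow2 image
qemu-img info vm-102-disk-1.qcow2          # reports both "virtual size" and "disk size"
du -h --apparent-size vm-102-disk-1.qcow2  # apparent size, holes included
du -h vm-102-disk-1.qcow2                  # blocks actually allocated on disk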

Thanks,

Lindsay
 
I converted the Lenin2 VM disk from sparse to thin provisioned with the following:

Code:
# Rewrite the image; unallocated and zeroed clusters are dropped (-p shows progress)
qemu-img convert -p -f qcow2 -O qcow2 vm-102-disk-1.qcow2 thin.qcow2
# Keep the original as a fallback, then swap the compacted image into place
mv vm-102-disk-1.qcow2 vm-102-disk-1.qcow2.fat
mv thin.qcow2 vm-102-disk-1.qcow2
The result was 16GB instead of 128GB. The VM ran fine.

I then did a backup to a USB 2.0 disk; it took only *7* minutes - a lot better than the 40 minutes the previous backup took.

I'll test with the much bigger VM (100 - JPF3) tonight, when I can take it offline to convert.
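For converting the rest in one go, here's a minimal sketch of the same steps in a loop (the glob path is an assumption, and every affected VM must be shut down before its disk is rewritten):

Code:
# Hypothetical batch version of the conversion above - adjust the glob to the real storage path
for img in /mnt/pve/nas/images/*/vm-*-disk-*.qcow2; do
    qemu-img convert -p -f qcow2 -O qcow2 "$img" "$img.thin" \
        && mv "$img" "$img.fat" \
        && mv "$img.thin" "$img"
done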
 
Another huge difference - snapshot times.

With the sparse 128GB hard disk:
Disk only: 24 seconds
Disk + RAM: 62 seconds (long enough for network connections to time out)

With the thin-provisioned hard disk:
Disk only: 4 seconds
Disk + RAM: 27 seconds


The 300GB sparse disk takes 122 seconds for a disk-only snapshot - 5 times as long as the 128GB one.

For sparse qcow2 files, virtual disk size seems to have a serious impact on backup and snapshot times, regardless of the actual data stored in them.
 
OK, I converted the three VMs above (100, 101, 102) to thin provisioned and got dramatically better results:


Code:
VMID   NAME        STATUS  TIME        SIZE      FILENAME
100    JPF3        OK      00:18:54    12.37GB   /mnt/backup/dump/vzdump-qemu-100-2013_12_03-00_15_01.vma.lzo
101    TBK         OK      00:16:10    9.67GB    /mnt/backup/dump/vzdump-qemu-101-2013_12_03-00_33_55.vma.lzo
102    Lenin2      OK      00:07:59    7.45GB    /mnt/backup/dump/vzdump-qemu-102-2013_12_03-00_50_05.vma.lzo
103    px-lindsay  OK      00:23:18    8.29GB    /mnt/backup/dump/vzdump-qemu-103-2013_12_03-00_58_05.vma.lzo
104    Win7Base    OK      00:13:19    8.28GB    /mnt/backup/dump/vzdump-qemu-104-2013_12_03-01_21_23.vma.lzo

Backup times have gone down from 180 minutes to 18 minutes.

Note the last two VMs (103 & 104). They have the same amount of data as 102 but are sparse provisioned. They take 2-3 times as long to back up.

I've done test restores on the VMs; they restore and work fine.

I have a suspicion that my NAS (a QNAP TS-420) is part of the problem. I'll do some more comparisons with empty disks, backing up from the local drive and from the NAS.
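One possible way to time those comparisons (the VMIDs and storage name below are hypothetical - substitute test VMs with disks on each storage type):

Code:
# Hypothetical timing runs - VMIDs and the backup storage name are assumptions
time vzdump 200 --storage backup-usb --compress lzo   # test VM with its disk on local storage
time vzdump 201 --storage backup-usb --compress lzo   # identical test VM with its disk on the NFS share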


And so as not to be a whiner :) - I am very happy with Proxmox; no regrets about migrating my production servers from XenServer, and I'll definitely do the test and VDI VMs when I get a chance. Stability has been excellent and managing them has been much easier. The ability to do these conversions and this backup testing has been a relief. It's a nice open system with full Linux tools.

Thanks very much!

Cheers,

Lindsay.
 
I tested backups and snapshots with the same 512GB VM, sparse provisioned, with the VM storage first on NFS and then local. The NFS VM took 2.5 hours; the local VM took 20 minutes. Thin-provisioned disks took roughly the same time from both (20 min).

Snapshots were similarly affected.

There is definitely a heavy penalty for using sparse files on NFS over thin files.
 
Hmm, this is interesting. Up till now I've been using LVM volumes for KVM VMs, but as KVM technology advances, important new features are added (snapshots, live backups, etc.), and performance keeps improving, I started to test qcow2 images. I was surprised at first by the size of thinly provisioned images, but realised they only look big, since they're sparse files. I thought, OK, this is the default for qemu-img, although absolutely unnecessary. But now it seems to create problems with backups.

Plus, it makes rsyncing snapshotted images almost pointless (my planned backup method, using ZFS snapshots on the receiving end), since an efficient transfer needs both the --sparse and --inplace rsync options, but they're mutually exclusive.
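To illustrate the conflict (the destination host and path are placeholders):

Code:
# The two transfer modes in question - destination is a placeholder
rsync -av --sparse  vm-102-disk-1.qcow2 backuphost:/pool/images/   # recreates holes at the destination, but rewrites the whole file each run
rsync -av --inplace vm-102-disk-1.qcow2 backuphost:/pool/images/   # sends only changed blocks (ideal before a ZFS snapshot), but writes holes out as zeros
# rsync refuses the two options together, so you can't have both behaviours at once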

+1 for making "real" thin provisioning without creating sparse files an option on the WebUI.
 
While I still maintain that it'd be good to have preallocation as an option, here's an interesting read on the subject: [link]. There are more tests on the net; the conclusion seems to be that preallocating metadata (and thus creating a sparse file) helps performance quite a bit. That said, I think it only helps at the beginning of a VM's lifespan, since it presumably speeds up allocating new chunks of real disk space. Later, when the image has filled up with data, I can't imagine it still helps performance. Unless, of course, the usage pattern of the particular VM makes the image grow quickly over time.
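For reference, a metadata-preallocated image can be created like this (the filename and size are placeholders):

Code:
# qcow2 metadata preallocation: the allocation tables are written up front, the file stays sparse
qemu-img create -f qcow2 -o preallocation=metadata vm-test.qcow2 128G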
 
Yeah, my servers are mostly static, so write performance is not an issue. Efficient backup is.
 
