Slow restore from backups - two lzo processes thrashing restore drive?

Trimmings

So, I'm running proxmox through its paces on a testbed with some test servers and I'm recovering backups from one server to another.

I'm restoring from a 1TB disk (SATA-connected) to a 1TB RAID-1 array (RAID controller). I expect to get 80-120MB/s out of both storage devices. What I'm seeing is the backup drive getting thrashed during the restore, while the restore is only writing at about half the speed it's reading from the backup drive. Here are some info dumps:

5718 ? S 0:10 tar tf /mnt/tmp/dump/vzdump-qemu-102-2013_01_09-12_47_11.tar.lzo
5719 ? D 1:04 lzop -d
5720 ? S 0:00 sh -c zcat -f|tar xf /mnt/tmp/dump/vzdump-qemu-102-2013_01_09-12_47_11.tar.lzo '--to-command=/usr/lib/qemu-server/qmextract --storage lo
5722 ? S 0:17 tar xf /mnt/tmp/dump/vzdump-qemu-102-2013_01_09-12_47_11.tar.lzo --to-command=/usr/lib/qemu-server/qmextract --storage local
5723 ? D 0:38 lzop -d
5741 ? S 0:00 /bin/sh -c /usr/lib/qemu-server/qmextract --storage local
5744 ? S 0:18 dd ibs=256K obs=256K of=/var/lib/vz/images/102/vm-102-disk-2.vmdk


Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 4637.40 0.20 214.20 0.80 19406.40 181.04 2.16 10.07 0.23 4.96
sdb 2.00 0.00 266.80 0.00 32836.00 0.00 246.15 3.86 14.47 3.74 99.86

Nothing fancy going on here - a simple web-based backup taken on one server to a SATA-connected disk, restored on the other one. It looks to me like the backup disk (sdb) is getting thrashed by two processes while only one is actually doing anything.
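For what it's worth, this is how I'd try to pin down which of those two lzop processes is actually reading from sdb (I'm assuming sysstat and iotop aren't on the 2.2 ISO by default, so they need installing first):

Code:
apt-get install sysstat iotop   # assumption: not installed by default
pidstat -d 1                    # per-process kB read/written per second
iotop -o                        # only show processes currently doing I/O

Looking at the ps output again, it looks like one lzop (5719) is feeding the 'tar tf' listing pass and the other (5723) feeds the actual 'tar xf' extraction, so two pipelines are reading the same archive at once - that alone could explain the seeking.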

Also - the backup names only show the VM ID from the other server. The backups really need to show the server's name/description as well - is there any way to do this?
 

We also observed that behavior sometimes with recent kernels. A single 'dd' to a local disk can push the server load up to 10 (it seems to be solved with the latest test kernel, but we are still testing).
 

I'm using the Proxmox 2.2 release DVD with no updates - so this is somehow a bug in this release? Are there any fixes other than using the 'test' kernel? Perhaps using something other than lzo? Also, can anything be done about the backup file names? Including (at least part of) the server description in the backup filename is, I'd say, very important for managing backups and restores in the management console.
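In case switching compression is worth a try, I assume it's either the compression drop-down on the backup job or something like this (untested on my side, option names taken from the vzdump man page):

Code:
vzdump 102 --compress gzip --dumpdir /mnt/tmp/dump   # per-run override
# or as a default for all jobs in /etc/vzdump.conf:
#   compress: gzip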
 
No, the kernel in the 2.2 release does not show the mentioned 'dd' behavior. Performance problems can have a lot of causes.

The 2.3 release will introduce a completely new backup/restore, so I suggest you test the new implementation - it will be available soon in pvetest.
 

Well, the new backup/restore sounds good, but unfortunately I expect to be implementing within the month, which means the current platform. I'm running 2.2 with no kernel updates and it's showing this strange double-lzo behaviour. It's pretty obvious to me that something is wrong here: since the source is compressed, it should be the destination's I/O being saturated, not the source's.
 
We are not aware of such an issue. In any case, always run the latest version, never the ISO without updates, and test again.
 
Well, this is entirely reproducible. Copying these files from this drive gives me a read speed of 110MB/s (I use iostat for drive performance monitoring). Whatever is happening here makes the drive settle down at about 25.5MB/s. I've taken your advice and updated (using apt-get upgrade) - assuming that's what you mean, since I don't see anything about it in the install documentation or even readily in any HOW-TOs. The result is the same.
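For reference, this is roughly the baseline test I mean (from memory, so treat the exact commands as a sketch):

Code:
echo 3 > /proc/sys/vm/drop_caches   # drop the page cache so the disk is actually hit
dd if=/mnt/tmp/dump/vzdump-qemu-102-2013_01_09-12_47_11.tar.lzo of=/dev/null bs=1M
iostat -xm sdb 5                    # in a second terminal: watch rMB/s and %util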

I understand you're working on a new (better?) backup system, which is great, but it's not available now, won't be stable soon, and this is taking 4x longer than it should. It makes restores tedious and also seems to put significant and unnecessary load on the backup drive during a restore, which is not a good thing at all.

Re the file names: you've got no answer for this, so I guess I'll have to run a script to rename the files to something more legible after each backup.
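Something like this is what I have in mind for the renaming; the label and the VMID glob are just placeholders, and I haven't checked yet whether the web GUI still lists renamed archives, so it's only a sketch:

Code:
#!/bin/sh
# append a human-readable label to freshly written dumps of VM 102 (hypothetical)
LABEL="webserver01"
for f in /mnt/tmp/dump/vzdump-qemu-102-*.tar.lzo; do
    [ -e "$f" ] || continue                             # no matching files: skip
    case "$f" in *"-$LABEL.tar.lzo") continue ;; esac   # already renamed
    mv "$f" "${f%.tar.lzo}-$LABEL.tar.lzo"
done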

 
Oh, and also there is an error in the log - I saw it before but thought it was a one-off; perhaps it's not... "tar: write error"

extracting archive '/mnt/tmp/dump/vzdump-qemu-101-2013_01_09-13_23_02.tar.lzo'
extracting 'qemu-server.conf' from archive
extracting 'vm-disk-ide0.vmdk' from archive
Formatting '/var/lib/vz/images/101/vm-101-disk-1.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:101/vm-101-disk-1.vmdk'
restore data to '/var/lib/vz/images/101/vm-101-disk-1.vmdk' (85777448960 bytes)
tar: write error
81+51603683 records in
327215+0 records out
85777448960 bytes (86 GB) copied, 2539.01 s, 33.8 MB/s
extracting 'vm-disk-virtio1.vmdk' from archive
Formatting '/var/lib/vz/images/101/vm-101-disk-2.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:101/vm-101-disk-2.vmdk'
restore data to '/var/lib/vz/images/101/vm-101-disk-2.vmdk' (4259840 bytes)
0+3500 records in
16+1 records out
4259840 bytes (4.3 MB) copied, 0.0160067 s, 266 MB/s
TASK OK

That, to me, is pretty concerning - is the restore even correct? I guess I'll have to do md5sums or something to find out. :(
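This is roughly the check I have in mind - comparing the image inside the archive with the restored file, assuming the restore writes the data out unchanged:

Code:
# checksum of the disk image as stored in the backup
lzop -dc /mnt/tmp/dump/vzdump-qemu-101-2013_01_09-13_23_02.tar.lzo | tar -xOf - vm-disk-ide0.vmdk | md5sum
# checksum of the restored file - the two sums should match
md5sum /var/lib/vz/images/101/vm-101-disk-1.vmdk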
 
tar: write error
..

We see this error if you restore a 1.x backup on 2.x, but the restore is still correct. I suggest you upgrade the tar package (just install the tar from wheezy via dpkg) and the error goes away.

Code:
wget http://ftp.at.debian.org/debian/pool/main/t/tar/tar_1.26-4_amd64.deb

dpkg -i tar_1.26-4_amd64.deb
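
You can verify the upgrade afterwards with:

Code:
tar --version   # should now report tar (GNU tar) 1.26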
 
I've only ever used Proxmox 2.2, installed from a 2.2 install disk, and this is a 2.2 backup. I've done the upgrade and will test it again. Here is the log of a large (380GB) server restore - only the very last file (35GB) recovered at a reasonable speed of 80MB/s; the others (except the one that was only kilobytes) were atrocious, at about 20MB/s. Backup speeds have been great, but recovery speeds like these make recovery impractical. What other options are there for the backup? Is there some doc or config somewhere? I wish this were just a bunch of files in lzo, then I could extract them myself; as it is I'd have to extract the tar and then extract those files manually (which, to be honest, would still only take 2 hours, not 5 - see the sketch after the log below).

extracting archive '/mnt/tmp/dump/vzdump-qemu-104-2013_01_09-16_49_46.tar.lzo'
extracting 'qemu-server.conf' from archive
extracting 'vm-disk-virtio4.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-1.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-1.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-1.vmdk' (27265007616 bytes)
6+15332410 records in
104007+1 records out
27265007616 bytes (27 GB) copied, 1142.74 s, 23.9 MB/s
extracting 'vm-disk-virtio6.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-2.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-2.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-2.vmdk' (64832602112 bytes)
4+28646813 records in
247316+1 records out
64832602112 bytes (65 GB) copied, 3543.18 s, 18.3 MB/s
extracting 'vm-disk-virtio3.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-3.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-3.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-3.vmdk' (10240 bytes)
0+1 records in
0+1 records out
10240 bytes (10 kB) copied, 5.5032e-05 s, 186 MB/s
extracting 'vm-disk-virtio1.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-4.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-4.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-4.vmdk' (94292410368 bytes)
5+45680409 records in
359697+0 records out
94292410368 bytes (94 GB) copied, 4418.13 s, 21.3 MB/s
extracting 'vm-disk-virtio5.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-5.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-5.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-5.vmdk' (187553611776 bytes)
tar: write error
15+80590644 records in
715460+1 records out
187553611776 bytes (188 GB) copied, 10390.4 s, 18.1 MB/s
extracting 'vm-disk-virtio0.vmdk' from archive
Formatting '/var/lib/vz/images/104/vm-104-disk-6.vmdk', fmt=vmdk size=32768 compat6=off
new volume ID is 'local:104/vm-104-disk-6.vmdk'
restore data to '/var/lib/vz/images/104/vm-104-disk-6.vmdk' (34974138368 bytes)
8+32316590 records in
133415+1 records out
34974138368 bytes (35 GB) copied, 434.407 s, 80.5 MB/s
TASK OK
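
If I do end up extracting a disk manually as mentioned above, I'm assuming something along these lines would work - one pass through the archive, with member names as shown in the restore log:

Code:
lzop -dc /mnt/tmp/dump/vzdump-qemu-104-2013_01_09-16_49_46.tar.lzo \
  | tar -xOf - vm-disk-virtio0.vmdk > /var/lib/vz/images/104/vm-104-disk-6.vmdk

That only restores the data file, of course - the VM config from qemu-server.conf would still need to be handled separately.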
 
Oh also - and maybe this is just an FYI for the devs, but it's worth mentioning - I had a vmdk in one server that was preallocated. The drive worked fine, however the backup didn't back up the actual vmdk data file, only the reference/config vmdk file. If anyone else is doing this, be careful - your backups won't include these files.
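A quick way to check what actually ended up in an archive (so you can spot a missing data file before you need it) is to list its contents, for example:

Code:
lzop -dc /mnt/tmp/dump/vzdump-qemu-102-2013_01_09-12_47_11.tar.lzo | tar -tvf -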
 
Hello,

I'm having the same very sloooow restore: restoring a 220GB KVM raw-format image.
---
qmrestore vzdump-qemu-100-2013_01_12-07_20_20.tar.gz 102 --storage drbd3 --unique
extracting archive '/mnt/data/dump/vzdump-qemu-100-2013_01_12-07_20_20.tar.gz'

extracting 'qemu-server.conf' from archive
extracting 'vm-disk-virtio0.raw' from archive
Logical volume "vm-102-disk-1" created
new volume ID is 'drbd3:vm-102-disk-1'
restore data to '/dev/drbdvg3/vm-102-disk-1' (246960619520 bytes)


0+152233334 records in
942080+0 records out
246960619520 bytes (247 GB) copied, 17129.9 s, 14.4 MB/s
---

This is on a somewhat fast new server (RAID 10, 10k SAS HDs). hdparm gives me a write speed of about 400MB/s, but when I copy a large file I usually get around 200MB/s throughput. The backup is on a local hard drive that can be read at about 220MB/s.

I tried many different bs sizes in the dd command with no change in speed.

This is potentially a show-stopper for us implementing Proxmox VMs in our server farms.

BTW, there were no running VMs on the server, but the restore was done onto a replicated DRBD volume.
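To rule out the replication side, I suppose I could watch DRBD and the peer's disk while the restore runs, something like:

Code:
watch -n 1 cat /proc/drbd   # connection state and activity, on both nodes
iostat -xm 5                # run on both nodes and compare disk utilisation

With protocol C every write has to be acknowledged by the peer, so the replication link or the peer's disk could also cap the write speed - just a guess.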

Any solution?

Patrice
 
Your restores are slower than mine - what is the usual file copy speed from your source? Have you looked at the process list to see if there are two lzop processes running? This is what I've seen, and I've read lzop isn't multithreaded, so I don't know what this extra process is doing, but I expect it's killing read performance on the backup drive.

 
Hello

The speed of the source is about 200MB/s when tested with hdparm. And as I said, this drive is in the same server - so no network involved.

That was a gz backup, and yes, top showed two gzip processes, each running at close to 100% of its core (no screen capture taken).

I tried with a smaller backup that was an lzo backup and I got faster speed, and the two lzop processes were less demanding (one at around 80-100% and the other at around 5-25%). See the screen capture below.

root@vmlidi1:/mnt/data/dump# qmrestore vzdump-qemu-101-2013_01_12-09_02_04.tar.lzo 103 --storage drbd3 --unique
extracting archive '/mnt/data/dump/vzdump-qemu-101-2013_01_12-09_02_04.tar.lzo'
extracting 'qemu-server.conf' from archive
extracting 'vm-disk-virtio0.raw' from archive
Logical volume "vm-103-disk-1" created
new volume ID is 'drbd3:vm-103-disk-1'
restore data to '/dev/drbdvg3/vm-103-disk-1' (34359738368 bytes)
0+44412036 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 498.131 s, 69.0 MB/s

69 MB/s is more acceptable. The bottleneck is probably the decompression step. But why two processes?

(screenshot: terminal2lzo.png)
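
To confirm the decompression really is the limit, I guess I could time it on its own with nothing being written:

Code:
time lzop -dc vzdump-qemu-101-2013_01_12-09_02_04.tar.lzo > /dev/null
time zcat vzdump-qemu-100-2013_01_12-07_20_20.tar.gz > /dev/null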
 
