Slow restore on lvm-thin

michaeljk

Renowned Member
Oct 7, 2009
We just installed a fresh Proxmox 4.2 on a Dell R730xd server and I'm now trying to import some KVM-based virtual machine backups to it. The Dell server is configured with 2x SSD (RAID1, OS) and 8x SATA SSD (RAID10, lvm-thin vmstore) on a PERC H730 Mini controller. We are currently using qcow2 images on the old server (Proxmox 3.4-9); the backups are on a different server with an NFS share.

Unfortunately, a restore to lvm-thin (via the GUI) is really slow: over 20 minutes for a 120 GB qcow2 image. Load goes up to 4-5, I/O wait to 12-13%. The RAID10 is configured with adaptive read ahead and write-back, disk cache disabled. If I use a mirrored zpool instead of lvm-thin, the restore finishes within 10-12 minutes without the high I/O wait, so the NFS transfer does not seem to be the problem.

Is there a known problem with restores from qcow2 to lvm-thin, or did I perhaps miss an important option on the RAID controller?
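
For reference, this is roughly how the thin pool itself could be inspected (a sketch only; the VG name "vmdata" comes from the pveperf output below, the thin pool name is a placeholder - adjust to your setup). Chunk size, discard handling and block zeroing can all influence write speed on freshly allocated thin volumes:

Code:
# Sketch: inspect the thin pool backing the vmstore storage.
# "vmdata" is taken from the pveperf output, <thinpool> is a placeholder.
lvs -a -o +chunk_size,discards vmdata
lvdisplay vmdata/<thinpool>    # check the "Zero new blocks" setting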

dd on proxmox node:
Code:
dd bs=1M count=512 if=/dev/zero of=/mnt/lvmtest/test conv=fdatasync && rm /mnt/lvmtest/test
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.430655 s, 1.2 GB/s
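
Writing zeros with conv=fdatasync can be flattered by the controller's write-back cache; a direct, sequential write is closer to what the restore actually does. A sketch with fio (needs the fio package; file path and size are just examples):

Code:
# Sequential 1M direct writes, bypassing the page cache (example file/size).
fio --name=seqwrite --filename=/mnt/lvmtest/fiotest --rw=write --bs=1M \
    --size=4G --direct=1 --ioengine=libaio --iodepth=4
rm /mnt/lvmtest/fiotest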

dd within virtual machine (kvm):
Code:
dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync && rm test
512+0 Datensätze ein
512+0 Datensätze aus
536870912 Bytes (537 MB) kopiert, 0,637281 s, 842 MB/s

pveperf
Code:
CPU BOGOMIPS:      115215.48
REGEX/SECOND:      2509164
HD SIZE:           15.62 GB (/dev/mapper/vmdata-lvmtest)
BUFFERED READS:    3153.14 MB/sec
AVERAGE SEEK TIME: 0.03 ms
FSYNCS/SECOND:     6969.96
DNS EXT:           27.26 ms
DNS INT:           54.58 ms
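
To see where the time goes while a restore is running, per-device statistics could be watched in parallel (iostat from the sysstat package, if installed):

Code:
# Extended statistics in MB/s, refreshed every 2 seconds; watch %util and
# await of the RAID volume and the dm devices of the thin pool.
iostat -xm 2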
 
Unfortunately, a restore to lvm-thin (via the GUI) is really slow: over 20 minutes for a 120 GB qcow2 image.

A quick calculation shows this is about 100 MB/s (120000 MB / (20*60 s)). So this is likely the speed of your backup storage (maybe NFS over a 1 Gbit network)?
 
The NFS server uses a 1 Gbit connection, that's correct. But the vzdump file itself is only 76 GB (lzo-compressed), so that should be 76000/100/60 = 12.6 minutes of transfer time (which more or less matches the restore time to a zpool when the controller is in HBA mode). The uncompressed qcow2 image has the full size (~120 GB).
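
To take the decompression and LVM write path out of the picture entirely, the raw read rate of the backup from the NFS share could be measured on its own (the mount point and file name below are placeholders):

Code:
# Read the compressed vzdump straight from the NFS mount and discard it.
dd if=/mnt/pve/backup-nfs/dump/vzdump-qemu-XXX.vma.lzo of=/dev/null bs=1M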

Edit:
I have one small test VM running on the new server (just a Debian base install) which becomes unresponsive as soon as the restore is running:

dmesg:
Code:
[  960.610400] INFO: task kvm:2514 blocked for more than 120 seconds.
[  960.610420]       Tainted: P           O    4.4.15-1-pve #1
[  960.610436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  960.610459] kvm             D ffff882009b8fd08     0  2514      1 0x00000000
[  960.610462]  ffff882009b8fd08 ffffc9000d2b8cc0 ffff882037520dc0 ffff8810317344c0
[  960.610464]  ffff882009b90000 ffff882009b8fe40 ffff882009b8fe38 ffff8810317344c0
[  960.610466]  00007fb97ffff700 ffff882009b8fd20 ffffffff8184d945 7fffffffffffffff
[  960.610468] Call Trace:
[  960.610478]  [<ffffffff8184d945>] schedule+0x35/0x80
[  960.610480]  [<ffffffff81850b85>] schedule_timeout+0x235/0x2d0
[  960.610485]  [<ffffffff816b8356>] ? dm_make_request+0x76/0xd0
[  960.610490]  [<ffffffff813c3dd0>] ? generic_make_request+0x110/0x1f0
[  960.610493]  [<ffffffff8184e3bc>] wait_for_completion+0xbc/0x140
[  960.610497]  [<ffffffff810ac9e0>] ? wake_up_q+0x70/0x70
[  960.610501]  [<ffffffff813ba5ab>] submit_bio_wait+0x6b/0x90
[  960.610503]  [<ffffffff813c7b2d>] blkdev_issue_flush+0x5d/0x90
[  960.610507]  [<ffffffff81248545>] blkdev_fsync+0x35/0x50
[  960.610511]  [<ffffffff8124160d>] vfs_fsync_range+0x3d/0xb0
[  960.610516]  [<ffffffff81103425>] ? SyS_futex+0x85/0x180
[  960.610518]  [<ffffffff812416dd>] do_fsync+0x3d/0x70
[  960.610519]  [<ffffffff81241993>] SyS_fdatasync+0x13/0x20
[  960.610521]  [<ffffffff81851a76>] entry_SYSCALL_64_fastpath+0x16/0x75

A simple "lvdisplay" is also very slow/unresponsive, at least until the I/O wait drops and the restore process runs lzop again.
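
One mitigation that is sometimes suggested for this kind of stall (a large writeback burst starving all other I/O) is to cap the kernel's dirty page cache; a sketch, the values are examples and worth testing rather than taking as given:

Code:
# Force earlier, smaller writeback so the restore cannot queue up gigabytes
# of dirty data ahead of the running VMs (example values).
sysctl -w vm.dirty_background_bytes=67108864    # 64 MB
sysctl -w vm.dirty_bytes=268435456              # 256 MB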
 
I manually patched QemuServer.pm, but unfortunately the problem persists. My guess is that the integrated PERC controller (or a RAID setting) is the reason for this. As soon as the controller is in HBA mode, performance is great: the import of the 76 GB backup onto a ZFS volume takes ~12 minutes over the gigabit network, and the other VM is not slowed down.
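
For reference, this is roughly how the cache policy of the vmstore virtual disk could be checked and changed with Dell's perccli (the /c0/v1 IDs are examples; enabling the drive cache is only advisable on SSDs with power-loss protection):

Code:
perccli /c0/v1 show all              # current wrcache/rdcache/pdcache policy
perccli /c0/v1 set wrcache=wb        # write-back (needs healthy BBU/flash)
perccli /c0/v1 set rdcache=ra        # read-ahead
perccli /c0/v1 set pdcache=on        # drive cache, only with PLP SSDs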

Perhaps someone who is using Dell / PERC can give some advice?
 
I use an LSI MegaRAID controller and am noticing the same. Moving from ZFS to HW RAID, it's quite a bit slower. :(
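
The rough MegaCli equivalent for checking/adjusting the same cache settings, assuming the MegaCli utility is installed (adapter/LD numbers are examples):

Code:
MegaCli -LDInfo -LAll -a0                 # show current cache policy
MegaCli -LDSetProp WB -LAll -a0           # write-back (needs BBU)
MegaCli -LDSetProp EnDskCache -LAll -a0   # enable drive cache (PLP drives only)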
 
