Slow restore on lvm-thin

michaeljk

Renowned Member
Oct 7, 2009
We just installed a fresh Proxmox 4.2 on a Dell R730xd server and I'm now trying to import some KVM-based virtual machine backups to it. The Dell server is configured with 2x SSD (RAID1, OS) and 8x SATA SSD (RAID10, lvm-thin vmstore) on a PERC H730 Mini controller. We are currently using qcow2 images on the old server (Proxmox 3.4-9); the backups are on a different server with an NFS share.

Unfortunately, a restore to lvm-thin (via the GUI) is really slow: over 20 minutes for a 120 GB qcow2 image. Load goes up to 4-5, I/O wait to 12-13%. The RAID10 is configured with adaptive read ahead and write-back, disk cache disabled. If I use a mirrored zpool instead of lvm-thin, the restore finishes within 10-12 minutes without the high I/O wait, so the NFS transfer does not seem to be the problem.

Is there a known problem with restores from qcow2 to lvm-thin, or did I perhaps miss an important option on the RAID controller?
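
For reference, this is roughly how the thin pool itself could be inspected (a sketch only; the VG name "vmdata" comes from the pveperf output below, the thin pool name is a placeholder - adjust to your setup). Chunk size, discard handling and block zeroing can all influence write speed on freshly allocated thin volumes:

Code:
# Sketch: inspect the thin pool backing the vmstore storage.
# "vmdata" is taken from the pveperf output, <thinpool> is a placeholder.
lvs -a -o +chunk_size,discards vmdata
lvdisplay vmdata/<thinpool>    # check the "Zero new blocks" setting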

dd on proxmox node:
Code:
dd bs=1M count=512 if=/dev/zero of=/mnt/lvmtest/test conv=fdatasync && rm /mnt/lvmtest/test
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.430655 s, 1.2 GB/s
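
Writing zeros with conv=fdatasync can be flattered by the controller's write-back cache; a direct, sequential write is closer to what the restore actually does. A sketch with fio (needs the fio package; file path and size are just examples):

Code:
# Sequential 1M direct writes, bypassing the page cache (example file/size).
fio --name=seqwrite --filename=/mnt/lvmtest/fiotest --rw=write --bs=1M \
    --size=4G --direct=1 --ioengine=libaio --iodepth=4
rm /mnt/lvmtest/fiotest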

dd within virtual machine (kvm):
Code:
dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync && rm test
512+0 Datensätze ein
512+0 Datensätze aus
536870912 Bytes (537 MB) kopiert, 0,637281 s, 842 MB/s

pveperf
Code:
CPU BOGOMIPS:      115215.48
REGEX/SECOND:      2509164
HD SIZE:           15.62 GB (/dev/mapper/vmdata-lvmtest)
BUFFERED READS:    3153.14 MB/sec
AVERAGE SEEK TIME: 0.03 ms
FSYNCS/SECOND:     6969.96
DNS EXT:           27.26 ms
DNS INT:           54.58 ms
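
To see where the time goes while a restore is running, per-device statistics could be watched in parallel (iostat from the sysstat package, if installed):

Code:
# Extended statistics in MB/s, refreshed every 2 seconds; watch %util and
# await of the RAID volume and the dm devices of the thin pool.
iostat -xm 2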
 
Unfortunately, a restore to lvm-thin (via the GUI) is really slow: over 20 minutes for a 120 GB qcow2 image.

A quick calculation shows this is about 100 MB/s (120000 MB / (20*60 s)). So this is likely the speed of your backup storage (maybe NFS over a 1 Gbit network)?
 
The NFS server uses a 1 Gbit connection, that's correct. But the vzdump file itself is only 76 GB (lzo-compressed), so that should be 76000/100/60 = 12.6 minutes of transfer time (which more or less matches the restore time to a zpool when the controller is in HBA mode). The uncompressed qcow2 image has the full size (~120 GB).
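
To take the decompression and LVM write path out of the picture entirely, the raw read rate of the backup from the NFS share could be measured on its own (the mount point and file name below are placeholders):

Code:
# Read the compressed vzdump straight from the NFS mount and discard it.
dd if=/mnt/pve/backup-nfs/dump/vzdump-qemu-XXX.vma.lzo of=/dev/null bs=1M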

Edit:
I have one small test VM running on the new server (just a Debian base install) which becomes unresponsive as soon as the restore is running:

dmesg:
Code:
[  960.610400] INFO: task kvm:2514 blocked for more than 120 seconds.
[  960.610420]       Tainted: P           O    4.4.15-1-pve #1
[  960.610436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  960.610459] kvm             D ffff882009b8fd08     0  2514      1 0x00000000
[  960.610462]  ffff882009b8fd08 ffffc9000d2b8cc0 ffff882037520dc0 ffff8810317344c0
[  960.610464]  ffff882009b90000 ffff882009b8fe40 ffff882009b8fe38 ffff8810317344c0
[  960.610466]  00007fb97ffff700 ffff882009b8fd20 ffffffff8184d945 7fffffffffffffff
[  960.610468] Call Trace:
[  960.610478]  [<ffffffff8184d945>] schedule+0x35/0x80
[  960.610480]  [<ffffffff81850b85>] schedule_timeout+0x235/0x2d0
[  960.610485]  [<ffffffff816b8356>] ? dm_make_request+0x76/0xd0
[  960.610490]  [<ffffffff813c3dd0>] ? generic_make_request+0x110/0x1f0
[  960.610493]  [<ffffffff8184e3bc>] wait_for_completion+0xbc/0x140
[  960.610497]  [<ffffffff810ac9e0>] ? wake_up_q+0x70/0x70
[  960.610501]  [<ffffffff813ba5ab>] submit_bio_wait+0x6b/0x90
[  960.610503]  [<ffffffff813c7b2d>] blkdev_issue_flush+0x5d/0x90
[  960.610507]  [<ffffffff81248545>] blkdev_fsync+0x35/0x50
[  960.610511]  [<ffffffff8124160d>] vfs_fsync_range+0x3d/0xb0
[  960.610516]  [<ffffffff81103425>] ? SyS_futex+0x85/0x180
[  960.610518]  [<ffffffff812416dd>] do_fsync+0x3d/0x70
[  960.610519]  [<ffffffff81241993>] SyS_fdatasync+0x13/0x20
[  960.610521]  [<ffffffff81851a76>] entry_SYSCALL_64_fastpath+0x16/0x75

A simple "lvdisplay" is also very slow/unresponsive, at least until the I/O wait drops and the restore process runs lzop again.
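
One mitigation that is sometimes suggested for this kind of stall (a large writeback burst starving all other I/O) is to cap the kernel's dirty page cache; a sketch, the values are examples and worth testing rather than taking as given:

Code:
# Force earlier, smaller writeback so the restore cannot queue up gigabytes
# of dirty data ahead of the running VMs (example values).
sysctl -w vm.dirty_background_bytes=67108864    # 64 MB
sysctl -w vm.dirty_bytes=268435456              # 256 MB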
 
I manually patched QemuServer.pm, but unfortunately the problem persists. My guess is that the integrated PERC controller (or a RAID setting) is the reason for this. As soon as the controller is in HBA mode, performance is great: the import of the 76 GB backup onto a ZFS volume takes ~12 minutes over the gigabit network, and the other VM is not slowed down.
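
For reference, this is roughly how the cache policy of the vmstore virtual disk could be checked and changed with Dell's perccli (the /c0/v1 IDs are examples; enabling the drive cache is only advisable on SSDs with power-loss protection):

Code:
perccli /c0/v1 show all              # current wrcache/rdcache/pdcache policy
perccli /c0/v1 set wrcache=wb        # write-back (needs healthy BBU/flash)
perccli /c0/v1 set rdcache=ra        # read-ahead
perccli /c0/v1 set pdcache=on        # drive cache, only with PLP SSDs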

Perhaps someone who is using Dell / PERC can give some advice?
 
I use an LSI MegaRAID controller and am noticing the same. Moving from ZFS to HW RAID, it's quite a bit slower. :(
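
The rough MegaCli equivalent for checking/adjusting the same cache settings, assuming the MegaCli utility is installed (adapter/LD numbers are examples):

Code:
MegaCli -LDInfo -LAll -a0                 # show current cache policy
MegaCli -LDSetProp WB -LAll -a0           # write-back (needs BBU)
MegaCli -LDSetProp EnDskCache -LAll -a0   # enable drive cache (PLP drives only)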
 
