VM cloning with zvol volumes TERRIBLY slow

Chris&Patte

Member
Sep 3, 2013
Hello,

I'm currently cloning (offline copy) a VM with 125 GB RAM (yeah, really, a HANA setup) and three 300 GB zvols from an SSD pool (ZFS-SSD) to an HDD pool (ZFS-HDD).
After 8 hours of running, not even one zvol has been copied.
When I look at iotop, I see several qemu-img convert processes running, each with a write speed BELOW 1 MB/s.
The overall write activity is about 5 MB/s.

I mean, is this a bad joke? Is this expected behavior? I already wondered about the slow speed of snapshots (on Hyper-V nearly instant, in Proxmox they take forever...), but this makes the system nearly unusable.

How do I address this problem? And how do I stop the running clone process?

A look at ps also shows me this:
/usr/bin/qemu-img convert -p -n -t none -T none -f raw -O raw /dev/zvol/ZFS-SSD/vm-101-disk-2 zeroinit:/dev/zvol/ZFS-HDD/vm-102-disk-1
But this is the wrong target zvol. It should copy from vm-101-disk-2 to vm-102-disk-2, not to vm-102-disk-1. Right?
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
How do I address this problem? And how do I stop the running clone process?

You can monitor with "zpool iostat" and "iostat" to see where the bottleneck is. Monitoring "arcstat" might also prove insightful. Your zpool layout might also give a hint at what is going wrong.

A look at ps also shows me this

But this is the wrong target zvol. It should copy from vm-101-disk-2 to vm-102-disk-2, not to vm-102-disk-1. Right?

That is not a problem; the numbering is per VM. So if vm-101-disk-2 is the first disk to get cloned, it will be cloned to vm-102-disk-1.
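For illustration, the allocation rule can be sketched as a tiny shell function (a hypothetical helper, not the actual Proxmox code): the clone target simply gets the lowest disk index that is still free on the target VMID.

```shell
# Hypothetical sketch of the naming rule: the target volume gets the lowest
# index not yet used by the target VM, regardless of the source disk's index.
next_disk_name() {
  local vmid=$1; shift        # target VMID, e.g. 102
  local existing="$*"         # volume names already present on the target
  local n=1
  while echo "$existing" | grep -qw "vm-${vmid}-disk-${n}"; do
    n=$((n + 1))
  done
  echo "vm-${vmid}-disk-${n}"
}
```

So for an empty VM 102, the first cloned disk becomes vm-102-disk-1 no matter which index it had on the source VM.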
 

Chris&Patte

Member
Sep 3, 2013
Well, as far as I can see, it just shows me that it is really slow.

OK, I knew the HDD RAID1 is not a rocket. But the ONLY task running on those HDDs is the qemu-img convert.
Nothing else is on these HDDs and no other VM is running. They are just idling around...
Even if I expect only 50% of the write performance of a "normal" ext4 file copy from ZFS, it should be around 20 MB/s minimum. OK, if qemu-img convert were writing only small random pieces I would expect that, but since there is no image conversion happening (just raw -> raw), in fact just a dd-style job, it should work much faster, shouldn't it?
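A way to test that suspicion (just a sketch; ZFS-HDD/dd-test is a hypothetical throwaway zvol) would be to time a plain block copy between the same two pools and compare it with what qemu-img convert achieves:

```shell
# Sketch: raw block copy between the same pools, bypassing qemu-img.
# dd-test is a temporary 10G zvol that gets destroyed again afterwards.
zfs create -V 10G ZFS-HDD/dd-test
dd if=/dev/zvol/ZFS-SSD/vm-101-disk-2 of=/dev/zvol/ZFS-HDD/dd-test \
   bs=1M count=10240 status=progress
zfs destroy ZFS-HDD/dd-test
```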

root@Mypve:/ZFS-HDD# zpool iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ZFS-HDD      162G  1.65T      0     23   581K  3.45M
ZFS-SSD      983G   793G    138    229  5.41M  18.0M
rpool       1.03G   110G      0     45  4.51K   423K
----------  -----  -----  -----  -----  -----  -----
root@Mypve:/ZFS-HDD# iostat -h
Linux 5.0.15-1-pve (Mypve)     10/09/2019     _x86_64_     (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.1%    0.0%    1.3%    1.0%    0.0%   96.6%

      tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn Device
   159.62         2.4M         8.0M     795.6G       2.6T sdc
   159.36         2.4M         8.0M     793.2G       2.6T sdd
    10.72       259.9k         1.5M      84.0G     507.4G sde
    10.15       256.1k         1.5M      82.8G     507.4G sdf
    22.45         2.4k       211.8k     792.1M      68.5G sdg
     0.00         0.1k         0.0k      27.4M       0.0k sda
     0.00         0.2k         0.0k      53.4M       0.0k sdb
    22.85         2.4k       211.8k     796.9M      68.5G sdh
   233.91       201.9k         1.1M      65.3G     365.7G zd16
   263.04         1.2M         1.1M     393.0G     357.0G zd32
   232.26       286.3k       937.1k      92.6G     303.0G zd48
     0.87         0.0k       233.1k       1.0M      75.4G zd0
     0.59         0.0k       273.3k       1.0M      88.4G zd64

Data storage:
sde/sdf = ZFS RAID1 HDD
sdc/sdd = ZFS RAID1 SSD

root@Mypve:/ZFS-HDD# arcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
10:30:04    20     0      0     0    0     0    0     0    0   125G  125G
 

Chris&Patte

Member
Sep 3, 2013
Just to compare with the current results:

The source VM that I'm cloning now was imported a few days ago from a KVM box to this Proxmox host. I copied the original qcow2 images onto the HDDs before converting them into the SSD zvols. That copy of the qcow2 images was MUCH faster than the zvol copy to the HDDs is now.
 

Chris&Patte

Member
Sep 3, 2013
But I think I'll give up now and first buy some additional SSDs to replace the HDDs.

Just one question: how do I stop the cloning without breaking anything?

Can I just kill the qemu-img convert process?
In the GUI there seems to be no way to stop the cloning.
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
But I think I'll give up now and first buy some additional SSDs to replace the HDDs.

Just one question: how do I stop the cloning without breaking anything?

Can I just kill the qemu-img convert process?
In the GUI there seems to be no way to stop the cloning.

Killing qemu-img convert or stopping the clone task should have the same effect. I still suggest finding out what's going wrong first; you can always buy better hardware if it turns out that is the cause. All of the commands above are better used for continuous monitoring (they take an interval as a parameter).

e.g.:
Code:
zpool iostat -ry POOLNAME 30
iostat -dmxy /dev/DISK1 /dev/DISK2 /dev/DISK3 /dev/DISK4 30
arcstat 30

Posting the results inside [CODE][/CODE] tags will preserve the formatting.
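If you do decide to abort, a rough sequence might look like this (a sketch only; the PID placeholder and the leftover volume name are examples — verify with the listing commands before killing or destroying anything):

```shell
# Sketch: abort a stuck offline clone and clean up afterwards.
pgrep -af 'qemu-img convert'       # identify the copy process (full cmdline)
kill <PID>                         # plain SIGTERM lets qemu-img exit cleanly
zfs list -t volume                 # check for a partially written target volume
zfs destroy ZFS-HDD/vm-102-disk-1  # example name: remove the half-copied target
```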
 

Chris&Patte

Member
Sep 3, 2013
OK, you are right. I should first check for the bottleneck.

Well then: it reads from zpool ZFS-SSD and writes to zpool ZFS-HDD.
Code:
root@Mypve:/ZFS-HDD# zpool iostat ZFS-SSD 5
capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ZFS-SSD      983G   793G    136    223  5.31M  17.4M
ZFS-SSD      983G   793G     34      0   936K      0
ZFS-SSD      983G   793G     26      0   921K      0
ZFS-SSD      983G   793G     50      0  1.35M      0
ZFS-SSD      983G   793G     16      0   456K      0
ZFS-SSD      983G   793G     45      0  1.33M      0
ZFS-SSD      983G   793G     28      0   913K      0
ZFS-SSD      983G   793G     33      0   928K      0
ZFS-SSD      983G   793G     37      0   924K      0
ZFS-SSD      983G   793G     26      0   948K      0
ZFS-SSD      983G   793G     50      0  1.37M      0
ZFS-SSD      983G   793G     33      0   923K      0
ZFS-SSD      983G   793G     31      0   935K      0

Code:
root@Mypve:/ZFS-HDD# zpool iostat ZFS-HDD 5
capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ZFS-HDD      178G  1.64T      0     24   564K  3.48M
ZFS-HDD      178G  1.64T      0     70      0  4.71M
ZFS-HDD      178G  1.64T      0     64      0  4.09M
ZFS-HDD      178G  1.64T      0     62      0  4.12M
ZFS-HDD      178G  1.64T      0     67      0  4.27M
ZFS-HDD      178G  1.64T      0     62      0  3.95M
ZFS-HDD      178G  1.64T      0     65      0  3.98M
ZFS-HDD      178G  1.64T      0     61      0  3.91M
ZFS-HDD      178G  1.64T      0     76      0  5.16M
ZFS-HDD      178G  1.64T      0     55      0  3.28M

both together

Code:
root@mypve:/ZFS-HDD# zpool iostat -y ZFS-HDD ZFS-SSD 5
capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ZFS-HDD      186G  1.63T      0     63      0  3.92M
ZFS-SSD      983G   793G     92      0  2.21M      0
----------  -----  -----  -----  -----  -----  -----
ZFS-HDD      186G  1.63T      0     63      0  4.03M
ZFS-SSD      983G   793G     82      0  2.20M      0
----------  -----  -----  -----  -----  -----  -----
ZFS-HDD      186G  1.63T      0     67      0  4.08M
ZFS-SSD      983G   793G     91      0  2.21M      0
----------  -----  -----  -----  -----  -----  -----

Why is the read rate so much lower than the write rate?
Shouldn't it be roughly a 1:1 relationship?

Code:
root@Mypve:/ZFS-HDD# zpool iostat -ry ZFS-HDD ZFS-SSD 5
ZFS-HDD       sync_read    sync_write    async_read    async_write      scrub         trim
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     60      0      0      0      0      0
16K             0      0      0      0      0      0      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      5      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------
ZFS-SSD       sync_read    sync_write    async_read    async_write      scrub         trim
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0      0      0      0      0      0      0
8K              0      0      0      0     58      0      0      0      0      0      0      0
16K             0      0      0      0      0      5      0      0      0      0      0      0
32K             0      0      0      0      0      7      0      0      0      0      0      0
64K             0      0      0      0      0     10      0      0      0      0      0      0
128K            0      0      0      0      0      3      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------

data storage
sde/sdf = ZFS RAID1 HDD
sdc/sdd = ZFS RAID1 SSD

Code:
root@Mypve:/ZFS-HDD# iostat -dmxy /dev/sde /dev/sdf /dev/sdc /dev/sdd 10
Linux 5.0.15-1-pve (iteanova016pve)     10/09/2019      _x86_64_        (32 CPU)
Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdc             51.80    0.00      1.31      0.00     0.30     0.00   0.58   0.00    0.38    0.00   0.00    25.83     0.00   0.26   1.36
sdd             64.10    0.00      1.25      0.00     0.30     0.00   0.47   0.00    0.27    0.00   0.00    19.89     0.00   0.20   1.28
sde              0.00   32.10      0.00      2.11     0.00     0.20   0.00   0.62    0.00  278.57   8.88     0.00    67.36   4.60  14.76
sdf              0.00   32.00      0.00      2.11     0.00     0.30   0.00   0.93    0.00  294.05   9.35     0.00    67.60   4.50  14.40
Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdc             43.60    0.00      1.11      0.00     0.50     0.00   1.13   0.00    0.39    0.00   0.00    26.07     0.00   0.28   1.24
sdd             51.20    0.00      1.15      0.00     0.20     0.00   0.39   0.00    0.32    0.00   0.00    22.92     0.00   0.23   1.16
sde              0.00   31.50      0.00      2.11     0.00     0.30   0.00   0.94    0.00  293.77   9.21     0.00    68.65   4.44  14.00
sdf              0.00   32.80      0.00      2.04     0.00     0.30   0.00   0.91    0.00  262.52   8.55     0.00    63.61   4.68  15.36
Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdc             44.10    0.00      1.08      0.00     0.40     0.00   0.90   0.00    0.37    0.00   0.00    25.00     0.00   0.26   1.16
sdd             44.20    0.00      1.20      0.00     0.30     0.00   0.67   0.00    0.40    0.00   0.00    27.76     0.00   0.27   1.20
sde              0.00   31.50      0.00      2.04     0.00     0.20   0.00   0.63    0.00  277.40   8.70     0.00    66.36   4.37  13.76
sdf              0.00   32.60      0.00      2.04     0.00     0.20   0.00   0.61    0.00  280.26   9.08     0.00    64.05   4.70  15.32

This is more interesting, I think. ZFS's ARC is at its 125G limit (about half of all RAM) and still seems maxed out?

Code:
root@Mypve:/ZFS-HDD# arcstat 5
time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
13:52:07     0     0      0     0    0     0    0     0    0   126G  125G
13:52:12  1.1K   256     24     0    0   256  100     0    0   126G  125G
13:52:17  1.1K   256     24     0    0   256  100     0    0   125G  125G
13:52:22   860   205     23     0    0   205  100     0    0   125G  125G
13:52:27  1.5K   358     24     0    0   358  100     0    0   125G  125G
13:52:32   836   205     24     0    0   205  100     0    0   125G  125G
13:52:37  1.1K   256     24     0    0   256  100     0    0   126G  125G
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Did you change any of the ZFS parameters or pool/dataset properties? Does regular system monitoring show any kind of overload? "zpool iostat -ly POOLS 5" would also be interesting.
 

Chris&Patte

Member
Sep 3, 2013
The system has been in use for a few days now. It's a refurbished HP ProLiant box. I have not seen anything wrong. Only ONE VM (an S4/HANA) is on it, which I am now trying to clone.

No, I did not change any of the ZFS parameters.
But I installed Proxmox over an already installed and somehow messed-up older Proxmox installation. I did not reuse the zpools, though; they were set up freshly.

Code:
root@iteanova016pve:/ZFS-HDD# free -mh
               total        used        free      shared  buff/cache   available
Mem:          251Gi       128Gi       123Gi        81Mi       383Mi       122Gi
Swap:            0B          0B          0B
root@iteanova016pve:/ZFS-HDD# zpool iostat -ly ZFS-HDD ZFS-SSD 5
capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ZFS-HDD      190G  1.63T      0     63      0  4.37M      -  346ms      -  249ms      -      -      -   18ms      -      -
ZFS-SSD      983G   793G    117      0  2.68M      0  490us      -  397us      -      -      -  110us      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ZFS-HDD      190G  1.63T      0     63      0  4.14M      -  371ms      -  261ms      -      -      -   23ms      -      -
ZFS-SSD      983G   793G    102      0  2.21M      0  448us      -  371us      -      -      -   72us      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ZFS-HDD      190G  1.63T      0     66      0  4.38M      -  369ms      -  253ms      -      -      -   31ms      -      -
ZFS-SSD      983G   793G     85      0  1.77M      0  416us      -  343us      -      -      -   74us      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ZFS-HDD      190G  1.63T      0     49      0  3.13M      -  414ms      -  289ms      -      -      -   28ms      -      -
ZFS-SSD      983G   793G    107      0  2.26M      0  434us      -  359us      -      -      -   77us      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ZFS-HDD      190G  1.63T      0     64      0  4.23M      -  315ms      -  232ms      -      -      -   19ms      -      -
ZFS-SSD      983G   793G    115      0  2.67M      0  520us      -  419us      -      -      -   97us      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
What kind of disks are those spinning disks? A >300ms wait time for writing out to disk is quite high... are those SMR disks?
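One way to check for SMR (a sketch, and not authoritative: drive-managed SMR disks typically still report "none" for the zoned flag, so comparing the model number against the manufacturer's SMR model lists is the more reliable route):

```shell
# Sketch: host-managed/host-aware zoned drives identify themselves here;
# drive-managed SMR usually does not, so also check the exact model number.
for d in sde sdf; do
  echo "$d: $(cat /sys/block/$d/queue/zoned)"
done
smartctl -i /dev/sde | grep -i model
```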
 

Chris&Patte

Member
Sep 3, 2013
The HDDs are completely the wrong type for that workload.
In fact they were not bought for this, but for backup-like purposes: to store the transaction log backups of the S4/HANA system.
I used them for the VM cloning because I did not have anything else, and the cloned VM was only meant for some testing/learning how to administrate a HANA system.
So I decided to go with them, since speed shouldn't matter, but in the end... I didn't think it could be that bad :-( since copying big files to the HDDs was not sooo bad.

https://www.conrad.de/de/p/western-...obile-bulk-sata-iii-1666940.html?refresh=true

So in the end it seems I can really only use them to store backups, as intended. And maybe they are the wrong choice even for that. Well, if they break early from running 24/7, I'll add it to the list of "costs of experience".

I will now buy 2 (or 4?) additional 2TB SSDs and add them to ZFS-SSD as RAID10 storage. Hopefully this boosts the performance to a usable level.

@fabian
thank you very much for your help. Learned some important things about ZFS now ;-)
 
