Proxmox 3.0 - Slow VM Migration

samwayne

New Member
Jun 21, 2013
Hello,

I have a two-node cluster setup and am testing offline migration of a 20G KVM VM, but the migration takes about 30 minutes to complete whether I create the VM with a RAW or a QCOW2 image. I tried googling and searching through the mailing list and the forum, but I haven't come up with anything concrete, hence my post here.

The VM uses VirtIO for both the hard disk and the network. It was created with a QCOW2 image, cache=writeback, 4 vCPU cores and 2GB RAM.

Code:
 [B]Migration task log[/B]

Jun 24 22:44:54 starting migration of VM 100 to node 'NodeA' 
Jun 24 22:44:54 copying disk images
vm-100-disk-1.qcow2

rsync status: 32768   0%    0.00kB/s    0:00:00  
rsync status: 224231424   1%   11.14MB/s    0:31:04  
rsync status: 435585024   2%   11.16MB/s    0:30:41  
rsync status: 646840320   3%   11.16MB/s    0:30:23  
rsync status: 869203968   4%   11.14MB/s    0:30:06  
rsync status: 1079705600  5%   11.12MB/s    0:29:51 
.
.
.
rsync status: 20406534144  95%   11.18MB/s    0:01:33  
rsync status: 20629192704  96%   11.14MB/s    0:01:14  
rsync status: 20840218624  97%   11.13MB/s    0:00:55  
rsync status: 21051637760  98%   11.18MB/s    0:00:37  
rsync status: 21274722304  99%   11.17MB/s    0:00:17  
rsync status: 21478375424 100%   11.15MB/s    0:30:36 (xfer#1, to-check=0/1)

sent 21480997377 bytes  received 31 bytes  11690338.73 bytes/sec
total size is 21478375424  speedup is 1.00
Jun 24 23:15:49 migration finished successfuly (duration 00:30:55)
TASK OK

Each Node has the following Hardware specs:

Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz
16GB Ram
2 x 1TB Drive RAID 0
Raid Card - 3ware Inc 9650SE SATA-II
2 x 1Gb/s NICs (but only 100 Mb/s uplink)

Both Nodes are on the same VLAN.

Code:
[B]# pveperf[/B]

CPU BOGOMIPS:      54400.16
REGEX/SECOND:      1537301
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    185.44 MB/sec
AVERAGE SEEK TIME: 8.19 ms
FSYNCS/SECOND:     40.49
DNS EXT:           67.50 ms
DNS INT:           107.50 ms (mycluster.com)

Code:
root@nodea:~# [B]dd if=/dev/zero of=/tmp/output.img bs=8k count=256k[/B]
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 2.53249 s, 848 MB/s
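
As a side note, a figure like 848 MB/s from dd almost certainly reflects the page cache rather than the disk, since dd returns before the data is flushed. A rough sketch of a sync-aware version of the same test (paths and sizes are placeholders, adjust for your system):

```python
import os, time

# Time a sequential write like the dd test above, but flush to disk
# before stopping the clock, so the page cache can't inflate the number.
def write_throughput(path="/tmp/output.img", size_mb=64, block=8192):
    buf = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.time()
    written = 0
    while written < size_mb * 1024 * 1024:
        written += os.write(fd, buf)
    os.fdatasync(fd)            # force the data to stable storage
    elapsed = time.time() - start
    os.close(fd)
    os.unlink(path)
    return written / elapsed / 1e6  # MB/s

print(f"~{write_throughput():.0f} MB/s")
```

The equivalent dd invocation would add conv=fdatasync to the command line.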

I tried to put as much information as possible but let me know if I missed anything.

Thank you all in advance.

Samwayne
 
If you are migrating over the LAN, and you state that your NIC is at 100 Mb/s, you can't get more than about 10 MB/s of transfer, which is what you are getting, or am I missing something?
Also, your fsyncs/sec is really bad; migration apart, you will never have decent I/O performance for your VMs.
 
mmenaz,

You are absolutely correct! Somehow I was thinking I should be getting ~100MB/s on a 1Gbps NIC, but I keep forgetting the uplink is only 100Mbps, hence the 10-11MB/s transfer limitation.
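
For what it's worth, the back-of-the-envelope arithmetic (my own numbers, including an assumed ~10% protocol overhead) lines up with the rsync log:

```python
# A 100 Mb/s uplink caps the transfer well below what a 1 Gb/s NIC could do.
link_bps = 100_000_000            # 100 Mb/s uplink
raw_mbs = link_bps / 8 / 1e6      # 12.5 MB/s theoretical ceiling
usable = raw_mbs * 0.9            # assume ~10% TCP/ssh/rsync overhead
disk_bytes = 21478375424          # total size from the migration log above
eta_s = disk_bytes / (usable * 1e6)
print(f"ceiling {raw_mbs:.1f} MB/s, usable ~{usable:.2f} MB/s, "
      f"ETA ~{eta_s / 60:.0f} min")
```

That estimate comes out around half an hour, which matches the 30:55 the migration actually took.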

Is there anything that can be done to improve the fsync/sec?

Regards,

Samwayne
 
Ok, fine!
About fsync, the theory I've understood so far is:
- SATA disks usually have the write-back cache active; in fact, with a single SATA disk you get around 800-1000 fsyncs/sec
- if you install a RAID controller, it disables the disks' cache, which is considered unsafe, so your fsyncs drop dramatically (like yours)
- you can push fsyncs a lot higher by enabling the RAID controller's write-back cache, but on its own this is really unsafe, since that cache is usually a big one
- you can protect the RAID controller's cache from data loss with a BBU. So you have to buy a RAID controller that supports a BBU, plus the BBU itself, enable write-back cache when the BBU is OK, and be prepared to replace the BBU after about 2 years. Or buy a controller with a solid-state "BBU"

That's all :)
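
To make the mechanism concrete, a simplified probe of what pveperf's FSYNCS/SECOND is measuring (this is my own sketch, not pveperf's actual code): write a small block, fsync it, and count how many rounds per second complete. Whether a write cache can absorb the flushes is exactly what dominates the number.

```python
import os, time, tempfile

# Rough fsyncs/sec probe: each iteration forces a small write to stable
# storage, so the rate collapses when no (safe) write cache is in play.
def fsync_rate(path, seconds=1.0, block=b"x" * 4096):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    count = 0
    start = time.time()
    try:
        while time.time() - start < seconds:
            os.write(fd, block)
            os.fsync(fd)       # block until the data hits the disk
            count += 1
    finally:
        os.close(fd)
    return count / (time.time() - start)

with tempfile.NamedTemporaryFile() as f:
    print(f"~{fsync_rate(f.name):.0f} fsyncs/sec")
```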
 
OK, I have removed the RAID controller and am using the two disks standalone; now the fsyncs/sec is much better:

Code:
root@nodea:~# pveperf
CPU BOGOMIPS:      54400.72
REGEX/SECOND:      1639976
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    127.68 MB/sec
AVERAGE SEEK TIME: 8.60 ms
[B]FSYNCS/SECOND:     1710.11[/B]
DNS EXT:           154.78 ms
DNS INT:           105.06 ms (mycluster.com)

and also

Code:
root@nodea:~# dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.92149 s, 1.1 GB/s

So as far as the hard disk goes, that is a massive improvement.

Thanks again.

Samwayne.
 