Proxmox 3.0 - Slow VM Migration

samwayne

New Member
Jun 21, 2013
Hello,

I have a two-node cluster and am testing offline migration of a 20 GB KVM VM, but the migration takes about 30 minutes to complete whether I create the VM with a RAW or a QCOW2 image. I tried Googling and searching the mailing list and the forum, but I haven't come up with anything concrete, hence my post here.

The VM uses VirtIO for both the hard disk and the network. It was created with a QCOW2 image, cache=writeback, 4 vCPU cores, and 2 GB RAM.

Code:
 [B]Migration task log[/B]

Jun 24 22:44:54 starting migration of VM 100 to node 'NodeA' 
Jun 24 22:44:54 copying disk images
vm-100-disk-1.qcow2

rsync status: 32768   0%    0.00kB/s    0:00:00  
rsync status: 224231424   1%   11.14MB/s    0:31:04  
rsync status: 435585024   2%   11.16MB/s    0:30:41  
rsync status: 646840320   3%   11.16MB/s    0:30:23  
rsync status: 869203968   4%   11.14MB/s    0:30:06  
rsync status: 1079705600  5%   11.12MB/s    0:29:51 
.
.
.
rsync status: 20406534144  95%   11.18MB/s    0:01:33  
rsync status: 20629192704  96%   11.14MB/s    0:01:14  
rsync status: 20840218624  97%   11.13MB/s    0:00:55  
rsync status: 21051637760  98%   11.18MB/s    0:00:37  
rsync status: 21274722304  99%   11.17MB/s    0:00:17  
rsync status: 21478375424 100%   11.15MB/s    0:30:36 (xfer#1, to-check=0/1)

sent 21480997377 bytes  received 31 bytes  11690338.73 bytes/sec
total size is 21478375424  speedup is 1.00
Jun 24 23:15:49 migration finished successfuly (duration 00:30:55)
TASK OK

Each Node has the following Hardware specs:

Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz
16GB Ram
2 x 1TB Drive RAID 0
Raid Card - 3ware Inc 9650SE SATA-II
2 x 1Gb/s NICs (but only 100 Mb/s uplink)

Both nodes are on the same VLAN.

Code:
[B]# pveperf[/B]

CPU BOGOMIPS:      54400.16
REGEX/SECOND:      1537301
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    185.44 MB/sec
AVERAGE SEEK TIME: 8.19 ms
FSYNCS/SECOND:     40.49
DNS EXT:           67.50 ms
DNS INT:           107.50 ms (mycluster.com)

Code:
root@nodea:~# [B]dd if=/dev/zero of=/tmp/output.img bs=8k count=256k[/B]
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 2.53249 s, 848 MB/s
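As an aside, that 848 MB/s figure mostly reflects the page cache, not the 3ware array, because dd returns before anything is flushed. A variant with conv=fdatasync (a sketch, assuming GNU dd and that /tmp sits on the disk under test) forces a flush before the rate is reported and gives a more honest number:

```shell
# conv=fdatasync makes dd flush data to disk before it reports the
# transfer rate, so the result reflects storage speed rather than RAM.
dd if=/dev/zero of=/tmp/output.img bs=8k count=256k conv=fdatasync
rm -f /tmp/output.img
```

Expect a figure well below the cached 848 MB/s.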

I tried to put as much information as possible but let me know if I missed anything.

Thank you all in advance.

Samwayne
 
If you are migrating over the LAN and, as you state, your uplink is 100 Mb/s, you can't get more than ~10 MB/s of transfer, which is exactly what you are seeing. Or am I missing something?
Also, your fsyncs/second figure is really bad; migration aside, you will never get decent I/O performance for your VMs.
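A quick back-of-the-envelope check of this point (the link speed and image size come from the thread; the variable names are just for illustration):

```shell
# 100 Mb/s divided by 8 bits/byte = 12.5 MB/s raw ceiling;
# protocol overhead typically leaves the ~11 MB/s seen in the rsync log.
link_mbps=100
echo "raw ceiling: $(( link_mbps / 8 )) MB/s"      # -> 12 (12.5 exact)

# A ~21.5 GB image at ~11 MB/s:
size_mb=21478
echo "estimated transfer: $(( size_mb / 11 )) s"   # -> 1952 s, ~32 min
```

That estimate lines up with the observed 30:55 migration time, so the uplink, not Proxmox, is the bottleneck.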
 
mmenaz,

You are absolutely correct! Somehow I was thinking I should be getting ~100 MB/s on a 1 Gb/s NIC, but I kept forgetting the uplink is only 100 Mb/s, hence the 10-11 MB/s transfer limit.

Is there anything that can be done to improve the fsync/sec?

Regards,

Samwayne
 
Ok, fine!
About fsync, the theory as I understand it so far:
- a single SATA disk usually has its write-back cache active, so you get around 800-1000 fsyncs/sec
- a RAID controller typically disables the disks' own cache, which it considers unsafe, so your fsync rate drops dramatically (like yours)
- you can push fsync much higher by enabling the RAID controller's write-back cache, but on its own this is really unsafe, since that cache is usually a big one
- you can protect the controller cache from data loss with a BBU. So you have to buy a RAID controller that supports a BBU, plus the BBU itself, enable write-back cache while the BBU is healthy, and be prepared to replace the BBU after about 2 years. Or buy a controller with a solid-state "BBU"

That's all :)
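For anyone who wants to probe this themselves, here is a crude approximation of pveperf's FSYNCS/SECOND figure (a sketch, assuming GNU dd and a scratch file on the filesystem under test; oflag=dsync makes each 512-byte write wait for the disk):

```shell
# 200 synchronous 512-byte writes; the rate dd reports, divided by 512,
# roughly approximates fsyncs/sec. On a controller with the disk caches
# disabled and no write-back cache, expect a very low figure.
dd if=/dev/zero of=/tmp/fsync_probe bs=512 count=200 oflag=dsync
rm -f /tmp/fsync_probe
```

This is not the same code path pveperf uses, but it shows the same cache effect.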
 
OK, I have removed the RAID controller and am using the two disks standalone; the fsyncs/sec figure is now much better:

Code:
root@nodea:~# pveperf
CPU BOGOMIPS:      54400.72
REGEX/SECOND:      1639976
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    127.68 MB/sec
AVERAGE SEEK TIME: 8.60 ms
[B]FSYNCS/SECOND:     1710.11[/B]
DNS EXT:           154.78 ms
DNS INT:           105.06 ms (mycluster.com)

and also

Code:
root@nodea:~# dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.92149 s, 1.1 GB/s

So as far as the hard disk goes, that is a massive improvement.

Thanks again.

Samwayne.
 