Live Migration

adamb

Trying to get a better understanding of the live migration process. I notice that in my clusters, live migration hits a wall at 32.5MB/s. This is far from the speeds I should be seeing. Using ssh/rsync with the arcfour cipher, I have no issues pulling 150-170MB/s (10Gbit backend).

Looking at top during the migration, the ssh process isn't working too hard, 15-20%, so there is plenty of room for more speed. Is there anything that could limit this speed? I have VMs with 50+GB of RAM, and it takes forever to migrate them at 32.5MB/s. I appreciate the input.

Should I see the cipher set in this string?

/usr/bin/ssh -o BatchMode=yes root@10.211.46.1 -L 60000:localhost:60000 qm mtunnel
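For what it's worth, this is how I've been checking which cipher a plain ssh session to that node negotiates, and forcing arcfour for a test copy (the file path is just an example):

/usr/bin/ssh -v root@10.211.46.1 exit 2>&1 | grep "kex:"
scp -c arcfour /var/lib/vz/dump/test.img root@10.211.46.1:/tmp/

I don't see a -c option in the mtunnel command above, so I assume the tunnel just uses whatever the ssh client config defaults to.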

 
Does no one have any ideas on how this works? Or how about some documentation on this?
 
I can't even seem to migrate a VM with a large amount of in-use RAM. The RAM keeps changing before the move is complete, which just restarts the process. As you can see, I was cruising along, almost done, and then boom, more data.

If I could get the transfer speed up from 32-35MB/s to at least the 100MB/s range, this wouldn't be an issue. Nothing seems to be the bottleneck: IO wait is 0 and the ssh process uses very little CPU, so there is plenty of room for more speed. Does something limit the speed of the migration process?

Feb 20 10:17:02 migration status: active (transferred 49726959961, remaining 360419328), total 52445970432)
Feb 20 10:17:04 migration status: active (transferred 49794621785, remaining 292757504), total 52445970432)
Feb 20 10:17:06 migration status: active (transferred 49862451545, remaining 224931840), total 52445970432)
Feb 20 10:17:08 migration status: active (transferred 49930138931, remaining 153206784), total 52445970432)
Feb 20 10:17:10 migration status: active (transferred 49997542820, remaining 85340160), total 52445970432)
Feb 20 10:17:12 migration status: active (transferred 50065335761, remaining 9782226944), total 52445970432)
Feb 20 10:17:14 migration status: active (transferred 50133096073, remaining 9713713152), total 52445970432)
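Doing the math on those log lines, roughly 67.7MB is transferred every 2 seconds, which works out to about 34MB/s, right at the same wall. And the jump in "remaining" at 10:17:12 is presumably the guest dirtying pages faster than they can be copied, so at this rate the migration can never converge.

One thing I still want to rule out: I've read that qemu applies a default migration bandwidth cap of 32MiB/s unless it is raised, which would line up almost exactly with the wall I am hitting. If that's the case, something like this should bump it for a running guest (the VM id is just an example):

qm monitor 101
qm> migrate_set_speed 1g

I haven't confirmed whether the PVE migration code already sets this itself.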
 
The VM is still trying to migrate, 1 hour and 15 minutes so far. The amount transferred is far more than the total, as you can see. My only guess is that the RAM is changing faster than we can move it.

Feb 20 11:10:23 migration status: active (transferred 157500912280, remaining 193880064), total 52445970432)
Feb 20 11:10:23 migration status: active (transferred 157511021208, remaining 183771136), total 52445970432)
Feb 20 11:10:23 migration status: active (transferred 157521126040, remaining 173666304), total 52445970432)
Feb 20 11:10:24 migration status: active (transferred 157531230872, remaining 163561472), total 52445970432)
Feb 20 11:10:24 migration status: active (transferred 157541335704, remaining 153456640), total 52445970432)
Feb 20 11:10:24 migration status: active (transferred 157551440536, remaining 143351808), total 52445970432)
Feb 20 11:10:24 migration status: active (transferred 157561545368, remaining 133246976), total 52445970432)
Feb 20 11:10:25 migration status: active (transferred 157571654296, remaining 123138048), total 52445970432)
Feb 20 11:10:25 migration status: active (transferred 157581759128, remaining 113033216), total 52445970432)
Feb 20 11:10:25 migration status: active (transferred 157591863960, remaining 102932480), total 52445970432)
Feb 20 11:10:26 migration status: active (transferred 157601968792, remaining 92827648), total 52445970432)

Still hitting the wall of 32-35MB/s. Something has to be holding this back.
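To put numbers on it: roughly 157.6GB transferred against a total of 52.4GB means the guest's memory has effectively been sent three times over, which fits the theory that dirtied pages are re-sent on every pass. At 32-35MB/s the transfer simply can't outrun the dirty rate, so the remaining counter keeps getting topped back up.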
 
I think we'd need a bit more information about the physical servers and the virtual machine that you're migrating.
 
Broadcom 10GB NIC

Are you running the new driver versions? I've seen a few posts about issues with these... Also, what versions of PVE are you running? I'm not sure whether version 2.2 has the new drivers, but 2.3 definitely has them.
 
Are you running the new driver versions? I've seen a few posts about issues with these... Also, what versions of PVE are you running? I'm not sure whether version 2.2 has the new drivers, but 2.3 definitely has them.

Yep, I am the one who pinned down the driver issue in the first place :). I have two clusters on 2.6.32-16-pve and one on 2.6.32-17-pve, all of which have this issue.
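For reference, this is how I'm checking the driver bound to the 10Gb interfaces on each node (the interface name is an example, and I'm assuming the cards are on the bnx2x module):

ethtool -i eth2
modinfo bnx2x | grep -i version

ethtool -i shows the driver name, version and firmware actually in use on that port.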
 
Yep, I am the one who pinned down the driver issue in the first place :). I have two clusters on 2.6.32-16-pve and one on 2.6.32-17-pve, all of which have this issue.

I'm guessing you've updated to the newest drivers then ;)

I must admit I have no experience dealing with VMs with over 50GB of RAM - the most mine have is 8GB :eek:

I'm assuming that iperf tests between the boxes all show decent network connections?

Do you have any way of monitoring the NIC speeds via SNMP? (I use PRTG.) Are the servers definitely using the 10Gb NICs for migration?
 
Yeah, iperf tests look good, 9.5Gbit/s on average. No problem scp/rsyncing over ssh at up to 175MB/s (the bottleneck is ssh at this point; can't wait for AES-NI support).

I can watch the traffic with jnettop as it's moving. I also monitor it with my SNMP collector (Zabbix).
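For reference, these are the kinds of checks I'm running between the two nodes (the IP and file path are just examples):

iperf -s                       (on the target node)
iperf -c 10.211.46.1 -t 30     (from the source node)
scp -c arcfour /var/lib/vz/dump/test.img root@10.211.46.1:/tmp/

iperf pushes around 9.5Gbit/s and the arcfour scp tops out around 175MB/s, so the network and ssh themselves don't look like the limit.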
 
Yeah, iperf tests look good, 9.5Gbit/s on average. No problem scp/rsyncing over ssh at up to 175MB/s (the bottleneck is ssh at this point; can't wait for AES-NI support).

I can watch the traffic with jnettop as it's moving. I also monitor it with my SNMP collector (Zabbix).
Maybe this is where your problem originates, because AFAIK live migration is done through an ssh tunnel. Take a look at the task log for your live migrations.
 
Maybe this is where your problem originates, because AFAIK live migration is done through an ssh tunnel. Take a look at the task log for your live migrations.

I am seeing 32-35MB/s during migration but can get 150-175MB/s using SSH. That is exactly why I am bringing this up; I should be able to do live migration at 150-175MB/s.
 
Try sending a mail to pve-devel@pve.proxmox.com.

We had a discussion some weeks ago about a patch to allow migration without the ssh tunnel, but it was refused because of security concerns.

I would be happy if I could just pull full speed over ssh; 150MB/s is leaps and bounds better than 35MB/s. Live migration without ssh would also be great - I can see it being useful in quite a few situations.

I will try to send something with my vote!
 
