Live migration network performance

JDA

Aug 29, 2021
After a small network upgrade, I'm trying to tune live migration performance. With the secure migration mode I currently get 300-400 MiB/s; in the insecure migration mode I get around 1.6 GiB/s.

This is still only about half the speed I get with iperf on a single test stream...

Are there any other knobs to use more bandwidth?
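For comparison, one way to check whether a single TCP stream (roughly what the memory migration uses) can fill the link is to run iperf3 with one stream and then several in parallel; this is only a sketch, assuming iperf3 is installed on both nodes and using the target address from the migration log below:
Code:
# on the target node
iperf3 -s

# on the source node: one stream, then 4 parallel streams for comparison
iperf3 -c 172.20.253.202 -t 10
iperf3 -c 172.20.253.202 -t 10 -P 4
If the parallel run is clearly faster than the single stream, the per-stream throughput (not the link itself) is the limit.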
 
Is it a live migration + local storage migration, or only a pure live migration (with only the VM memory to transfer)?

What is your physical link bandwidth?

Do you have some migration logs to provide?
 
That's only memory migration - storage is on Ceph.

It's 40GbE without RDMA, currently also used by Ceph (the second port isn't activated yet).

Here's the output from the migration:
Code:
2021-08-29 15:44:58 use dedicated network address for sending migration traffic (172.20.253.202)
2021-08-29 15:44:58 starting migration of VM 129 to node 'pve002' (172.20.253.202)
2021-08-29 15:44:58 starting VM 129 on remote node 'pve002'
2021-08-29 15:45:00 start remote tunnel
2021-08-29 15:45:01 ssh tunnel ver 1
2021-08-29 15:45:01 starting online/live migration on tcp:172.20.253.202:60000
2021-08-29 15:45:01 set migration capabilities
2021-08-29 15:45:01 migration downtime limit: 100 ms
2021-08-29 15:45:01 migration cachesize: 4.0 GiB
2021-08-29 15:45:01 set migration parameters
2021-08-29 15:45:01 spice client_migrate_info
2021-08-29 15:45:01 start migrate command to tcp:172.20.253.202:60000
2021-08-29 15:45:02 migration active, transferred 1.1 GiB of 24.1 GiB VM-state, 6.5 GiB/s
2021-08-29 15:45:03 migration active, transferred 2.4 GiB of 24.1 GiB VM-state, 1.3 GiB/s
2021-08-29 15:45:04 migration active, transferred 3.7 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:05 migration active, transferred 5.0 GiB of 24.1 GiB VM-state, 1.3 GiB/s
2021-08-29 15:45:06 migration active, transferred 6.2 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:07 migration active, transferred 7.5 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:08 migration active, transferred 8.8 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:09 migration active, transferred 10.1 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:10 migration active, transferred 11.4 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:11 migration active, transferred 12.6 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:12 migration active, transferred 14.0 GiB of 24.1 GiB VM-state, 1.3 GiB/s
2021-08-29 15:45:13 migration active, transferred 15.3 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:14 migration active, transferred 16.5 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:15 migration active, transferred 17.7 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:16 migration active, transferred 19.0 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:18 migration active, transferred 20.8 GiB of 24.1 GiB VM-state, 1.4 GiB/s
2021-08-29 15:45:19 migration active, transferred 21.4 GiB of 24.1 GiB VM-state, 78.4 MiB/s
2021-08-29 15:45:20 migration active, transferred 21.9 GiB of 24.1 GiB VM-state, 549.4 MiB/s
2021-08-29 15:45:20 xbzrle: send updates to 6566 pages in 446.6 KiB encoded memory, cache-miss 88.24%, overflow 1
2021-08-29 15:45:20 average migration speed: 1.3 GiB/s - downtime 50 ms
2021-08-29 15:45:20 migration status: completed
2021-08-29 15:45:21 Waiting for spice server migration
2021-08-29 15:45:23 migration finished successfully (duration 00:00:25)
TASK OK
 
I'm not sure it can go much faster currently. Note that 1.4 GiB/s is already around 11 Gbit/s. Do you really need faster migration?

I have never tested more than 10 Gbit/s myself.

The migration is done with small memory blocks, so maybe packets per second is the limiting factor.
It could also be the CPU (if I remember correctly, the migration is limited to one CPU core per QEMU process).
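A quick way to check the single-core theory is to watch the per-thread CPU usage of the VM's KVM process while a migration is running; a rough sketch, assuming the usual Proxmox pidfile location and the sysstat package for pidstat (VM 129 is taken from the log above):
Code:
# PID of the QEMU/KVM process for VM 129 (Proxmox writes a pidfile per VM)
PID=$(cat /var/run/qemu-server/129.pid)

# per-thread CPU usage, refreshed every second, during the migration
pidstat -t -p "$PID" 1
If one thread sits near 100% while the others stay idle, the migration is CPU-bound on that core and a faster link won't help much.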
 
I have tried to add
Code:
migration: type=insecure
but during a migration the log still shows "ssh tunnel ver 1". Any ideas?

Do we need to restart any services after editing the datacenter configuration file?
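For reference, a sketch of what the relevant entries in /etc/pve/datacenter.cfg can look like; the network CIDR here is an assumption based on the 172.20.253.x addresses in the log above, and the bwlimit line is optional and only matters if you want to cap migration bandwidth:
Code:
# /etc/pve/datacenter.cfg
# insecure migration over the dedicated migration network
migration: type=insecure,network=172.20.253.0/24
# optional: migration bandwidth limit in KiB/s
#bwlimit: migration=1048576
As far as I know no service restart is needed; the file is read again when the next migration task starts.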
 
It's ok, you don't use the ssh tunnel for the memory transfer (it is still created to send some commands to the remote VM).

If you see the line

starting online/live migration on tcp:172.20.253.202:60000

with :6000x at the end, you don't use the ssh tunnel to transfer memory.
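To double-check on the wire, you can also look for the direct TCP connection to the 6000x port on the source node while a migration is running; a small sketch using ss from iproute2:
Code:
# run on the source node during a migration:
# established TCP connections towards the 60000-range migration ports
ss -tnp | grep -E ':600[0-9][0-9]'
If the memory went through the ssh tunnel instead, you would only see the ssh connection to port 22 on the target.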
Thanks. I'll try again and see if I see that line. Only 1 Gb links here, but I still think it could be faster. It hasn't sped up compared to the first move with the default encrypted mode.
 
Thank you, that was correct. I saw the IP address with :60001 as the transfer target. Encrypted was about 2 minutes slower on the 32 GB transfer: about 280 Mbps average versus 360 Mbps unencrypted.
 
I believe the end speed depends on single-core CPU speed and the contents of memory. I could achieve around 25-30 Gbit/s over 100GbE SFP links without RDMA (directly connected, 3-node cluster) on an AMD EPYC 7401P (Ceph as storage, so memory copy only). Using jumbo frames might also drop some overhead.
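For the jumbo frames point, on Proxmox that usually means an mtu setting on the migration interface in /etc/network/interfaces on every node, with a matching MTU on the switch; a hedged sketch where the interface name is made up and the address is taken from the log above:
Code:
# /etc/network/interfaces (interface name is only an example)
auto enp129s0f0
iface enp129s0f0 inet static
        address 172.20.253.202/24
        mtu 9000
With ifupdown2, ifreload -a should apply it; the MTU has to match end to end, otherwise throughput can get worse instead of better.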
 