[SOLVED] Very slow live VM migration

TVTDatos

Jan 8, 2020
Hi all,

This is my first post here, so apologies if I'm doing something wrong.

I've searched the forum but none of the other topics seems to be related to my issue.

I'm using shared storage (Ceph) and a 1G link for cluster communication (the storage network is 20G).

I'm using pvemanager 5.4-13.

Log of the migration:
Code:
2020-01-08 07:24:52 starting migration of VM 130 to node 'node6' (10.99.0.16)
2020-01-08 07:24:55 copying disk images
2020-01-08 07:24:55 starting VM 130 on remote node 'node6'
2020-01-08 07:25:01 start remote tunnel
2020-01-08 07:25:03 ssh tunnel ver 1
2020-01-08 07:25:03 starting online/live migration on unix:/run/qemu-server/130.migrate
2020-01-08 07:25:03 migrate_set_speed: 8589934592
2020-01-08 07:25:03 migrate_set_downtime: 0.1
2020-01-08 07:25:03 set migration_caps
2020-01-08 07:25:03 set cachesize: 536870912
...
2020-01-08 08:08:56 migration speed: 1.56 MB/s - downtime 2663 ms
2020-01-08 08:08:56 migration status: completed
2020-01-08 08:09:02 migration finished successfully (duration 00:44:17)
TASK OK

Not sure where to start looking for the issue.

Any help is appreciated!
 
OK,

it's not necessarily that it's slow (the 1.56 MB/s is computed as total memory / total time).

The migration process works like this:

- full memory is copied to target host
- then if memory changes occur during the first copy, delta memory is copied to target
- then if memory changes occur during the last delta, a new delta memory is copied to target
- and again and again.

If a lot of memory changes (I have seen that with databases or Java applications),
you can get many delta passes (you can see the "remaining" counter going down and up),
and the migration takes longer. The bigger the VM's memory, the more deltas you'll have.

The only options: use a bigger link or give the VM less memory.
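To make the delta loop described above more concrete, here is a toy model of it. This is only a sketch under simplifying assumptions (constant dirty rate, constant bandwidth), not QEMU's actual algorithm; `simulate_precopy` and all of its parameters are hypothetical names, and only the 0.1 s downtime budget is taken from the `migrate_set_downtime` line in the log.

```python
def simulate_precopy(memory_mb, dirty_mb_s, bandwidth_mb_s,
                     max_downtime_s=0.1, max_rounds=30):
    """Toy model of iterative pre-copy.

    Returns (copy_passes, total_seconds, converged).
    Assumes a constant dirty rate and constant bandwidth (both hypothetical).
    """
    remaining = float(memory_mb)   # the first pass copies all guest memory
    elapsed = 0.0
    for round_no in range(1, max_rounds + 1):
        final_copy = remaining / bandwidth_mb_s
        # Once the remainder fits in the downtime budget, the VM can be
        # paused and the last delta sent in one go.
        if final_copy <= max_downtime_s:
            return round_no - 1, elapsed + final_copy, True
        elapsed += final_copy
        # Memory dirtied while this pass ran becomes the next delta
        # (it can never exceed the VM's total memory).
        remaining = min(float(memory_mb), dirty_mb_s * final_copy)
    return max_rounds, elapsed, False


if __name__ == "__main__":
    # Same VM and dirty rate, two link speeds (all numbers are made up).
    for bandwidth in (100, 1000):
        print(bandwidth, simulate_precopy(4096, 50, bandwidth))
    # A dirty rate above the bandwidth never converges in this model.
    print(simulate_precopy(4096, 120, 100))
```

In this model a faster link shortens every pass, so less memory is dirtied per pass and fewer deltas are needed, which matches the "bigger link or smaller memory" advice.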

you can change the network if you want, to use your 10gb links:

/etc/pve/datacenter.cfg:

migration: network=10.99.0.0/24 (change to your 10gb ip address network)

you can also disable the SSH tunnel used by migration, which speeds up migration as well:

migration: network=10.99.0.0/24,type=insecure
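As a rough illustration of what the `network=` value means: it is a CIDR, and as far as I understand it the migration traffic goes to the target node's address that falls inside that subnet. The sketch below only demonstrates that membership check with Python's standard `ipaddress` module; `pick_migration_address` is a hypothetical helper, not Proxmox code, and `192.0.2.16` is a made-up second address.

```python
import ipaddress


def pick_migration_address(node_addresses, migration_cidr):
    """Return the first node address inside the migration CIDR, or None."""
    network = ipaddress.ip_network(migration_cidr)
    for addr in node_addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


if __name__ == "__main__":
    # 10.99.0.16 is node6's address from the migration log;
    # 192.0.2.16 is a hypothetical address on another network.
    node6 = ["192.0.2.16", "10.99.0.16"]
    print(pick_migration_address(node6, "10.99.0.0/24"))
```

Note that the address shown in the log, 10.99.0.16, already falls inside the example 10.99.0.0/24, so the value has to be replaced with the subnet of the faster network, as the comment in the config line says.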
 
Thanks for your reply, but 1 MB/s (~10 Mbit/s) on a 1 Gbit link is really slow.
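The point that the reported 1.56 MB/s is an average (total memory / total time) can be checked against the log timestamps. A small sketch: the start and end times are taken from the log, but the VM's memory size is never stated in this thread, so the 4096 MB below is purely an assumption that happens to reproduce the logged figure.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def average_speed_mb_s(start, end, memory_mb):
    """Average speed as total memory divided by total elapsed time."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return memory_mb / delta.total_seconds()


if __name__ == "__main__":
    start = "2020-01-08 07:25:03"   # "starting online/live migration"
    end = "2020-01-08 08:08:56"     # "migration status: completed"
    assumed_memory_mb = 4096        # ASSUMPTION: not stated in the thread
    print(round(average_speed_mb_s(start, end, assumed_memory_mb), 2))
```

Because the figure is an average over the whole run, it says nothing about how the transfer rate varied during the migration; the per-second values in the full log are needed for that.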

This is the network graph for the receiving node:
[Attached image 1578582782679.png: network graph for the receiving node]

Also, it happens on every machine I try to migrate. The one in the example is just an NFS server, so there is no continuous memory change at all.
 
The memory transfer is really pure network traffic and doesn't use much CPU.
Looking at the transferred value for each second in the logs and at your graphs, you are indeed between 10 and 20 Mbit/s.

Can you try disabling the SSH tunnel to compare (the "insecure" option from my last post)?

And can you try the 10 Gbit network to compare, too?
 
I've set type=insecure for migrations and moved a Windows VM.

Speed is 108MB/s.

Also, after moving the migration network to the 20G one, speed is 455MB/s.
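To put the three measured speeds side by side, here is a quick conversion to link utilization. It is only a rough comparison: the first figure comes from VM 130 and the other two from a different (Windows) VM, and the sketch assumes decimal units (1 MB/s = 8 Mbit/s, 1 Gbit/s = 1000 Mbit/s). `link_utilization` is a hypothetical helper name.

```python
def link_utilization(speed_mb_s, link_gbit_s):
    """Fraction of the link's nominal capacity used at the given speed."""
    return speed_mb_s * 8 / (link_gbit_s * 1000)


if __name__ == "__main__":
    measurements = [
        ("ssh tunnel, 1G link", 1.56, 1),
        ("insecure, 1G link", 108, 1),
        ("insecure, 20G link", 455, 20),
    ]
    for label, speed, link in measurements:
        print(f"{label}: {speed} MB/s = {link_utilization(speed, link):.1%} of link")
```

Under these assumptions the tunnelled migration used only a small fraction of the 1G link, while the insecure one came close to saturating it.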

I'm very sorry, I never thought that secure migration would nerf speed THAT MUCH, or I would have tried that first.

Thank you very much for your help @spirit I really appreciate it.
 
