[SOLVED] Very slow live VM migration

TVTDatos · Jan 9, 2020

Hi all,

This is my first post here, so apologies if I'm doing something wrong.

I've searched the forum but none of the other topics seems to be related to my issue.

I'm using shared storage (Ceph) and a 1G link for cluster communication (the storage network is 20G).

I'm using pvemanager 5.4-13.

Log of the migration:

Code:

2020-01-08 07:24:52 starting migration of VM 130 to node 'node6' (10.99.0.16)
2020-01-08 07:24:55 copying disk images
2020-01-08 07:24:55 starting VM 130 on remote node 'node6'
2020-01-08 07:25:01 start remote tunnel
2020-01-08 07:25:03 ssh tunnel ver 1
2020-01-08 07:25:03 starting online/live migration on unix:/run/qemu-server/130.migrate
2020-01-08 07:25:03 migrate_set_speed: 8589934592
2020-01-08 07:25:03 migrate_set_downtime: 0.1
2020-01-08 07:25:03 set migration_caps
2020-01-08 07:25:03 set cachesize: 536870912
...
2020-01-08 08:08:56 migration speed: 1.56 MB/s - downtime 2663 ms
2020-01-08 08:08:56 migration status: completed
2020-01-08 08:09:02 migration finished successfully (duration 00:44:17)
TASK OK

Not sure where to start looking for the issue.

Any help is appreciated!

spirit · Jan 9, 2020

Can you send the full log? Is 10.99.0.16 on your 1 Gbit or 10 Gbit link?

TVTDatos · Jan 9, 2020

Hi,

10.99.0.16 is on the 1 Gbit link.

Here is the full log: https://p.libren.ms/view/f023b5b5

spirit · Jan 9, 2020

OK,

the transfer is not necessarily slow: the 1.56 MB/s figure is just total memory divided by total time.

The migration process works like this:

- full memory is copied to target host
- then if memory changes occur during the first copy, delta memory is copied to target
- then if memory changes occur during the last delta, a new delta memory is copied to target
- and again and again.

If a lot of memory keeps changing (I have seen that with databases or Java applications),
you can get many delta passes (you can see the "remaining" counter going down and up),
and the migration takes longer. (And the more memory the VM has, the more deltas you'll have.)
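
As a rough illustration of the copy loop described above, here is a minimal Python sketch. It is not QEMU's actual algorithm: the function name `simulate_precopy`, the constant dirty rate, and the example numbers are all assumptions made for illustration. It also shows how a "memory / total time" average ends up below the raw link speed when extra delta passes are needed.

```python
def simulate_precopy(memory_mb, link_mb_s, dirty_mb_s,
                     max_downtime_s=0.1, max_rounds=50):
    """Idealized pre-copy loop (hypothetical model, not QEMU code).

    Returns (passes, total_seconds, total_sent_mb), or None if the
    remaining delta never gets small enough within max_rounds.
    """
    remaining = memory_mb      # first pass copies the full memory
    total_time = 0.0
    total_sent = 0.0
    for passes in range(1, max_rounds + 1):
        pass_time = remaining / link_mb_s
        total_time += pass_time
        total_sent += remaining
        if pass_time <= max_downtime_s:
            # Small enough to send while the VM is paused: done.
            return passes, total_time, total_sent
        # Memory dirtied during this pass becomes the next delta
        # (it can never exceed the VM's total memory).
        remaining = min(memory_mb, dirty_mb_s * pass_time)
    return None


if __name__ == "__main__":
    memory = 4096  # MB, assumed VM size
    result = simulate_precopy(memory, link_mb_s=100, dirty_mb_s=10)
    passes, seconds, sent = result
    print(f"passes={passes} time={seconds:.1f}s sent={sent:.0f}MB")
    # Average computed the way the post describes: memory / total time.
    print(f"reported average: {memory / seconds:.1f} MB/s")
    # A dirty rate at or above the link speed never converges here.
    print(simulate_precopy(memory, link_mb_s=100, dirty_mb_s=100))
```

With these made-up numbers the loop finishes in 4 passes, and the average comes out below the 100 MB/s link speed because more than the VM's memory had to be sent. When the dirty rate matches or exceeds the link speed, the delta never shrinks in this model.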

The only options: use a faster link or give the VM less memory.

You can change the migration network, if you want, to use your 10 Gbit links:

/etc/pve/datacenter.cfg:

migration: network=10.99.0.0/24 (change to your 10gb ip address network)

You can also disable the SSH tunnel used by migration, which speeds up migration as well:

migration: network=10.99.0.0/24,type=insecure

TVTDatos · Jan 9, 2020

Thanks for your reply, but 1MB/s (~10MBit/s) on a 1Gbit link is really slow.

This is the network graph for the receiving node:

Also, it happens on every machine I try to migrate. The one in the example is just an NFS server, so there is no continuous memory change at all.

spirit · Jan 10, 2020

The memory transfer is really pure network work and does not use much CPU.
Looking at the "transferred" value for each second in the logs and at your graphs, you are indeed between 10 and 20 Mbit/s.

Can you try disabling the SSH tunnel to compare? (The "insecure" option from my last post.)

And can you try using the 10 Gbit network to compare, too?

TVTDatos · Jan 10, 2020

I've set type=insecure for migrations and moved a Windows VM.

Speed is 108MB/s.

Also, after moving the migration network to the 20G one, speed is 455MB/s.
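
To put these figures in perspective, the short Python sketch below compares each reported speed with the nominal capacity of the link it ran on. The helper name `link_utilization` is hypothetical, the calculation assumes decimal units (1 Gbit/s = 125 MB/s), and the runs involved different VMs, so this is only a rough comparison.

```python
def link_utilization(throughput_mb_s: float, link_gbit_s: float) -> float:
    """Fraction of the nominal link capacity used by a given throughput."""
    link_mb_s = link_gbit_s * 1000 / 8  # 1 Gbit/s = 125 MB/s (decimal units)
    return throughput_mb_s / link_mb_s


# (label, reported MB/s, nominal link speed in Gbit/s), figures from this thread
runs = [
    ("ssh tunnel, 1G link", 1.56, 1),
    ("type=insecure, 1G link", 108, 1),
    ("20G network", 455, 20),
]

for label, mb_s, gbit in runs:
    print(f"{label}: {link_utilization(mb_s, gbit):.1%} of nominal capacity")
```

Under these assumptions, the tunneled run used roughly 1% of the 1G link, while the insecure run came close to saturating it.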

I'm very sorry, I never thought that secure migration was going to nerf speed THAT MUCH, or I would have tried that first!

Thank you very much for your help, @spirit. I really appreciate it.

spirit · Jan 10, 2020

Yes, depending on your CPU speed, the SSH tunnel can be slow, because it uses only one core for encryption.
