Hello,
I am trying to fully understand a live migration log, and to find settings that enable it to perform better.
The operation was a live migration with shared storage (iSCSI) so only the VM state was transferred.
I uploaded the migration log on https://pastebin.com/VtDgQUw6
I am trying to understand all the info here.
Quoting a post by forum user @spirit :
the migration process works like this
- full memory is copied to target host
- then if memory changes occur during the first copy, delta memory is copied to target
- then if memory changes occur during the last delta, a new delta memory is copied to target
- and again and again.
If you have a lot of memory changes (I have seen that with database or Java applications),
you can have a lot of delta retries (you can see the counter "remaining" going down/up),
and it can take more time. (And the bigger the VM's memory, the more deltas you'll have.)
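The loop @spirit describes can be sketched with a toy model (all numbers below are hypothetical, and this ignores the auto-convergence throttling that real QEMU can apply):

```python
# Toy model of iterative pre-copy live migration (hypothetical numbers).
def precopy_rounds(ram_gib, bandwidth_mib_s, dirty_mib_s,
                   downtime_threshold_mib=16, max_rounds=30):
    """Return the per-round 'remaining' sizes, or None if it never converges."""
    remaining = ram_gib * 1024.0  # MiB still to send
    rounds = []
    for _ in range(max_rounds):
        rounds.append(remaining)
        if remaining <= downtime_threshold_mib:
            return rounds  # small enough: freeze VM, send the last delta
        seconds = remaining / bandwidth_mib_s  # time to copy this pass
        remaining = dirty_mib_s * seconds      # memory dirtied meanwhile
    return None  # dirty rate >= bandwidth: never converges

# Converges: each pass shrinks by the ratio dirty/bandwidth.
print(precopy_rounds(32, bandwidth_mib_s=100, dirty_mib_s=20))
# Does not converge: the VM dirties memory faster than we can send it.
print(precopy_rounds(32, bandwidth_mib_s=35, dirty_mib_s=46))
```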
Let me quote 4 particular lines from my log :
2022-03-03 17:48:59 migration active, transferred 1.6 GiB of 32.0 GiB VM-state, 32.6 MiB/s
Nothing special here, first part of migration (full copy).
2022-03-03 18:08:34 migration active, transferred 40.6 GiB of 32.0 GiB VM-state, 34.5 MiB/s
Here "transferred" is more than the VM's total RAM, so I guess we have entered the phase where we transfer the delta for RAM that was written while the first pass was running?
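A quick sanity check on those numbers (assuming "transferred" is cumulative):

```python
ram_gib = 32.0
transferred_gib = 40.6   # from the 18:08:34 log line

# If 'transferred' is cumulative, everything beyond the first full
# copy must be re-sent dirty pages (deltas).
delta_gib = transferred_gib - ram_gib
print(f"delta re-sent so far: {delta_gib:.1f} GiB")  # 8.6 GiB
```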
2022-03-03 18:08:48 xbzrle: send updates to 9810 pages in 7.2 MiB encoded memory, cache-miss 99.59%, overflow 1300
We can find some information in the xbzrle documentation :
Instead of sending the changed guest memory page this solution will send a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size the better the chances are that the page has already
been stored in the cache.
A small cache size will result in high cache miss rate.
Cache size can be changed before and during migration.
[...]
xbzrle cache miss: the number of cache misses to date - high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows in the decoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).
cache-miss 99.59%
means the xbzrle cache is useless here. Is there a way to make Proxmox use a larger xbzrle cache, or to tune other xbzrle settings?
2022-03-03 18:16:53 migration active, transferred 57.2 GiB of 32.0 GiB VM-state, 39.9 MiB/s, VM dirties lots of memory: 46.4 MiB/s
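On the xbzrle cache question: I don't know of a Proxmox option for it, but QEMU itself exposes the cache size as a migration parameter, so it may be possible to bump it through the VM's monitor (e.g. `qm monitor <vmid>`) before or during the migration. A hypothetical session, assuming Proxmox doesn't override the value when it drives the migration:

```
# inside `qm monitor <vmid>` (QEMU HMP)
info migrate_parameters                      # shows the current xbzrle-cache-size
migrate_set_parameter xbzrle-cache-size 1G   # must be a power of 2
```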
Here we see that the VM dirties memory (46.4 MiB/s) faster than the network can transfer it (39.9 MiB/s), so "remaining" grows by about 6.5 MiB/s and the migration might never end!
In the end, we fixed this by stopping the app on the VM, so that the remaining dirty pages could be transferred.
After this experience, we have set migrations to use a faster network, and insecure mode.
You can choose the migration network in the web GUI, in Datacenter > Options > Migration settings.
You can set insecure mode by editing /etc/pve/datacenter.cfg in a cluster node.
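For reference, both settings end up on a single line in /etc/pve/datacenter.cfg (the subnet here is an example, use your own migration network):

```
# /etc/pve/datacenter.cfg
migration: type=insecure,network=10.10.10.0/24
```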
Last question: is there a way to alter the threshold for the final delta copy? (When the dirty memory on the VM drops below this threshold, the VM is frozen, one last transfer is done, and the VM resumes on the destination server.)
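On that threshold: my understanding (not verified against the Proxmox code) is that QEMU derives it from the allowed downtime: it switches over once the remaining dirty memory can be sent within the `downtime-limit` migration parameter (default 300 ms) at the current bandwidth. A back-of-envelope check:

```python
# Hypothetical estimate of the switchover threshold (assumes the
# simple model: threshold = bandwidth * allowed downtime).
bandwidth_mib_s = 40.0     # roughly what the log shows
downtime_limit_ms = 300    # QEMU's default downtime-limit

threshold_mib = bandwidth_mib_s * downtime_limit_ms / 1000.0
print(f"switchover threshold ~ {threshold_mib:.1f} MiB")  # ~12.0 MiB
```

So raising `downtime-limit` (if Proxmox lets you) should make the migration converge sooner, at the cost of a longer freeze.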
Can you think of any other settings worth knowing about for live migrations?
Thanks for reading and hopefully contributing to this thread