Hello everyone!
I was wondering if someone could give me a couple of tips on configuring the network used by Migration Settings to migrate VMs from one node to another. We have a cluster of 15 nodes, each configured with 2 networks:
- Ceph is configured to use a Bond made of 2 10G physical interfaces in balance-rr
- Management and VMs instead use a separate Bond (actually a Bridge pointing to a Bond) of 2 1G physical interfaces, with the bond configured in active-backup
The reason I am asking is that when we try to migrate a VM from one node to another, it often fails with something like:
Code:
2026-03-04 16:29:27 xbzrle: send updates to 4310672 pages in 4.5 GiB encoded memory, cache-miss 27.43%, overflow 548054
2026-03-04 16:29:28 migration active, transferred 49.1 GiB of 32.0 GiB VM-state, 222.9 MiB/s
2026-03-04 16:29:28 xbzrle: send updates to 4369514 pages in 4.6 GiB encoded memory, cache-miss 24.91%, overflow 553321
2026-03-04 16:29:28 average migration speed: 73.0 MiB/s - downtime 188 ms
2026-03-04 16:29:28 migration status: completed
2026-03-04 16:29:28 ERROR: tunnel replied 'ERR: resume failed - VM 163 qmp command 'query-status' failed - client closed connection' to command 'resume 163'
VM quit/powerdown failed - terminating now with SIGTERM
2026-03-04 16:29:42 ERROR: migration finished with problems (duration 00:07:46)
TASK ERROR: migration problems
I was thinking this could be improved with a faster network for migrations, so that the memory is moved more quickly and with fewer xbzrle cache misses.
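To put a rough number on that, here is a back-of-the-envelope sketch in Python. It only computes the ideal time to push the 49.1 GiB from the log over a 1G link versus a single 10G link at full line rate. The function and variable names are mine, and it assumes no protocol overhead and no extra throughput from balance-rr. On a faster link fewer pages would probably be re-dirtied, so the total transferred would likely be lower than 49.1 GiB. Treat it as an estimate, not a measurement.

```python
GIB = 2**30  # bytes per GiB


def transfer_seconds(data_gib: float, link_gbit_per_s: float) -> float:
    """Ideal time to send data_gib over a link at full line rate (no overhead)."""
    bits = data_gib * GIB * 8
    return bits / (link_gbit_per_s * 1e9)


transferred_gib = 49.1  # "transferred 49.1 GiB" from the log above

# Compare the current 1G migration path with one 10G link (assumed, not measured).
for label, gbit in (("1G", 1.0), ("10G", 10.0)):
    print(f"{label}: {transfer_seconds(transferred_gib, gbit):.0f} s")
```

That works out to roughly 422 s on 1G and 42 s on 10G, against the 00:07:46 the failed migration actually took.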
Is the configuration I was thinking of a good idea? Are there any issues that could come from this change?