Hi all.
I am hitting an issue on almost every attempt to live-migrate a virtual machine with a large memory allocation (32 GB).
The live migration starts, then suddenly the VM is turned off and restarts.
We are using a 3-node HA Proxmox cluster with Ceph storage.
Here is the task log output:
Code:
task started by HA resource agent
2020-05-07 06:48:40 starting migration of VM 102 to node 'clusternode1' (10.81.250.100)
2020-05-07 06:48:40 starting VM 102 on remote node 'clusternode1'
2020-05-07 06:48:42 start remote tunnel
2020-05-07 06:48:43 ssh tunnel ver 1
2020-05-07 06:48:43 starting online/live migration on unix:/run/qemu-server/102.migrate
2020-05-07 06:48:43 set migration_caps
2020-05-07 06:48:43 migration speed limit: 8589934592 B/s
2020-05-07 06:48:43 migration downtime limit: 100 ms
2020-05-07 06:48:43 migration cachesize: 4294967296 B
2020-05-07 06:48:43 set migration parameters
2020-05-07 06:48:43 start migrate command to unix:/run/qemu-server/102.migrate
2020-05-07 06:48:44 migration status: active (transferred 28051947, remaining 34337816576), total 34377441280)
2020-05-07 06:48:44 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-07 06:48:45 migration status: active (transferred 57369637, remaining 34305478656), total 34377441280)
2020-05-07 06:48:45 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-07 06:48:46 migration status: active (transferred 86952918, remaining 34275262464), total 34377441280)
...
2020-05-07 06:55:41 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-07 06:55:42 migration status: active (transferred 31143197399, remaining 354840576), total 34377441280)
2020-05-07 06:55:42 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-07 06:55:43 migration status: active (transferred 31260030231, remaining 238235648), total 34377441280)
2020-05-07 06:55:43 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-07 06:55:44 migration status: active (transferred 31377121607, remaining 121372672), total 34377441280)
2020-05-07 06:55:44 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
query migrate failed: VM 102 not running
2020-05-07 06:55:45 query migrate failed: VM 102 not running
query migrate failed: VM 102 not running
2020-05-07 06:55:47 query migrate failed: VM 102 not running
query migrate failed: VM 102 not running
2020-05-07 06:55:49 query migrate failed: VM 102 not running
query migrate failed: VM 102 not running
2020-05-07 06:55:51 query migrate failed: VM 102 not running
query migrate failed: VM 102 not running
2020-05-07 06:55:53 query migrate failed: VM 102 not running
query migrate failed: VM 102 not running
2020-05-07 06:55:55 query migrate failed: VM 102 not running
2020-05-07 06:55:55 ERROR: online migrate failure - too many query migrate failures - aborting
2020-05-07 06:55:55 aborting phase 2 - cleanup resources
2020-05-07 06:55:55 migrate_cancel
2020-05-07 06:55:55 migrate_cancel error: VM 102 not running
2020-05-07 06:55:57 ERROR: migration finished with problems (duration 00:07:18)
TASK ERROR: migration problems
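To put a number on the transfer speed, here is a minimal sketch that parses the "migration status" lines from the log above and computes the average rate between the first and last sample. The helper names (`parse_status`, `average_rate`) are my own and not part of any Proxmox tooling, and only two lines of the log are pasted in to keep it short.

```python
import re
from datetime import datetime

# First and last "active" status lines, copied verbatim from the task log above.
LOG = """\
2020-05-07 06:48:44 migration status: active (transferred 28051947, remaining 34337816576), total 34377441280)
2020-05-07 06:55:44 migration status: active (transferred 31377121607, remaining 121372672), total 34377441280)
"""

# Matches the status lines exactly as they appear in the log
# (including the unbalanced closing parenthesis after "total").
STATUS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) migration status: active "
    r"\(transferred (\d+), remaining (\d+)\), total (\d+)\)$"
)


def parse_status(log):
    """Return a list of (timestamp, transferred_bytes, remaining_bytes)."""
    samples = []
    for line in log.splitlines():
        m = STATUS.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2)), int(m.group(3))))
    return samples


def average_rate(samples):
    """Average bytes per second between the first and last sample."""
    (t0, b0, _), (t1, b1, _) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    return (b1 - b0) / seconds


samples = parse_status(LOG)
rate = average_rate(samples)
print(f"{rate / 1e6:.1f} MB/s average over {len(samples)} samples")
```

With the two lines above this works out to roughly 75 MB/s over the seven minutes, which is far below the 8589934592 B/s speed limit reported at the start of the task.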
Is there anything I can do to make live migration more reliable? At the moment I am better off powering off the virtual machine and starting it again on a different host.
Kind regards