Hello,
I noticed a migration problem on a brand-new cluster installation with Ceph.
When I set HA to migrate a VM to another node, it sometimes can't start and goes into the "paused" state.
I can then click Resume and everything starts.
In the log I found this error:
task started by HA resource agent
2018-12-18 15:24:19 starting migration of VM 100 to node 'pve2' (1.1.1.2)
2018-12-18 15:24:19 copying disk images
2018-12-18 15:24:19 starting VM 100 on remote node 'pve2'
2018-12-18 15:24:21 start remote tunnel
2018-12-18 15:24:21 ssh tunnel ver 1
2018-12-18 15:24:21 starting online/live migration on unix:/run/qemu-server/100.migrate
2018-12-18 15:24:21 migrate_set_speed: 8589934592
2018-12-18 15:24:21 migrate_set_downtime: 0.1
2018-12-18 15:24:21 set migration_caps
2018-12-18 15:24:21 set cachesize: 268435456
2018-12-18 15:24:21 start migrate command to unix:/run/qemu-server/100.migrate
2018-12-18 15:24:22 migration status: active (transferred 119329235, remaining 1926217728), total 2165121024)
2018-12-18 15:24:26 migration status: active (transferred 590721034, remaining 1104625664), total 2165121024)
2018-12-18 15:24:26 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-12-18 15:24:27 migration speed: 341.33 MB/s - downtime 50 ms
2018-12-18 15:24:27 migration status: completed
2018-12-18 15:24:27 ERROR: tunnel replied 'ERR: resume failed - unable to find configuration file for VM 100 - no such machine' to command 'resume 100'
2018-12-18 15:24:30 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems
This is strange because it happens randomly. Sometimes it works, sometimes not, with the same node and the same VM.