[SOLVED] Migration VM problem

svirus

Jul 3, 2018
Hello,
I noticed a migration problem on a brand-new cluster installation with Ceph.
When I trigger an HA migration to another node, the VM sometimes can't start and goes into the "paused" state.
I can then click Resume and everything starts.
In the log I found this error:

task started by HA resource agent
2018-12-18 15:24:19 starting migration of VM 100 to node 'pve2' (1.1.1.2)
2018-12-18 15:24:19 copying disk images
2018-12-18 15:24:19 starting VM 100 on remote node 'pve2'
2018-12-18 15:24:21 start remote tunnel
2018-12-18 15:24:21 ssh tunnel ver 1
2018-12-18 15:24:21 starting online/live migration on unix:/run/qemu-server/100.migrate
2018-12-18 15:24:21 migrate_set_speed: 8589934592
2018-12-18 15:24:21 migrate_set_downtime: 0.1
2018-12-18 15:24:21 set migration_caps
2018-12-18 15:24:21 set cachesize: 268435456
2018-12-18 15:24:21 start migrate command to unix:/run/qemu-server/100.migrate
2018-12-18 15:24:22 migration status: active (transferred 119329235, remaining 1926217728), total 2165121024)
2018-12-18 15:24:26 migration status: active (transferred 590721034, remaining 1104625664), total 2165121024)
2018-12-18 15:24:26 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-12-18 15:24:27 migration speed: 341.33 MB/s - downtime 50 ms
2018-12-18 15:24:27 migration status: completed
2018-12-18 15:24:27 ERROR: tunnel replied 'ERR: resume failed - unable to find configuration file for VM 100 - no such machine' to command 'resume 100'
2018-12-18 15:24:30 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems
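The memory transfer itself completed. The failure came afterwards, when the source sent `resume 100` through the tunnel and the target node `pve2` reported that it could not find the configuration file for VM 100. On Proxmox VE the per-node VM configurations live in the cluster filesystem under `/etc/pve/nodes/<node>/qemu-server/<vmid>.conf`. Below is a minimal sketch, assuming that layout, that lists which node directory currently holds a given VM's config. Running it on the target right after a failed migration would show whether the file had arrived there yet. The function name and the `root` parameter (added so it can be tried against a scratch directory) are hypothetical, not part of any Proxmox tool.

```python
from pathlib import Path


def find_vm_config_nodes(vmid: int, root: str = "/etc/pve/nodes") -> list[str]:
    """Return the names of nodes whose qemu-server directory holds <vmid>.conf.

    Assumes the Proxmox VE layout <root>/<node>/qemu-server/<vmid>.conf.
    Normally exactly one node owns a given VM at any time.
    """
    base = Path(root)
    if not base.is_dir():
        return []  # not on a Proxmox node, or the cluster filesystem is not mounted
    owners = []
    for node_dir in sorted(base.iterdir()):
        conf = node_dir / "qemu-server" / f"{vmid}.conf"
        if conf.is_file():
            owners.append(node_dir.name)
    return owners


if __name__ == "__main__":
    import sys

    vmid = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    owners = find_vm_config_nodes(vmid)
    if owners:
        print(f"VM {vmid} config found under node(s): {', '.join(owners)}")
    else:
        print(f"no configuration file found for VM {vmid}")
```

An empty result on the target, while the source no longer holds the file either, would point at the cluster filesystem not having caught up, rather than at the network transfer.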

This is strange because it happens randomly. Sometimes it works, sometimes not... with the same node and the same VM.
 
Hi,

please check your network.
This looks like a lock problem.
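If "lock problem" is read as a stale lock on the VM itself, one thing to check is the `lock:` line that Proxmox VE writes into the VM configuration while an operation such as a migration is running (for example `lock: migrate`). Below is a minimal sketch, assuming the usual `key: value` config format in which snapshot and pending sections start with a `[...]` header, that reports the lock of the current configuration. The function name and the sample config are hypothetical. A leftover lock can be cleared with `qm unlock 100`, but only once you are sure no migration is still in progress.

```python
from typing import Optional


def vm_config_lock(config_text: str) -> Optional[str]:
    """Return the value of the top-level 'lock' key, or None if unlocked.

    Only the main section is inspected: parsing stops at the first
    '[section]' header (snapshots, [PENDING]), since those sections
    carry their own keys.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # end of the current (top-level) configuration
        if not line or line.startswith("#"):
            continue  # skip blank lines and description comments
        key, sep, value = line.partition(":")
        if sep and key.strip() == "lock":
            return value.strip()
    return None


if __name__ == "__main__":
    # Hypothetical sample config, not taken from this thread.
    sample = "name: example-vm\nmemory: 2048\nlock: migrate\n"
    print(vm_config_lock(sample))  # prints "migrate"
```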
 
The network is fine... everything runs on the Ceph cluster without any errors.
The strange thing is this... I only get the error when the server is in HA mode.
When I remove the server from HA, live migration works without any problems.

My cluster spans two datacenters on a local network (latency about 20-25 ms), but live migration fails even between servers in the same location (latency less than 1 ms).
 
Even stranger... I rebooted all the Proxmox servers in the cluster, and now it works :p
 
