Weird behavior on VM migration

Le PAH

Hello,

I've got a cluster that is running smoothly but, in some cases, mass migration is not working properly.

Once migrated to the destination node, the VM fails to resume with the following error:

ERROR: tunnel replied 'ERR: resume failed - unable to find configuration file for VM 108 - no such machine' to command 'resume 108'

It happened on 4 of the 8 VMs that I migrated, with no explanation.

The migrated VM can be resumed afterwards, with no visible problem.

Can I fix this, or is this an underlying bug?

BR, PAH

Complete console output:

Code:
2019-01-16 11:13:41 use dedicated network address for sending migration traffic (10.0.0.101)
2019-01-16 11:13:41 starting migration of VM 108 to node 'srv-pve1' (10.0.0.101)
2019-01-16 11:13:41 copying disk images
2019-01-16 11:13:41 starting VM 108 on remote node 'srv-pve1'
2019-01-16 11:13:43 start remote tunnel
2019-01-16 11:13:44 ssh tunnel ver 1
2019-01-16 11:13:44 starting online/live migration on unix:/run/qemu-server/108.migrate
2019-01-16 11:13:44 migrate_set_speed: 8589934592
2019-01-16 11:13:44 migrate_set_downtime: 0.1
2019-01-16 11:13:44 set migration_caps
2019-01-16 11:13:44 set cachesize: 1073741824
2019-01-16 11:13:44 start migrate command to unix:/run/qemu-server/108.migrate
2019-01-16 11:13:45 migration status: active (transferred 38624326, remaining 5551058944), total 8607571968)
2019-01-16 11:13:45 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:46 migration status: active (transferred 86231499, remaining 4964671488), total 8607571968)
2019-01-16 11:13:46 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:47 migration status: active (transferred 133011447, remaining 4916887552), total 8607571968)
2019-01-16 11:13:47 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:48 migration status: active (transferred 179847735, remaining 4865826816), total 8607571968)
2019-01-16 11:13:48 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:49 migration status: active (transferred 226354767, remaining 4817383424), total 8607571968)
2019-01-16 11:13:49 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:50 migration status: active (transferred 273210495, remaining 4770521088), total 8607571968)
2019-01-16 11:13:50 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:51 migration status: active (transferred 320259273, remaining 4723392512), total 8607571968)
2019-01-16 11:13:51 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:52 migration status: active (transferred 416547896, remaining 4627099648), total 8607571968)
2019-01-16 11:13:52 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:53 migration status: active (transferred 533664112, remaining 4510113792), total 8607571968)
2019-01-16 11:13:53 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:54 migration status: active (transferred 650082459, remaining 4393910272), total 8607571968)
2019-01-16 11:13:54 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:55 migration status: active (transferred 763623893, remaining 4280582144), total 8607571968)
2019-01-16 11:13:55 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:56 migration status: active (transferred 880148998, remaining 4164247552), total 8607571968)
2019-01-16 11:13:56 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:57 migration status: active (transferred 997203536, remaining 4047380480), total 8607571968)
2019-01-16 11:13:57 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:58 migration status: active (transferred 1114106839, remaining 3930382336), total 8607571968)
2019-01-16 11:13:58 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:13:59 migration status: active (transferred 1231132713, remaining 3813511168), total 8607571968)
2019-01-16 11:13:59 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:00 migration status: active (transferred 1347165482, remaining 3695738880), total 8607571968)
2019-01-16 11:14:00 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:01 migration status: active (transferred 1463883637, remaining 3579138048), total 8607571968)
2019-01-16 11:14:01 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:02 migration status: active (transferred 1580819115, remaining 3462406144), total 8607571968)
2019-01-16 11:14:02 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:03 migration status: active (transferred 1697916017, remaining 3344891904), total 8607571968)
2019-01-16 11:14:03 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:04 migration status: active (transferred 1814845339, remaining 3227234304), total 8607571968)
2019-01-16 11:14:04 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:05 migration status: active (transferred 1930115645, remaining 3085598720), total 8607571968)
2019-01-16 11:14:05 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:06 migration status: active (transferred 2046899860, remaining 2922160128), total 8607571968)
2019-01-16 11:14:06 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:07 migration status: active (transferred 2163855327, remaining 2790739968), total 8607571968)
2019-01-16 11:14:07 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:08 migration status: active (transferred 2280813512, remaining 2661810176), total 8607571968)
2019-01-16 11:14:08 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:09 migration status: active (transferred 2397912358, remaining 2522910720), total 8607571968)
2019-01-16 11:14:09 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:10 migration status: active (transferred 2514859617, remaining 2382180352), total 8607571968)
2019-01-16 11:14:10 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:11 migration status: active (transferred 2631862271, remaining 2244194304), total 8607571968)
2019-01-16 11:14:11 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:12 migration status: active (transferred 2748808882, remaining 2086985728), total 8607571968)
2019-01-16 11:14:12 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:13 migration status: active (transferred 2865831326, remaining 1925087232), total 8607571968)
2019-01-16 11:14:13 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:14 migration status: active (transferred 2982777928, remaining 1801428992), total 8607571968)
2019-01-16 11:14:14 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:15 migration status: active (transferred 3099838371, remaining 1020264448), total 8607571968)
2019-01-16 11:14:15 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:16 migration status: active (transferred 3215800418, remaining 742719488), total 8607571968)
2019-01-16 11:14:16 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:17 migration status: active (transferred 3332683759, remaining 580759552), total 8607571968)
2019-01-16 11:14:17 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:18 migration status: active (transferred 3449841501, remaining 400146432), total 8607571968)
2019-01-16 11:14:18 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:19 migration status: active (transferred 3566628082, remaining 269180928), total 8607571968)
2019-01-16 11:14:19 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:20 migration status: active (transferred 3683588427, remaining 137404416), total 8607571968)
2019-01-16 11:14:20 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3798956204, remaining 22224896), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3810831037, remaining 183799808), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 1723 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3823086408, remaining 171196416), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 4709 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3835365763, remaining 156999680), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 7700 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3847281797, remaining 144150528), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 10603 overflow 0
2019-01-16 11:14:21 migration status: active (transferred 3859533072, remaining 131551232), total 8607571968)
2019-01-16 11:14:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 13588 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3871477447, remaining 118849536), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 16498 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3883377704, remaining 103862272), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 19396 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3895538007, remaining 91664384), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 22359 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3907665478, remaining 79499264), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 25314 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3919725728, remaining 47607808), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 28242 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3931882116, remaining 35328000), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 31204 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3943895548, remaining 22880256), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 34131 overflow 0
2019-01-16 11:14:22 migration status: active (transferred 3956075120, remaining 9367552), total 8607571968)
2019-01-16 11:14:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 37098 overflow 0
2019-01-16 11:14:22 migration speed: 215.58 MB/s - downtime 20 ms
2019-01-16 11:14:22 migration status: completed
2019-01-16 11:14:22 ERROR: tunnel replied 'ERR: resume failed - unable to find configuration file for VM 108 - no such machine' to command 'resume 108'
2019-01-16 11:14:26 ERROR: migration finished with problems (duration 00:00:46)
TASK ERROR: migration problems
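
Since the VM can be resumed by hand afterwards, one possible reading of the error is that the destination node had not yet seen the VM's configuration file when the resume command arrived. The sketch below is only a workaround idea under that assumption, not a fix confirmed in this thread: it polls for a file and gives up after a timeout. The helper name `wait_for_file` is made up for illustration; the path follows the usual `/etc/pve/qemu-server/<VMID>.conf` layout, and VMID 108 comes from the log above.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists; returns 1 if the timeout expires.
wait_for_file() {
    path="$1"
    timeout="${2:-10}"   # default timeout of 10 seconds is an arbitrary choice
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Assumed usage on the destination node once the migration task has ended:
#   wait_for_file /etc/pve/qemu-server/108.conf 10 && qm resume 108
```

If the file never shows up within the timeout, the problem is more likely in the cluster filesystem or network than in the migration itself.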
 
Hm, does your cluster network work well?
* check the journal for entries from corosync (journalctl -r -u corosync.service) and pmxcfs (journalctl -r -u pve-cluster.service)
* how long does it take to create an empty file in /etc/pve `time touch /etc/pve/testemptyfile`?
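
The second check can be repeated a few times to see whether the latency is stable. This is a minimal sketch, assuming GNU `date` (for `%N`) as shipped on a Debian-based node; the function name `touch_latency_ms` and the temporary file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: touch_latency_ms [DIR]
# Creates and removes a scratch file in DIR (default /etc/pve) and prints
# how long the touch took, in whole milliseconds.
touch_latency_ms() {
    dir="${1:-/etc/pve}"
    file="$dir/.latency-test.$$"
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    touch "$file" || return 1
    end=$(date +%s%N)
    rm -f "$file"
    echo $(( (end - start) / 1000000 ))
}

# Assumed usage: take five samples, one per second.
#   for i in 1 2 3 4 5; do touch_latency_ms /etc/pve; sleep 1; done
```

Values that jump from a few milliseconds to seconds would point at trouble in the cluster filesystem or the network underneath it.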
 
* check the journal for entries from corosync (journalctl -r -u corosync.service) and pmxcfs (journalctl -r -u pve-cluster.service)

Jan 16 11:15:18 srv-pve1 corosync[2098]: [TOTEM ] Retransmit List: 189b
Jan 16 11:15:18 srv-pve1 corosync[2098]: [TOTEM ] Retransmit List: 189b
Jan 16 11:15:18 srv-pve1 corosync[2098]: notice [TOTEM ] Retransmit List: 189b
Jan 16 11:15:18 srv-pve1 corosync[2098]: notice [TOTEM ] Retransmit List: 189b
Jan 16 11:15:16 srv-pve1 corosync[2098]: [TOTEM ] Automatically recovered ring 1
Jan 16 11:15:16 srv-pve1 corosync[2098]: notice [TOTEM ] Automatically recovered ring 1
Jan 16 11:15:15 srv-pve1 corosync[2098]: [TOTEM ] Marking ringid 1 interface 192.168.1.101 FAULTY
Jan 16 11:15:15 srv-pve1 corosync[2098]: error [TOTEM ] Marking ringid 1 interface 192.168.1.101 FAULTY

Jan 16 11:15:14 srv-pve1 corosync[2098]: [TOTEM ] Retransmit List: 1884
Jan 16 11:15:14 srv-pve1 corosync[2098]: notice [TOTEM ] Retransmit List: 1884
Jan 16 11:15:14 srv-pve1 corosync[2098]: [TOTEM ] Retransmit List: 1884
Jan 16 11:15:14 srv-pve1 corosync[2098]: notice [TOTEM ] Retransmit List: 1884
Jan 16 11:15:14 srv-pve1 corosync[2098]: [TOTEM ] Retransmit List: 1884 1888
Jan 16 11:15:14 srv-pve1 corosync[2098]: notice [TOTEM ] Retransmit List: 1884 1888


* how long does it take to create an empty file in /etc/pve `time touch /etc/pve/testemptyfile`?
Code:
root@srv-pve1:~# time touch /etc/pve/testemptyfile
real   0m0.003s
user   0m0.001s
sys   0m0.000s


Looks like I've got a faulty network interface. I'll have a look at it, but it is an administration interface, not the private ring 0 interface that the nodes use to communicate...
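
To see how often this happens, the corosync journal can be summarized instead of read line by line. The following is a rough sketch: the function name `summarize_corosync` is made up, and the patterns are taken only from the messages shown in the excerpt above, so other corosync versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: reads corosync journal lines on stdin and counts
# FAULTY markings, automatic ring recoveries and retransmit lists.
summarize_corosync() {
    awk '
        /Marking ringid [0-9]+ interface .* FAULTY/ { faulty++ }
        /Automatically recovered ring/              { recovered++ }
        /Retransmit List:/                          { retransmit++ }
        END {
            printf "faulty=%d recovered=%d retransmit=%d\n",
                   faulty, recovered, retransmit
        }
    '
}

# Assumed usage around the time of the failed migration:
#   journalctl -u corosync.service \
#       --since "2019-01-16 11:10" --until "2019-01-16 11:20" | summarize_corosync
```

Note that each event appears twice in the journal excerpt (once with and once without a priority word such as `notice`), so the counts are doubled relative to the real number of events.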
 
