Problem w/ hot migration

kurdam

Active Member
Sep 29, 2020
Hi,
I work in a datacenter and am currently migrating my whole network.
I need to migrate all of my VMs from one node to the other. Unfortunately, I'm running into errors with hot (live) migrations.
The migration ends with a failed status, and just before it fails the cachemiss counter increases:


2021-01-14 11:22:00 use dedicated network address for sending migration traffic (10.10.1.6)
2021-01-14 11:22:00 starting migration of VM 1008 to node 'pve2' (10.10.1.6)
2021-01-14 11:22:01 starting VM 1008 on remote node 'pve2'
2021-01-14 11:22:05 start remote tunnel
2021-01-14 11:22:06 ssh tunnel ver 1
2021-01-14 11:22:06 starting online/live migration on unix:/run/qemu-server/1008.migrate
2021-01-14 11:22:06 set migration_caps
2021-01-14 11:22:06 migration speed limit: 8589934592 B/s
2021-01-14 11:22:06 migration downtime limit: 100 ms
2021-01-14 11:22:06 migration cachesize: 1073741824 B
2021-01-14 11:22:06 set migration parameters
2021-01-14 11:22:06 start migrate command to unix:/run/qemu-server/1008.migrate
2021-01-14 11:22:07 migration status: active (transferred 116111868, remaining 6286077952), total 6460350464)
2021-01-14 11:22:07 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:08 migration status: active (transferred 233171401, remaining 6094254080), total 6460350464)
2021-01-14 11:22:08 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:09 migration status: active (transferred 350353839, remaining 5906128896), total 6460350464)
2021-01-14 11:22:09 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:10 migration status: active (transferred 467451569, remaining 5730463744), total 6460350464)
2021-01-14 11:22:10 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:11 migration status: active (transferred 584367030, remaining 5565071360), total 6460350464)
2021-01-14 11:22:11 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:12 migration status: active (transferred 701642897, remaining 5403381760), total 6460350464)
2021-01-14 11:22:12 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:13 migration status: active (transferred 818826325, remaining 5246488576), total 6460350464)
2021-01-14 11:22:13 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:14 migration status: active (transferred 935955941, remaining 5102907392), total 6460350464)
2021-01-14 11:22:14 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:15 migration status: active (transferred 1052964742, remaining 4956532736), total 6460350464)
2021-01-14 11:22:15 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:16 migration status: active (transferred 1170139872, remaining 4803416064), total 6460350464)
2021-01-14 11:22:16 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:17 migration status: active (transferred 1287298640, remaining 4657745920), total 6460350464)
2021-01-14 11:22:17 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:18 migration status: active (transferred 1404524781, remaining 4490735616), total 6460350464)
2021-01-14 11:22:18 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:19 migration status: active (transferred 1521682496, remaining 4332498944), total 6460350464)
2021-01-14 11:22:19 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:20 migration status: active (transferred 1638911383, remaining 4192190464), total 6460350464)
2021-01-14 11:22:20 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:21 migration status: active (transferred 1755911310, remaining 4061036544), total 6460350464)
2021-01-14 11:22:21 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:22 migration status: active (transferred 1873227523, remaining 3921989632), total 6460350464)
2021-01-14 11:22:22 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:23 migration status: active (transferred 1990284096, remaining 3779964928), total 6460350464)
2021-01-14 11:22:23 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:24 migration status: active (transferred 2107527806, remaining 3647819776), total 6460350464)
2021-01-14 11:22:24 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:25 migration status: active (transferred 2224451989, remaining 3525046272), total 6460350464)
2021-01-14 11:22:25 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:26 migration status: active (transferred 2341519145, remaining 3370754048), total 6460350464)
2021-01-14 11:22:26 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:27 migration status: active (transferred 2458773142, remaining 3235790848), total 6460350464)
2021-01-14 11:22:27 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:28 migration status: active (transferred 2576057586, remaining 3109335040), total 6460350464)
2021-01-14 11:22:28 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:29 migration status: active (transferred 2692901938, remaining 2966986752), total 6460350464)
2021-01-14 11:22:29 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:30 migration status: active (transferred 2810091225, remaining 2796244992), total 6460350464)
2021-01-14 11:22:30 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:31 migration status: active (transferred 2927286452, remaining 2619072512), total 6460350464)
2021-01-14 11:22:31 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:32 migration status: active (transferred 3044200023, remaining 2437767168), total 6460350464)
2021-01-14 11:22:32 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:33 migration status: active (transferred 3161393477, remaining 2253946880), total 6460350464)
2021-01-14 11:22:33 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:34 migration status: active (transferred 3278771233, remaining 2079432704), total 6460350464)
2021-01-14 11:22:34 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:35 migration status: active (transferred 3395749694, remaining 1931960320), total 6460350464)
2021-01-14 11:22:35 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:36 migration status: active (transferred 3512913286, remaining 1772912640), total 6460350464)
2021-01-14 11:22:36 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:37 migration status: active (transferred 3629866908, remaining 1580830720), total 6460350464)
2021-01-14 11:22:37 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:38 migration status: active (transferred 3747027817, remaining 1382006784), total 6460350464)
2021-01-14 11:22:38 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:39 migration status: active (transferred 3864229659, remaining 1199960064), total 6460350464)
2021-01-14 11:22:39 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:40 migration status: active (transferred 3981458762, remaining 1022279680), total 6460350464)
2021-01-14 11:22:40 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:41 migration status: active (transferred 4098527288, remaining 850587648), total 6460350464)
2021-01-14 11:22:41 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:45 migration status: active (transferred 4538714433, remaining 316334080), total 6460350464)
2021-01-14 11:22:45 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:46 migration status: active (transferred 4655931368, remaining 162828288), total 6460350464)
2021-01-14 11:22:46 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-14 11:22:47 migration status: active (transferred 4771533994, remaining 22335488), total 6460350464)
2021-01-14 11:22:47 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 1582 overflow 0
2021-01-14 11:22:47 migration status: active (transferred 4783437239, remaining 9715712), total 6460350464)
2021-01-14 11:22:47 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 4482 overflow 0
2021-01-14 11:22:47 migration status error: failed
2021-01-14 11:22:47 ERROR: online migrate failure - aborting
2021-01-14 11:22:47 aborting phase 2 - cleanup resources
2021-01-14 11:22:47 migrate_cancel
2021-01-14 11:22:49 ERROR: migration finished with problems (duration 00:00:49)
TASK ERROR: migration problems
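
A side note on reading the log above: the "transferred" counters are enough to estimate the average transfer rate during the bulk phase. The sketch below is only a rough illustration. It uses two status lines copied from the log (11:22:07 and 11:22:41), and the helper names `parse_status` and `average_rate` are made up for this example. The result is close to 1 Gbit/s, which might mean the migration link was saturated, but that interpretation is an assumption and not something the log states.

```python
import re
from datetime import datetime

# Two status lines copied verbatim from the task log above.
LOG = """\
2021-01-14 11:22:07 migration status: active (transferred 116111868, remaining 6286077952), total 6460350464)
2021-01-14 11:22:41 migration status: active (transferred 4098527288, remaining 850587648), total 6460350464)
"""

# Captures the timestamp, the transferred byte count and the remaining byte count.
STATUS = re.compile(
    r"^(\S+ \S+) migration status: active \(transferred (\d+), remaining (\d+)\)"
)

def parse_status(log):
    """Return a list of (timestamp, transferred_bytes) for each status line."""
    samples = []
    for line in log.splitlines():
        m = STATUS.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2))))
    return samples

def average_rate(samples):
    """Average bytes per second between the first and the last sample."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    return (b1 - b0) / (t1 - t0).total_seconds()

rate = average_rate(parse_status(LOG))
print(f"{rate / 1e6:.1f} MB/s, {rate * 8 / 1e9:.2f} Gbit/s")
# prints "117.1 MB/s, 0.94 Gbit/s"
```

That is far below the 8589934592 B/s speed limit shown at the start of the log, so the configured limit itself was not what capped the transfer.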


I've never had any problem with hot migration before, and the error message is not very explicit.
The storage for these machines is on an iSCSI network share; I created an LVM volume on top of the LUN.
All my nodes are on the latest version, they can all reach the iSCSI share and the LVM on it, and they can all ping each other on the dedicated migration network.
Finally, the iSCSI storage is shared between all my nodes (I ticked the shared option under Datacenter -> Storage).

I haven't installed or configured iSCSI multipath yet; it's the first thing I intend to do after the network migration. For the time being, each node has only one link to the network storage.

I suspect it could be caused by jumbo frames, which are configured on some switches and not on others. But I don't have any other clues...
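
One way to test the jumbo frame suspicion is to send pings that are not allowed to fragment and that exactly fill the MTU. The sketch below only builds the command line. It assumes a jumbo MTU of 9000 (the actual value on the switches is not stated here), an IPv4 header without options, and the Linux iputils `ping`, where `-M do` forbids fragmentation and `-s` sets the ICMP payload size. The target 10.10.1.6 is the migration address from the log; the function names are invented for this example.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def ping_command(target, mtu, count=3):
    """Build a Linux iputils ping command that must not be fragmented."""
    return ["ping", "-M", "do", "-s", str(icmp_payload_for_mtu(mtu)),
            "-c", str(count), target]

# Assumed jumbo MTU of 9000; adjust to whatever is configured on the NICs.
print(" ".join(ping_command("10.10.1.6", 9000)))
# prints "ping -M do -s 8972 -c 3 10.10.1.6"
```

If the large ping fails while the same command built for an MTU of 1500 (payload 1472) succeeds, some hop on the path is probably not passing jumbo frames.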
 
Check the syslog ('journalctl') for any errors during migration (on both source and target!). Also post your VM config and 'pveversion -v' of both nodes if possible.

I think the cachemiss is unrelated to any issues you have. Is the VM under load during the migration?
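
For collecting that information, something like the sketch below could be run on both nodes. It only prints the commands. The time window around the failed task is an assumption taken from the log timestamps, `qm config` is my suggestion for getting the VM config, and `diagnostic_commands` is just a name for this example.

```python
import shlex

def diagnostic_commands(vmid, since, until):
    """Commands to gather the syslog window, the VM config and package versions."""
    return [
        ["journalctl", "--since", since, "--until", until],
        ["qm", "config", str(vmid)],
        ["pveversion", "-v"],
    ]

# VM 1008 and a window around the failure at 11:22:47 (assumed to be wide enough).
for argv in diagnostic_commands(1008, "2021-01-14 11:21:00", "2021-01-14 11:23:00"):
    print(shlex.join(argv))
```

The VM config only exists on the node that currently holds the VM, so `qm config` is expected to work on the source node only.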
 
