Live migration of a VM with two disks failed, and the VM also died on the source side. Offline migration (since it was dead anyway) worked, and the VM recovered afterwards. I'm attaching the live migration log below.
Should I report a bug, or..?
Code:
Proxmox Virtual Environment 6.2-12
Virtual Machine 142 (XYZ) on node 'p37'
2020-12-09 19:04:06 starting migration of VM 142 to node 'p37' (10.31.1.37)
2020-12-09 19:04:06 found local, replicated disk 'local-zfs:vm-142-disk-0' (in current VM config)
2020-12-09 19:04:06 found local, replicated disk 'local-zfs:vm-142-disk-1' (in current VM config)
2020-12-09 19:04:06 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2020-12-09 19:04:06 scsi1: start tracking writes using block-dirty-bitmap 'repl_scsi1'
2020-12-09 19:04:06 replicating disk images
2020-12-09 19:04:06 start replication job
2020-12-09 19:04:06 guest => VM 142, running => 9635
2020-12-09 19:04:06 volumes => local-zfs:vm-142-disk-0,local-zfs:vm-142-disk-1
2020-12-09 19:04:08 create snapshot '__replicate_142-0_1607537046__' on local-zfs:vm-142-disk-0
2020-12-09 19:04:08 create snapshot '__replicate_142-0_1607537046__' on local-zfs:vm-142-disk-1
2020-12-09 19:04:08 using secure transmission, rate limit: none
2020-12-09 19:04:08 incremental sync 'local-zfs:vm-142-disk-0' (__replicate_142-0_1607536986__ => __replicate_142-0_1607537046__)
2020-12-09 19:04:10 rpool/data/vm-142-disk-0@__replicate_142-0_1607536986__ name rpool/data/vm-142-disk-0@__replicate_142-0_1607536986__ -
2020-12-09 19:04:11 send from @__replicate_142-0_1607536986__ to rpool/data/vm-142-disk-0@__replicate_142-0_1607537046__ estimated size is 28.6M
2020-12-09 19:04:11 total estimated size is 28.6M
2020-12-09 19:04:11 TIME SENT SNAPSHOT rpool/data/vm-142-disk-0@__replicate_142-0_1607537046__
2020-12-09 19:04:11 successfully imported 'local-zfs:vm-142-disk-0'
2020-12-09 19:04:11 incremental sync 'local-zfs:vm-142-disk-1' (__replicate_142-0_1607536986__ => __replicate_142-0_1607537046__)
2020-12-09 19:04:13 rpool/data/vm-142-disk-1@__replicate_142-0_1607536986__ name rpool/data/vm-142-disk-1@__replicate_142-0_1607536986__ -
2020-12-09 19:04:14 send from @__replicate_142-0_1607536986__ to rpool/data/vm-142-disk-1@__replicate_142-0_1607537046__ estimated size is 4.14M
2020-12-09 19:04:14 total estimated size is 4.14M
2020-12-09 19:04:14 TIME SENT SNAPSHOT rpool/data/vm-142-disk-1@__replicate_142-0_1607537046__
2020-12-09 19:04:14 successfully imported 'local-zfs:vm-142-disk-1'
2020-12-09 19:04:14 delete previous replication snapshot '__replicate_142-0_1607536986__' on local-zfs:vm-142-disk-0
2020-12-09 19:04:14 delete previous replication snapshot '__replicate_142-0_1607536986__' on local-zfs:vm-142-disk-1
2020-12-09 19:04:15 (remote_finalize_local_job) delete stale replication snapshot '__replicate_142-0_1607536986__' on local-zfs:vm-142-disk-0
2020-12-09 19:04:15 (remote_finalize_local_job) delete stale replication snapshot '__replicate_142-0_1607536986__' on local-zfs:vm-142-disk-1
2020-12-09 19:04:15 end replication job
2020-12-09 19:04:16 copying local disk images
2020-12-09 19:04:16 starting VM 142 on remote node 'p37'
2020-12-09 19:04:19 start remote tunnel
2020-12-09 19:04:20 ssh tunnel ver 1
2020-12-09 19:04:20 starting storage migration
2020-12-09 19:04:20 scsi1: start migration to nbd:unix:/run/qemu-server/142_nbd.migrate:exportname=drive-scsi1
drive mirror re-using dirty bitmap 'repl_scsi1'
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 3145728 bytes total: 3145728 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 3211264 bytes remaining: 0 bytes total: 3211264 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2020-12-09 19:04:21 volume 'local-zfs:vm-142-disk-1' is 'local-zfs:vm-142-disk-1' on the target
2020-12-09 19:04:21 scsi0: start migration to nbd:unix:/run/qemu-server/142_nbd.migrate:exportname=drive-scsi0
drive mirror re-using dirty bitmap 'repl_scsi0'
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 524288 bytes remaining: 14745600 bytes total: 15269888 bytes progression: 3.43 % busy: 1 ready: 0
drive-scsi1: transferred: 3211264 bytes remaining: 0 bytes total: 3211264 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 16056320 bytes remaining: 0 bytes total: 16056320 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi1: transferred: 3211264 bytes remaining: 0 bytes total: 3211264 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2020-12-09 19:04:22 volume 'local-zfs:vm-142-disk-0' is 'local-zfs:vm-142-disk-0' on the target
2020-12-09 19:04:22 starting online/live migration on unix:/run/qemu-server/142.migrate
2020-12-09 19:04:22 set migration_caps
2020-12-09 19:04:22 migration speed limit: 8589934592 B/s
2020-12-09 19:04:22 migration downtime limit: 100 ms
2020-12-09 19:04:22 migration cachesize: 2147483648 B
2020-12-09 19:04:22 set migration parameters
2020-12-09 19:04:22 start migrate command to unix:/run/qemu-server/142.migrate
2020-12-09 19:04:23 migration status: active (transferred 123341959, remaining 14559850496), total 14697963520)
2020-12-09 19:04:23 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-12-09 19:04:24 migration status: active (transferred 339052571, remaining 14338920448), total 14697963520)
2020-12-09 19:04:24 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-12-09 19:04:25 migration status: active (transferred 514497418, remaining 14158819328), total 14697963520)
...
2020-12-09 19:05:52 migration status: active (transferred 15892510128, remaining 19333120), total 14697963520)
2020-12-09 19:05:52 migration xbzrle cachesize: 2147483648 transferred 138148789 pages 130000 cachemiss 385936 overflow 4091
2020-12-09 19:05:53 migration status: active (transferred 15901339942, remaining 29462528), total 14697963520)
2020-12-09 19:05:53 migration xbzrle cachesize: 2147483648 transferred 141392157 pages 138455 cachemiss 387292 overflow 4094
2020-12-09 19:05:53 migration status: active (transferred 15910220375, remaining 22675456), total 14697963520)
2020-12-09 19:05:53 migration xbzrle cachesize: 2147483648 transferred 146480920 pages 146102 cachemiss 388206 overflow 4104
2020-12-09 19:05:53 migration status: active (transferred 15917693927, remaining 4939776), total 14697963520)
2020-12-09 19:05:53 migration xbzrle cachesize: 2147483648 transferred 151067342 pages 154090 cachemiss 388898 overflow 4113
2020-12-09 19:05:53 migration status: active (transferred 15920411695, remaining 5074944), total 14697963520)
2020-12-09 19:05:53 migration xbzrle cachesize: 2147483648 transferred 151815218 pages 157656 cachemiss 389372 overflow 4115
query migrate failed: VM 142 not running
2020-12-09 19:05:53 query migrate failed: VM 142 not running
query migrate failed: VM 142 not running
2020-12-09 19:05:54 query migrate failed: VM 142 not running
query migrate failed: VM 142 not running
2020-12-09 19:05:55 query migrate failed: VM 142 not running
query migrate failed: VM 142 not running
2020-12-09 19:05:56 query migrate failed: VM 142 not running
query migrate failed: VM 142 not running
2020-12-09 19:05:57 query migrate failed: VM 142 not running
query migrate failed: VM 142 not running
2020-12-09 19:05:59 query migrate failed: VM 142 not running
2020-12-09 19:05:59 ERROR: online migrate failure - too many query migrate failures - aborting
2020-12-09 19:05:59 aborting phase 2 - cleanup resources
2020-12-09 19:05:59 migrate_cancel
2020-12-09 19:05:59 migrate_cancel error: VM 142 not running
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
2020-12-09 19:05:59 ERROR: VM 142 not running
2020-12-09 19:05:59 scsi1: removing block-dirty-bitmap 'repl_scsi1'
2020-12-09 19:05:59 ERROR: VM 142 not running
2020-12-09 19:06:01 ERROR: migration finished with problems (duration 00:01:55)
TASK ERROR: migration problems