I get the following error when I try to migrate a VM to another node while it is running. The migration succeeds if I do it while the VM is shut down.
Code:
Proxmox Virtual Environment 6.2-15
Virtual Machine 106 (piHole) on node 'srv1'
2020-11-05 01:23:15 starting migration of VM 106 to node 'srv2' (192.168.57.61)
2020-11-05 01:23:15 found local, replicated disk 'disks:vm-106-disk-0' (in current VM config)
2020-11-05 01:23:15 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2020-11-05 01:23:15 replicating disk images
2020-11-05 01:23:15 start replication job
2020-11-05 01:23:15 guest => VM 106, running => 3269
2020-11-05 01:23:15 volumes => disks:vm-106-disk-0
2020-11-05 01:23:16 freeze guest filesystem
2020-11-05 01:23:16 create snapshot '__replicate_106-0_1604535795__' on disks:vm-106-disk-0
2020-11-05 01:23:16 thaw guest filesystem
2020-11-05 01:23:16 using secure transmission, rate limit: none
2020-11-05 01:23:16 incremental sync 'disks:vm-106-disk-0' (__replicate_106-0_1604532743__ => __replicate_106-0_1604535795__)
2020-11-05 01:23:16 send from @__replicate_106-0_1604532743__ to rea_daten/vm-106-disk-0@__replicate_106-0_1604535795__ estimated size is 20.5M
2020-11-05 01:23:16 total estimated size is 20.5M
2020-11-05 01:23:16 TIME SENT SNAPSHOT rea_daten/vm-106-disk-0@__replicate_106-0_1604535795__
2020-11-05 01:23:16 rea_daten/vm-106-disk-0@__replicate_106-0_1604532743__ name rea_daten/vm-106-disk-0@__replicate_106-0_1604532743__ -
2020-11-05 01:23:17 01:23:17 14.2M rea_daten/vm-106-disk-0@__replicate_106-0_1604535795__
2020-11-05 01:23:18 successfully imported 'disks:vm-106-disk-0'
2020-11-05 01:23:18 delete previous replication snapshot '__replicate_106-0_1604532743__' on disks:vm-106-disk-0
2020-11-05 01:23:18 (remote_finalize_local_job) delete stale replication snapshot '__replicate_106-0_1604532743__' on disks:vm-106-disk-0
2020-11-05 01:23:19 end replication job
2020-11-05 01:23:19 copying local disk images
2020-11-05 01:23:19 starting VM 106 on remote node 'srv2'
2020-11-05 01:23:19 start remote tunnel
2020-11-05 01:23:20 ssh tunnel ver 1
2020-11-05 01:23:20 starting storage migration
2020-11-05 01:23:20 scsi0: start migration to nbd:unix:/run/qemu-server/106_nbd.migrate:exportname=drive-scsi0
drive mirror re-using dirty bitmap 'repl_scsi0'
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 1114112 bytes total: 1114112 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi0: transferred: 1114112 bytes remaining: 0 bytes total: 1114112 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2020-11-05 01:23:21 volume 'disks:vm-106-disk-0' is 'disks:vm-106-disk-0' on the target
2020-11-05 01:23:21 starting online/live migration on unix:/run/qemu-server/106.migrate
2020-11-05 01:23:21 set migration_caps
2020-11-05 01:23:21 migration speed limit: 8589934592 B/s
2020-11-05 01:23:21 migration downtime limit: 100 ms
2020-11-05 01:23:21 migration cachesize: 134217728 B
2020-11-05 01:23:21 set migration parameters
2020-11-05 01:23:21 start migrate command to unix:/run/qemu-server/106.migrate
channel 4: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
2020-11-05 01:23:22 migration status error: failed
2020-11-05 01:23:22 ERROR: online migrate failure - aborting
2020-11-05 01:23:22 aborting phase 2 - cleanup resources
2020-11-05 01:23:22 migrate_cancel
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2020-11-05 01:23:22 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2020-11-05 01:23:24 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems
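From what I can find, the "channel 3/4: open failed: connect failed" lines come from the SSH tunnel that carries the migration traffic: it looks like forwarding the migration unix socket is failing on the remote side. Below is a rough sketch of how that unix-socket forwarding could be tested by hand, assuming OpenSSH >= 6.7 and socat (the /tmp socket paths are placeholders I picked, not the real migration sockets):
Code:
# on srv2: create a dummy unix-socket listener (socat may need: apt install socat)
socat UNIX-LISTEN:/tmp/test-remote.sock,fork EXEC:'/bin/cat' &

# on srv1: forward a local unix socket to it, the way the migration tunnel does,
# then try to talk through the forwarded socket
ssh -o StreamLocalBindUnlink=yes \
    -L /tmp/test-local.sock:/tmp/test-remote.sock \
    -N root@192.168.57.61 &
echo hello | socat - UNIX-CONNECT:/tmp/test-local.sock
# seeing the same "channel ... open failed" error here would confirm
# that the SSH forwarding itself is the problem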
I would love to know how to troubleshoot and fix this. I tried SSHing from one node to the other both by the hostnames srv1/srv2, which resolve to the correct IPs via the DNS servers in resolv.conf, and by node1/node2, which are entries in /etc/hosts. What else can I provide or check in order to fix this?
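For reference, these are the kinds of connectivity checks I can run between the nodes (root SSH, as Proxmox itself uses; getent is just an extra resolution check on top of what I already tried):
Code:
# name resolution for both naming schemes, on each node
getent hosts srv2 node2

# passwordless root SSH by DNS name, hosts-file name, and raw IP
ssh root@srv2 /bin/true
ssh root@node2 /bin/true
ssh root@192.168.57.61 /bin/true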