Migrate VM problem: SSH error?

devawpz

Sep 21, 2020
Coming from this thread, for reference.

To get this done without downtime, I migrated the VM to another node.

Migrating back, I got this error:


Code:
2020-10-06 19:32:44 ERROR: online migrate failure - aborting
2020-10-06 19:32:44 aborting phase 2 - cleanup resources
2020-10-06 19:32:44 migrate_cancel
drive-scsi0: Cancelling block job
channel 4: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
drive-scsi0: Done.
2020-10-06 19:32:49 ERROR: migration finished with problems (duration 00:05:06)
TASK ERROR: migration problems


I've tried setting AllowTcpForwarding yes in the SSH daemon configuration to allow local and remote port forwarding, but with no success.

Is my cluster screwed up?

Any help would be appreciated.
 
Here are some logs that may help. Part 1/3:


Code:
2020-10-11 01:56:55 starting migration of VM 105 to node 'alpha' (110.225.10.131)
2020-10-11 01:56:55 found local disk 'local-raid:105/vm-105-disk-0.qcow2' (in current VM config)
2020-10-11 01:56:55 copying local disk images
2020-10-11 01:56:55 starting VM 105 on remote node 'alpha'
2020-10-11 01:57:00 start remote tunnel
2020-10-11 01:57:01 ssh tunnel ver 1
2020-10-11 01:57:01 starting storage migration
2020-10-11 01:57:01 scsi0: start migration to nbd:unix:/run/qemu-server/105_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 127926272 bytes remaining: 68591550464 bytes total: 68719476736 bytes progression: 0.19 % busy: 1 ready: 0
drive-scsi0: transferred: 242221056 bytes remaining: 68477255680 bytes total: 68719476736 bytes progression: 0.35 % busy: 1 ready: 0
drive-scsi0: transferred: 357564416 bytes remaining: 68361912320 bytes total: 68719476736 bytes progression: 0.52 % busy: 1 ready: 0
drive-scsi0: transferred: 479199232 bytes remaining: 68240277504 bytes total: 68719476736 bytes progression: 0.70 % busy: 1 ready: 0
drive-scsi0: transferred: 598736896 bytes remaining: 68120739840 bytes total: 68719476736 bytes progression: 0.87 % busy: 1 ready: 0
drive-scsi0: transferred: 719323136 bytes remaining: 68000153600 bytes total: 68719476736 bytes progression: 1.05 % busy: 1 ready: 0
drive-scsi0: transferred: 839909376 bytes remaining: 67879567360 bytes total: 68719476736 bytes progression: 1.22 % busy: 1 ready: 0
drive-scsi0: transferred: 962592768 bytes remaining: 67756883968 bytes total: 68719476736 bytes progression: 1.40 % busy: 1 ready: 0
drive-scsi0: transferred: 1081081856 bytes remaining: 67638394880 bytes total: 68719476736 bytes progression: 1.57 % busy: 1 ready: 0
drive-scsi0: transferred: 1206910976 bytes remaining: 67512565760 bytes total: 68719476736 bytes progression: 1.76 % busy: 1 ready: 0
drive-scsi0: transferred: 1328545792 bytes remaining: 67390930944 bytes total: 68719476736 bytes progression: 1.93 % busy: 1 ready: 0
drive-scsi0: transferred: 1454374912 bytes remaining: 67265101824 bytes total: 68719476736 bytes progression: 2.12 % busy: 1 ready: 0
drive-scsi0: transferred: 2320498688 bytes remaining: 66398978048 bytes total: 68719476736 bytes progression: 3.38 % busy: 1 ready: 0
drive-scsi0: transferred: 2443182080 bytes remaining: 66276294656 bytes total: 68719476736 bytes progression: 3.56 % busy: 1 ready: 0
drive-scsi0: transferred: 2979004416 bytes remaining: 65740472320 bytes total: 68719476736 bytes progression: 4.34 % busy: 1 ready: 0
drive-scsi0: transferred: 3096444928 bytes remaining: 65623031808 bytes total: 68719476736 bytes progression: 4.51 % busy: 1 ready: 0
drive-scsi0: transferred: 3215982592 bytes remaining: 65503494144 bytes total: 68719476736 bytes progression: 4.68 % busy: 1 ready: 0
drive-scsi0: transferred: 3335520256 bytes remaining: 65383956480 bytes total: 68719476736 bytes progression: 4.85 % busy: 1 ready: 0
drive-scsi0: transferred: 3463446528 bytes remaining: 65256030208 bytes total: 68719476736 bytes progression: 5.04 % busy: 1 ready: 0
drive-scsi0: transferred: 3580887040 bytes remaining: 65138589696 bytes total: 68719476736 bytes progression: 5.21 % busy: 1 ready: 0
drive-scsi0: transferred: 3708813312 bytes remaining: 65010663424 bytes total: 68719476736 bytes progression: 5.40 % busy: 1 ready: 0
drive-scsi0: transferred: 3827302400 bytes remaining: 64892174336 bytes total: 68719476736 bytes progression: 5.57 % busy: 1 ready: 0
drive-scsi0: transferred: 3943694336 bytes remaining: 64775782400 bytes total: 68719476736 bytes progression: 5.74 % busy: 1 ready: 0
drive-scsi0: transferred: 4502585344 bytes remaining: 64216891392 bytes total: 68719476736 bytes progression: 6.55 % busy: 1 ready: 0
drive-scsi0: transferred: 4623171584 bytes remaining: 64096305152 bytes total: 68719476736 bytes progression: 6.73 % busy: 1 ready: 0
drive-scsi0: transferred: 6680477696 bytes remaining: 62038999040 bytes total: 68719476736 bytes progression: 9.72 % busy: 1 ready: 0
drive-scsi0: transferred: 6793723904 bytes remaining: 61925752832 bytes total: 68719476736 bytes progression: 9.89 % busy: 1 ready: 0
drive-scsi0: transferred: 10737418240 bytes remaining: 57982058496 bytes total: 68719476736 bytes progression: 15.62 % busy: 1 ready: 0

<cut>
 
Part 2/3:

Code:
<cut>


drive-scsi0: transferred: 23904387072 bytes remaining: 44816859136 bytes total: 68721246208 bytes progression: 34.78 % busy: 1 ready: 0
drive-scsi0: transferred: 24015536128 bytes remaining: 44705710080 bytes total: 68721246208 bytes progression: 34.95 % busy: 1 ready: 0
drive-scsi0: transferred: 24135073792 bytes remaining: 44586172416 bytes total: 68721246208 bytes progression: 35.12 % busy: 1 ready: 0
drive-scsi0: transferred: 24254611456 bytes remaining: 44466634752 bytes total: 68721246208 bytes progression: 35.29 % busy: 1 ready: 0
drive-scsi0: transferred: 24373100544 bytes remaining: 44348145664 bytes total: 68721246208 bytes progression: 35.47 % busy: 1 ready: 0
drive-scsi0: transferred: 24494735360 bytes remaining: 44226510848 bytes total: 68721246208 bytes progression: 35.64 % busy: 1 ready: 0
drive-scsi0: transferred: 24623710208 bytes remaining: 44097536000 bytes total: 68721246208 bytes progression: 35.83 % busy: 1 ready: 0
drive-scsi0: transferred: 24742199296 bytes remaining: 43979046912 bytes total: 68721246208 bytes progression: 36.00 % busy: 1 ready: 0
drive-scsi0: transferred: 24861736960 bytes remaining: 43859509248 bytes total: 68721246208 bytes progression: 36.18 % busy: 1 ready: 0
drive-scsi0: transferred: 24980226048 bytes remaining: 43741020160 bytes total: 68721246208 bytes progression: 36.35 % busy: 1 ready: 0
drive-scsi0: transferred: 25099763712 bytes remaining: 43621482496 bytes total: 68721246208 bytes progression: 36.52 % busy: 1 ready: 0
drive-scsi0: transferred: 25219301376 bytes remaining: 43501944832 bytes total: 68721246208 bytes progression: 36.70 % busy: 1 ready: 0
drive-scsi0: transferred: 26305626112 bytes remaining: 42415620096 bytes total: 68721246208 bytes progression: 38.28 % busy: 1 ready: 0
drive-scsi0: transferred: 51522830336 bytes remaining: 17198415872 bytes total: 68721246208 bytes progression: 74.97 % busy: 1 ready: 0
drive-scsi0: transferred: 68721246208 bytes remaining: 0 bytes total: 68721246208 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2020-10-11 01:59:25 volume 'local-raid:105/vm-105-disk-0.qcow2' is 'local-raid:105/vm-105-disk-0.qcow2' on the target
2020-10-11 01:59:25 starting online/live migration on unix:/run/qemu-server/105.migrate
2020-10-11 01:59:25 set migration_caps
2020-10-11 01:59:25 migration speed limit: 8589934592 B/s
2020-10-11 01:59:25 migration downtime limit: 100 ms
2020-10-11 01:59:25 migration cachesize: 2147483648 B
2020-10-11 01:59:25 set migration parameters
2020-10-11 01:59:25 start migrate command to unix:/run/qemu-server/105.migrate
2020-10-11 01:59:26 migration status: active (transferred 103873617, remaining 17052528640), total 17197768704)
2020-10-11 01:59:26 migration xbzrle cachesize: 2147483648 transferred 41270374 pages 62487 cachemiss 372065 overflow 327
2020-10-11 01:59:27 migration status: active (transferred 221913800, remaining 16930635776), total 17197768704)

<cut>
 
Part 3/3:


Code:
<cut>

2020-10-11 02:01:48 migration xbzrle cachesize: 2147483648 transferred 41270374 pages 62487 cachemiss 415180 overflow 327
2020-10-11 02:01:48 migration status: active (transferred 16740636437, remaining 20156416), total 17197768704)
2020-10-11 02:01:48 migration xbzrle cachesize: 2147483648 transferred 41270374 pages 62487 cachemiss 418205 overflow 327
2020-10-11 02:01:48 migration status: active (transferred 16753239864, remaining 28352512), total 17197768704)
2020-10-11 02:01:48 migration xbzrle cachesize: 2147483648 transferred 41270374 pages 62487 cachemiss 421276 overflow 327
2020-10-11 02:01:48 migration status error: failed
2020-10-11 02:01:48 ERROR: online migrate failure - aborting
2020-10-11 02:01:48 aborting phase 2 - cleanup resources
2020-10-11 02:01:48 migrate_cancel
drive-scsi0: Cancelling block job
channel 4: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
drive-scsi0: Done.
2020-10-11 02:02:02 ERROR: migration finished with problems (duration 00:05:08)
TASK ERROR: migration problems
 
I'm still trying to find the cause of this. Any help would be much appreciated; I'm really stuck right now.

In my opinion, the most important part is Part 3, where the migration fails, since this doesn't happen when I migrate other VMs. Note the last line before the error in Part 3:

Code:
2020-10-11 02:01:48 migration xbzrle cachesize: 2147483648 transferred 41270374 pages 62487 cachemiss 421276 overflow 327


When I migrate other VMs successfully, the same line looks like this:


Code:
2020-10-11 08:12:10 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 1976 overflow 0

Notice that "transferred", "pages" and "overflow" are all zero there, unlike in the failed migration.
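To make that comparison easier to repeat on other logs, here is a small Python sketch that pulls the counters out of both xbzrle lines and prints them side by side. It assumes the exact line format shown above; the regex and the parse_xbzrle helper are hypothetical, and the script only compares the numbers without interpreting what they mean.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern, assuming lines shaped exactly like the logs above:
# "... migration xbzrle cachesize: N transferred N pages N cachemiss N overflow N"
XBZRLE_RE = re.compile(
    r"migration xbzrle cachesize: (?P<cachesize>\d+) "
    r"transferred (?P<transferred>\d+) pages (?P<pages>\d+) "
    r"cachemiss (?P<cachemiss>\d+) overflow (?P<overflow>\d+)"
)


def parse_xbzrle(line: str) -> Optional[Dict[str, int]]:
    """Return the xbzrle counters from one log line, or None if it doesn't match."""
    m = XBZRLE_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


# Last xbzrle line before the failure (Part 3)
failed = parse_xbzrle(
    "2020-10-11 02:01:48 migration xbzrle cachesize: 2147483648 "
    "transferred 41270374 pages 62487 cachemiss 421276 overflow 327"
)
# Same line from a successful migration of another VM
ok = parse_xbzrle(
    "2020-10-11 08:12:10 migration xbzrle cachesize: 536870912 "
    "transferred 0 pages 0 cachemiss 1976 overflow 0"
)

# Print the three counters that differ in the comparison above
for key in ("transferred", "pages", "overflow"):
    print(f"{key}: failed={failed[key]} ok={ok[key]}")
```

Running it prints one line per counter, showing non-zero values for the failed run and zeros for the successful one. The cachesize also differs between the two VMs (2147483648 vs 536870912), which the parsed dictionaries make easy to spot.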

The remaining lines (on the other VMs) show a successful status:

Code:
2020-10-11 08:12:10 migration speed: 28.44 MB/s - downtime 73 ms
2020-10-11 08:12:0 migration status: completed

Could this point to the cause of the failure? In other words, could the SSH connection errors simply be a consequence of the migration itself having failed?
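One way to check the ordering question is to scan the log and see which failure message shows up first. The "channel ... open failed" lines have no timestamp, so log order is the only ordering available. Below is a minimal Python sketch, assuming the message texts from Part 3; the first_failure helper and the marker names are hypothetical. Note that it only shows what was logged first, not what caused what.

```python
from typing import List, Optional

# Hypothetical markers, taken from the message texts in the Part 3 log
MARKERS = {
    "migration": "migration status error",
    "ssh_channel": "open failed: connect failed",
}


def first_failure(log_lines: List[str]) -> Optional[str]:
    """Return the name of the first failure marker found, scanning in log order."""
    for line in log_lines:
        for name, needle in MARKERS.items():
            if needle in line:
                return name
    return None


# Tail of the Part 3 log, copied from above
part3_tail = [
    "2020-10-11 02:01:48 migration status error: failed",
    "2020-10-11 02:01:48 ERROR: online migrate failure - aborting",
    "2020-10-11 02:01:48 aborting phase 2 - cleanup resources",
    "2020-10-11 02:01:48 migrate_cancel",
    "drive-scsi0: Cancelling block job",
    "channel 4: open failed: connect failed: open failed",
    "channel 3: open failed: connect failed: open failed",
]

print(first_failure(part3_tail))  # migration
```

On this excerpt the migration error is logged before any SSH channel error, and the channel errors only appear after migrate_cancel.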

Would appreciate any help.
 
I have the exact same issue.

I noticed that the VM can now only be migrated offline. Online migration to any node is broken.