Hi there.
I'm having problems with online migration of VMs between two hosts in a cluster. Both use local storage (LVM-thin).
If I try to start the migration via the GUI, or from the command line without extra options, I consistently get the following error:
Code:
2024-01-24 12:35:10 starting migration of VM xxx to node 'xxxxxxxx' (xxxxxxxxx)
2024-01-24 12:35:10 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:35:10 starting VM xxx on remote node 'xxxxxxxxx'
2024-01-24 12:35:14 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:35:14 start remote tunnel
2024-01-24 12:35:15 ssh tunnel ver 1
2024-01-24 12:35:15 starting storage migration
2024-01-24 12:35:15 scsi0: start migration to nbd:unix:/run/qemu-server/xxx_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
channel 3: open failed: connect failed: open failed
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2024-01-24 12:35:15 ERROR: online migrate failure - mirroring error: VM xxx qmp command 'drive-mirror' failed - Failed to read initial magic: Unexpected end-of-file before all data were read
2024-01-24 12:35:15 aborting phase 2 - cleanup resources
2024-01-24 12:35:15 migrate_cancel
2024-01-24 12:35:20 ERROR: migration finished with problems (duration 00:00:11)
TASK ERROR: migration problems
When I do the same via insecure migration, it works as intended:
Code:
2024-01-24 12:38:43 starting migration of VM xxx to node 'xxxxxxxxx' (xxxxxxx)
2024-01-24 12:38:43 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:38:43 starting VM xxx on remote node 'xxxxxxx'
2024-01-24 12:38:47 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:38:47 start remote tunnel
2024-01-24 12:38:48 ssh tunnel ver 1
2024-01-24 12:38:48 starting storage migration
2024-01-24 12:38:48 scsi0: start migration to nbd:xxxxxxxx:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0 with bandwidth limit: 80000 KB/s
drive-scsi0: transferred 257.0 MiB of 50.0 GiB (0.50%) in 1m 31s
drive-scsi0: transferred 416.0 MiB of 50.0 GiB (0.81%) in 1m 32s
drive-scsi0: transferred 620.0 MiB of 50.0 GiB (1.21%) in 1m 33s
...
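For reference, the two runs boil down to roughly the following commands (VM ID and node redacted; I'm writing the flags from memory, so treat them as approximate):
Bash:
# secure (default) run, the one that fails with the tunnel error above
qm migrate <vmid> <targetnode> --online --with-local-disks

# insecure run, the one that works
qm migrate <vmid> <targetnode> --online --with-local-disks --migration_type insecure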
SSH works in both directions between the two nodes without any extra options. The sshds listen on the default port plus a secondary one; that shouldn't be an issue. What's interesting is that apparently only the tunnel setup breaks, not the SSH connection itself, at least if I read the logs correctly (compare the different targets in the two "scsi0: start migration to ..." lines).
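To illustrate what I mean by "tunnel setup": as far as I understand it, the secure path forwards the NBD UNIX socket over SSH, and that forwarding is what seems to fail ("channel 3: open failed"). A rough way I'm considering to test the forwarding in isolation, with made-up socket paths and assuming socat is available on both nodes:
Bash:
# on the target node: listen on a throwaway UNIX socket and echo back whatever arrives
socat UNIX-LISTEN:/tmp/fwd-test.sock,fork EXEC:/bin/cat &

# on the source node: forward a local UNIX socket to the remote one over SSH
ssh -N -L /tmp/fwd-test-local.sock:/tmp/fwd-test.sock root@<targetnode> &

# still on the source node: if the forwarding works, this should print "ping" back
echo ping | socat - UNIX-CONNECT:/tmp/fwd-test-local.sock
Is that a reasonable way to reproduce what the migration tunnel does, or does Proxmox set up the forwarding differently?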
I'm a bit lost as to what to try next. Even though the migration network is secure and isolated, I'd still prefer a tunneled migration. Any tips on how to get more detailed logging of the tunnel setup, or ideas what else it might be?
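For context, the migration settings in /etc/pve/datacenter.cfg are along these lines (the network below is a placeholder, not our real range):
Code:
# /etc/pve/datacenter.cfg (excerpt)
migration: type=secure,network=10.10.10.0/24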
Thanks!