Online Migration local -> local only insecure

andreas.hammer · Jan 24, 2024

Hi there.

I'm having problems with online migration of VMs between two hosts in a cluster. Both use local storage (LVM thin).

If I try to start the migration via the GUI or without options from the command line, I consistently get the following error:

Bash:

2024-01-24 12:35:10 starting migration of VM xxx to node 'xxxxxxxx' (xxxxxxxxx)
2024-01-24 12:35:10 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:35:10 starting VM xxx on remote node 'xxxxxxxxx'
2024-01-24 12:35:14 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:35:14 start remote tunnel
2024-01-24 12:35:15 ssh tunnel ver 1
2024-01-24 12:35:15 starting storage migration
2024-01-24 12:35:15 scsi0: start migration to nbd:unix:/run/qemu-server/xxx_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
channel 3: open failed: connect failed: open failed

drive-scsi0: Cancelling block job
drive-scsi0: Done.
2024-01-24 12:35:15 ERROR: online migrate failure - mirroring error: VM xxx qmp command 'drive-mirror' failed - Failed to read initial magic: Unexpected end-of-file before all data were read
2024-01-24 12:35:15 aborting phase 2 - cleanup resources
2024-01-24 12:35:15 migrate_cancel
2024-01-24 12:35:20 ERROR: migration finished with problems (duration 00:00:11)
TASK ERROR: migration problems

When I do the same via insecure migration, it works as intended:

Code:

2024-01-24 12:38:43 starting migration of VM xxx to node 'xxxxxxxxx' (xxxxxxx)
2024-01-24 12:38:43 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:38:43 starting VM xxx on remote node 'xxxxxxx'
2024-01-24 12:38:47 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:38:47 start remote tunnel
2024-01-24 12:38:48 ssh tunnel ver 1
2024-01-24 12:38:48 starting storage migration
2024-01-24 12:38:48 scsi0: start migration to nbd:xxxxxxxx:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0 with bandwidth limit: 80000 KB/s
drive-scsi0: transferred 257.0 MiB of 50.0 GiB (0.50%) in 1m 31s
drive-scsi0: transferred 416.0 MiB of 50.0 GiB (0.81%) in 1m 32s
drive-scsi0: transferred 620.0 MiB of 50.0 GiB (1.21%) in 1m 33s
...

SSH between the two nodes works in both directions without extra options. The sshds listen on the default port and a secondary one, but that shouldn't be an issue. Interestingly, only the tunnel setup seems to break, not the SSH communication itself, at least if I read the logs correctly (compare the different targets in the "scsi0: start migration to ..." lines).

I'm a bit lost on what to try next. While the migration network is secure and isolated, I'd prefer to have a tunneled migration anyway. Any tips on how to get more detailed log data on the tunnel setup or what else it might be?

Thanks!

fabian · Jan 24, 2024

Could you double-check that forwarding Unix sockets works (as the root user)?

andreas.hammer · Jan 24, 2024

I did. It turns out that disabling TCP forwarding does indeed disable all kinds of forwarding. That was clearly a config error on my side. Now forwarding and secure migration both work. Thank you!

For anyone with the same problem: Check your sshd_config and make sure "AllowTcpForwarding" is set to "yes".
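A quick way to check this is to look at the effective sshd settings rather than the config file alone, since Match blocks or included files can override it. The following is only a minimal sketch: the helper name `sshd_option` is made up for illustration, and it assumes input in the format printed by `sshd -T` (one lowercase keyword and its value per line). Besides `allowtcpforwarding`, it may be worth looking at `allowstreamlocalforwarding` and `disableforwarding`, which are separate sshd options that also affect forwarding.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox or OpenSSH): print the value of
# one option from `sshd -T`-style input on stdin ("keyword value" per line).
sshd_option() {
    # Compare keywords case-insensitively and stop at the first match.
    awk -v key="$1" 'tolower($1) == tolower(key) { print $2; exit }'
}

# Hypothetical helper: print the forwarding-related options found in a
# settings dump read from stdin, one "keyword=value" per line.
forwarding_options() {
    dump=$(cat)
    for key in allowtcpforwarding allowstreamlocalforwarding disableforwarding; do
        value=$(printf '%s\n' "$dump" | sshd_option "$key")
        printf '%s=%s\n' "$key" "${value:-unset}"
    done
}
```

Run as root on the migration target, this could be used as `sshd -T | forwarding_options`. In my case the first line would have shown `allowtcpforwarding=no` before the fix.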
