Online migration local -> local only works in insecure mode

andreas.hammer

Jan 24, 2024
Hi there.

I'm having problems with online migration of VMs between two hosts in a cluster. Both use local storage (LVM thin).

If I try to start the migration via the GUI or without options from the command line, I consistently get the following error:

Code:
2024-01-24 12:35:10 starting migration of VM xxx to node 'xxxxxxxx' (xxxxxxxxx)
2024-01-24 12:35:10 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:35:10 starting VM xxx on remote node 'xxxxxxxxx'
2024-01-24 12:35:14 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:35:14 start remote tunnel
2024-01-24 12:35:15 ssh tunnel ver 1
2024-01-24 12:35:15 starting storage migration
2024-01-24 12:35:15 scsi0: start migration to nbd:unix:/run/qemu-server/xxx_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
channel 3: open failed: connect failed: open failed

drive-scsi0: Cancelling block job
drive-scsi0: Done.
2024-01-24 12:35:15 ERROR: online migrate failure - mirroring error: VM xxx qmp command 'drive-mirror' failed - Failed to read initial magic: Unexpected end-of-file before all data were read
2024-01-24 12:35:15 aborting phase 2 - cleanup resources
2024-01-24 12:35:15 migrate_cancel
2024-01-24 12:35:20 ERROR: migration finished with problems (duration 00:00:11)
TASK ERROR: migration problems

When I do the same via insecure migration, it works as intended:

Code:
2024-01-24 12:38:43 starting migration of VM xxx to node 'xxxxxxxxx' (xxxxxxx)
2024-01-24 12:38:43 found local disk 'local:xxx/vm-xxx-disk-0.raw' (attached)
2024-01-24 12:38:43 starting VM xxx on remote node 'xxxxxxx'
2024-01-24 12:38:47 volume 'local:xxx/vm-xxx-disk-0.raw' is 'local-storage-1:vm-xxx-disk-0' on the target
2024-01-24 12:38:47 start remote tunnel
2024-01-24 12:38:48 ssh tunnel ver 1
2024-01-24 12:38:48 starting storage migration
2024-01-24 12:38:48 scsi0: start migration to nbd:xxxxxxxx:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0 with bandwidth limit: 80000 KB/s
drive-scsi0: transferred 257.0 MiB of 50.0 GiB (0.50%) in 1m 31s
drive-scsi0: transferred 416.0 MiB of 50.0 GiB (0.81%) in 1m 32s
drive-scsi0: transferred 620.0 MiB of 50.0 GiB (1.21%) in 1m 33s
...

SSH works between the two nodes in both directions without extra options. The sshds listen on the default port and a secondary one, which shouldn't be an issue. What's interesting is that apparently only the tunnel setup breaks, not the SSH communication itself, at least if I read the logs correctly: the "scsi0: start migration to ..." lines show a Unix socket target (nbd:unix:...) in the failing run and a TCP target (nbd:host:port) in the insecure run.

I'm a bit lost on what to try next. While the migration network is secure and isolated, I'd prefer to have a tunneled migration anyway. Any tips on how to get more detailed log data on the tunnel setup or what else it might be?

Thanks!
 
Could you double-check that forwarding Unix sockets over SSH works (as the root user)?
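
One way to run that check by hand is a plain `ssh -L` forward between two Unix sockets. The sketch below only builds such a command; the host name and both socket paths are made-up placeholders, not anything Proxmox itself uses, and it assumes key-based root login between the nodes.

```python
import shlex


def forward_test_command(host, local_sock, remote_sock, user="root"):
    """Build an ssh command that forwards a local Unix socket to a remote one.

    All arguments are placeholders chosen by the caller (hypothetical values).
    """
    return [
        "ssh",
        "-N",                                  # no remote command, forwarding only
        "-o", "BatchMode=yes",                 # fail instead of prompting for a password
        "-L", f"{local_sock}:{remote_sock}",   # Unix socket to Unix socket forward
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical host and socket paths, for illustration only.
    cmd = forward_test_command("target-node", "/tmp/fwd-test.sock", "/tmp/fwd-remote.sock")
    print(shlex.join(cmd))
```

With something listening on the remote socket, connecting to the local socket should either pass data through or, if the server refuses forwarding, likely make ssh print a "channel N: open failed" message like the one in the failing migration log.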
 
I did. It turns out that disabling TCP forwarding does indeed disable all kinds of forwarding. That was clearly a config error on my side. Now forwarding and secure migration both work. Thank you!

For anyone with the same problem: Check your sshd_config and make sure "AllowTcpForwarding" is set to "yes".
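
For a quick look at what a config file says, here is a minimal sketch that scans sshd_config text for forwarding-related keywords. It is a simplification: it reads only the global section (it stops at the first `Match` block), ignores `Include` files, and takes the first occurrence of each keyword. `AllowStreamLocalForwarding` and `DisableForwarding` are not mentioned in this thread; they are included as an assumption because they also govern forwarding. The defaults are the upstream OpenSSH ones and may differ on a patched build. Running `sshd -T` as root is the more reliable way to see the effective settings.

```python
# Upstream OpenSSH defaults for the forwarding keywords (assumed, see lead-in).
FORWARDING_DEFAULTS = {
    "allowtcpforwarding": "yes",
    "allowstreamlocalforwarding": "yes",
    "disableforwarding": "no",
}


def forwarding_settings(config_text):
    """Return the forwarding keywords found in the global part of sshd_config text."""
    settings = dict(FORWARDING_DEFAULTS)
    seen = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()            # keywords are case-insensitive
        if keyword == "match":
            break                             # conditional blocks are not evaluated here
        if keyword in settings and keyword not in seen and len(parts) == 2:
            settings[keyword] = parts[1].strip().lower()
            seen.add(keyword)                 # first occurrence wins
    return settings


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as fh:
        for key, value in forwarding_settings(fh.read()).items():
            print(f"{key} {value}")
```

If `allowtcpforwarding` shows `no` here, that matches the misconfiguration described above.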
 