migration__' failed: exit code 255

Apr 27, 2024
Portland, OR
www.gnetsys.net
I can successfully migrate a small guest between two geographically remote datacenters.
~65 ms latency. Datacenter Manager 0.1.11, PVE 8.3.3

It fails on a large guest, though. It looks like no data was transferred at all.

And then the qm tunnel eventually dies. The logs mention snapshots, but there are no snapshots on the VM.

While this big VM migration was hanging, I was able to migrate my small 28k test VM between the same two sites/PVE hosts over the same link. So... it should work, at least in theory...

I know it's an alpha, and I might just give up on it. But if you have any troubleshooting hints, I'll try them.

2025-02-17 12:32:53 remote: started tunnel worker 'UPID:<snip>:qmtunnel:246:root@pam!pdm-admin:'
tunnel: -> sending command "version" to remote
tunnel: <- got reply
2025-02-17 12:32:54 local WS tunnel version: 2
2025-02-17 12:32:54 remote WS tunnel version: 2
2025-02-17 12:32:54 minimum required WS tunnel version: 2
websocket tunnel started
2025-02-17 12:32:54 starting migration of VM 246 to node <snip>
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2025-02-17 12:32:54 found local disk 'local-zfs:vm-246-disk-0' (attached)
2025-02-17 12:32:54 copying local disk images
tunnel: -> sending command "disk-import" to remote
tunnel: <- got reply
tunnel: accepted new connection on '/run/pve/246.storage'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/pve/246.storage'
full send of rpool/data/vm-246-disk-0@__migration__ estimated size is 130G
total estimated size is 130G
TIME SENT SNAPSHOT rpool/data/vm-246-disk-0@__migration__
tunnel: Tunnel for /run/pve/246.storage failed - Connection reset by peer (os error 104)
command 'zfs send -Rpv -- rpool/data/vm-246-disk-0@__migration__' failed: got signal 13
command 'set -o pipefail && pvesm export local-zfs:vm-246-disk-0 zfs - -with-snapshots 1 -snapshot __migration__' failed: exit code 255
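To check the "no data was transferred" impression against a log like this, one can look for `zfs send -v` progress lines after the `TIME SENT SNAPSHOT` header and decode the signal and errno numbers. Below is a minimal Python sketch. The progress-line format it matches is my assumption about `zfs send -v` output, and `summarize_send` is a hypothetical helper, not part of PVE or PDM.

```python
import errno
import re
import signal

# Assumption: after the "TIME SENT SNAPSHOT" header, `zfs send -v` prints
# progress lines shaped like "12:33:01   1.50G   pool/dataset@snapshot".
PROGRESS_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\s+(\S+)\s+\S+@\S+$")
SIGNAL_RE = re.compile(r"got signal (\d+)")
OS_ERROR_RE = re.compile(r"os error (\d+)")


def summarize_send(log_text: str) -> dict:
    """Hypothetical helper: summarize the zfs send part of a migration log."""
    progress, signals, os_errors = [], [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = PROGRESS_RE.match(line)
        if m:
            progress.append(m.group(1))  # the SENT column
        for num in SIGNAL_RE.findall(line):
            try:
                # Map the number to a name, e.g. 13 -> SIGPIPE on Linux.
                signals.append(signal.Signals(int(num)).name)
            except ValueError:
                signals.append(f"signal {num}")
        for num in OS_ERROR_RE.findall(line):
            # Map the errno to a name, e.g. 104 -> ECONNRESET on Linux.
            os_errors.append(errno.errorcode.get(int(num), f"errno {num}"))
    return {
        "progress_lines": len(progress),
        "last_sent": progress[-1] if progress else None,
        "signals": signals,
        "os_errors": os_errors,
    }


# The relevant lines copied from the log above.
SAMPLE_LOG = """\
full send of rpool/data/vm-246-disk-0@__migration__ estimated size is 130G
total estimated size is 130G
TIME SENT SNAPSHOT rpool/data/vm-246-disk-0@__migration__
tunnel: Tunnel for /run/pve/246.storage failed - Connection reset by peer (os error 104)
command 'zfs send -Rpv -- rpool/data/vm-246-disk-0@__migration__' failed: got signal 13
"""

result = summarize_send(SAMPLE_LOG)
print(result)
# On Linux: {'progress_lines': 0, 'last_sent': None, 'signals': ['SIGPIPE'], 'os_errors': ['ECONNRESET']}
```

Zero progress lines fits "no data was transferred". SIGPIPE means `zfs send` tried to write into a pipe whose reader had already gone away, which lines up with the "Connection reset by peer" tunnel failure logged just before it.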


....
Hmm. Maybe the tunnel doesn't normally fail. This time I let it run for a couple of hours and finally killed it myself, so the earlier tunnel failure may have been due to other factors. Maybe it doesn't fail; it just doesn't do anything.
 
Last edited: