Migration of LXC failing with stale volume

ggee

I am trying to migrate an LXC from a 3-node cluster to a single-node setup. I tried two different LXCs and I get the same error, with no real details about what failed. Does it support LXC migration? Both ends use a storage named 'storage', which is ZFS on both sides. I was just testing, so the LXC is not running and I unchecked 'delete source'.

I had some earlier failures, but found out that you can't migrate an LXC that has replication configured in the source cluster. So I removed the replication jobs for the LXC and tried again.

Code:
2025-01-21 22:23:36 ERROR: found stale volume copy 'storage:subvol-100-disk-0' on node 'pve-ur'

Is this caused by the remains of an initial bad migration that didn't clean up?
 
I also tried to migrate a CT to a different cluster, but with no success.

Task Log:
Code:
2025-01-22 07:59:30 remote: started tunnel worker 'UPID:PMX7:0035BD85:038B8A18:67909751:vzmtunnel:720:root@pam!pdm-admin:'
tunnel: -> sending command "version" to remote
tunnel: <- got reply
2025-01-22 07:59:30 local WS tunnel version: 2
2025-01-22 07:59:30 remote WS tunnel version: 2
2025-01-22 07:59:30 minimum required WS tunnel version: 2
2025-01-22 07:59:30 websocket tunnel started
2025-01-22 07:59:30 starting migration of CT 720 to node 'PMX7' (10.22.42.7)
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2025-01-22 07:59:30 found local volume 'vm_nvme:vm-720-disk-0' (in current VM config)
2025-01-22 07:59:30 can't migrate local volume 'vm_nvme:vm-720-disk-0': non-migratable snapshot exists
2025-01-22 07:59:30 ERROR: can't migrate CT - check log
2025-01-22 07:59:30 aborting phase 1 - cleanup resources
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
2025-01-22 07:59:31 start final cleanup
2025-01-22 07:59:31 ERROR: migration aborted (duration 00:00:01): can't migrate CT - check log
TASK ERROR: migration aborted

We use Ceph Squid as storage, but I get the same result if I move the disk to the local ZFS.
I think it's not possible to migrate a CT yet.

Or is there a different solution I don't know of yet? :)
 
2025-01-22 07:59:30 can't migrate local volume 'vm_nvme:vm-720-disk-0': non-migratable snapshot exists
For you, I assume it is the above. I find that you have to make sure no ties to the cluster exist before migration.
- no replication jobs
- ???

Are there other requirements?
- no snapshots??

See if your CT has a snapshot and delete it first.

But these should not be a problem. If they exist, the migration should ignore them and copy only what can be copied.
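
In a longer task log the blocking line is easy to miss. Here is a minimal Python sketch that pulls out every volume the task refused to migrate, together with the stated reason. The pattern is taken only from the log line quoted in this thread; the function name `blocked_volumes` is made up for illustration and other Proxmox versions may word the message differently.

```python
import re

# Matches the "can't migrate local volume" line as it appears in the task log
# quoted in this thread. Other wordings are not covered.
BLOCKED = re.compile(r"can't migrate local volume '([^']+)': (.+)$")


def blocked_volumes(log_text):
    """Return (volume, reason) pairs for each volume the task refused to migrate."""
    found = []
    for line in log_text.splitlines():
        match = BLOCKED.search(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


sample_log = """\
2025-01-22 07:59:30 found local volume 'vm_nvme:vm-720-disk-0' (in current VM config)
2025-01-22 07:59:30 can't migrate local volume 'vm_nvme:vm-720-disk-0': non-migratable snapshot exists
2025-01-22 07:59:30 ERROR: can't migrate CT - check log
"""

for volume, reason in blocked_volumes(sample_log):
    print(f"{volume}: {reason}")
# prints: vm_nvme:vm-720-disk-0: non-migratable snapshot exists
```

The reason after the colon is the part to act on. In the log above it points at a snapshot on that volume.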
 
I looked more closely at the message, and there was a very long line that started like this:

Code:
2025-01-30 00:58:27 volume 'storage:subvol-100-disk-0' is 'storage:subvol-100-disk-0' on the target
2025-01-30 00:58:27 mapped: net0 from vmbr0 to vmbr0
tunnel: -> sending command "config" to remote
tunnel: <- got reply
2025-01-30 00:58:27 ERROR: error - tunnel command '{"conf".........
At the very end of that line was this:
Code:
failed to handle 'config' command - 403 Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)
Not sure why it is complaining, since I used root@pam to add the cluster to PDM.
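
One thing worth checking, offered as an assumption and not confirmed anywhere in this thread: the remote task may be running as an API token rather than as plain root@pam. The UPID in the other task log above ends in `root@pam!pdm-admin`, and the `user@realm!tokenid` form is how token identities are written. If the permission check compares against the exact string `root@pam`, a token identity would not match. The sketch below only reads the user field out of a UPID. The helper names are hypothetical, and the field layout in the comment is the usual PVE one, assumed here.

```python
def upid_user(upid):
    """Return the user field of a task UPID string."""
    parts = upid.strip().split(":")
    # Assumed layout: UPID:node:pid:pstart:starttime:type:id:user:
    if len(parts) < 9 or parts[0] != "UPID":
        raise ValueError("not a recognised UPID")
    return parts[7]


def is_api_token(user):
    # Token identities are written as user@realm!tokenid
    return "!" in user


# UPID copied from the task log earlier in this thread
upid = "UPID:PMX7:0035BD85:038B8A18:67909751:vzmtunnel:720:root@pam!pdm-admin:"
user = upid_user(upid)
print(user, is_api_token(user), user == "root@pam")
# prints: root@pam!pdm-admin True False
```

If your own failing task shows a similar `!` suffix in its UPID, that would at least be consistent with the 403 message.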
 
So for this error, what do I need to do to clean up this stale volume?


Code:
2025-01-21 22:23:36 ERROR: found stale volume copy 'storage:subvol-100-disk-0' on node 'pve-ur'
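
Not an answer to the cleanup question, but a small Python sketch for narrowing it down: it splits the error into node, storage and volume, then prints a read-only `pvesm list` command to run on the node named in the message. The function name `describe_stale` is made up for illustration, and the assumption that the number inside the volume name is the owning guest ID comes only from the naming seen in this thread. Nothing here deletes anything.

```python
import re

# Pattern taken from the error line quoted in this thread.
STALE = re.compile(r"found stale volume copy '([^':]+):([^']+)' on node '([^']+)'")


def describe_stale(line):
    """Split a 'found stale volume copy' error into its parts, or return None."""
    match = STALE.search(line)
    if match is None:
        return None
    storage, volume, node = match.groups()
    # Assumption: names like subvol-100-disk-0 carry the guest ID as the first number.
    owner = re.search(r"-(\d+)-", volume)
    return {
        "node": node,
        "storage": storage,
        "volume": volume,
        "vmid": int(owner.group(1)) if owner else None,
    }


line = "2025-01-21 22:23:36 ERROR: found stale volume copy 'storage:subvol-100-disk-0' on node 'pve-ur'"
info = describe_stale(line)
print(f"on node {info['node']}, list what the storage holds for guest {info['vmid']}:")
print(f"pvesm list {info['storage']} --vmid {info['vmid']}")
```

Comparing that listing with the guest configs on the same node should show whether the volume is still referenced by anything before deciding what to do with it.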
 
Can't answer, but I came here to say I'm getting exactly the same error when trying to migrate LXCs between nodes.