Migration of LXC failing with stale volume

ggee · Jan 22, 2025

I am trying to migrate an LXC from a 3-node cluster to a single-node setup. I tried two different LXCs and got the same error each time, with no real detail about what failed. Is LXC migration supported? Both ends use a storage named 'storage', which is ZFS on both sides. I was just testing, so the LXC is not running and I unchecked 'delete source'.

I had some earlier failures, but found out that you can't migrate an LXC that has replication configured in the source cluster. So I removed the replication jobs for the LXC and tried again.

Code:

2025-01-21 22:23:36 ERROR: found stale volume copy 'storage:subvol-100-disk-0' on node 'pve-ur'

Is this caused by the remains of an initial bad migration that didn't clean up?

smueller · Jan 22, 2025

I also tried to migrate a CT to a different cluster, but without success.

Task Log:

Code:

2025-01-22 07:59:30 remote: started tunnel worker 'UPID:PMX7:0035BD85:038B8A18:67909751:vzmtunnel:720:root@pam!pdm-admin:'
tunnel: -> sending command "version" to remote
tunnel: <- got reply
2025-01-22 07:59:30 local WS tunnel version: 2
2025-01-22 07:59:30 remote WS tunnel version: 2
2025-01-22 07:59:30 minimum required WS tunnel version: 2
2025-01-22 07:59:30 websocket tunnel started
2025-01-22 07:59:30 starting migration of CT 720 to node 'PMX7' (10.22.42.7)
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2025-01-22 07:59:30 found local volume 'vm_nvme:vm-720-disk-0' (in current VM config)
2025-01-22 07:59:30 can't migrate local volume 'vm_nvme:vm-720-disk-0': non-migratable snapshot exists
2025-01-22 07:59:30 ERROR: can't migrate CT - check log
2025-01-22 07:59:30 aborting phase 1 - cleanup resources
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
2025-01-22 07:59:31 start final cleanup
2025-01-22 07:59:31 ERROR: migration aborted (duration 00:00:01): can't migrate CT - check log
TASK ERROR: migration aborted

We use Ceph Squid as storage, but I get the same result if I move the disk to local ZFS.
I think it's not possible to migrate a CT yet.

Or is there a different solution I don't know of yet?

ggee · Jan 22, 2025

smueller said:
2025-01-22 07:59:30 can't migrate local volume 'vm_nvme:vm-720-disk-0': non-migratable snapshot exists

For you, I assume it is the line quoted above. I've found that you have to make sure the CT has no ties to the cluster before migration.
- no replication jobs
- ???

Are there other requirements?
- no snapshots??

See if your CT has a snapshot and delete it first.

But these should not be a problem. If they exist, the migration should ignore them and copy only what it can.
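
Since the task log hides the real reason among a lot of tunnel chatter, a small script can pull out the lines that name the blocker. This is only a sketch: the two patterns are copied from the log lines quoted in this thread, the names `BLOCKER_PATTERNS` and `find_blockers` are made up for illustration, and other blocker messages probably exist that it will not catch.

```python
import re

# Patterns copied from log lines quoted in this thread (assumption: the
# wording is stable across versions; other blocker messages may exist).
BLOCKER_PATTERNS = {
    "snapshot": re.compile(
        r"can't migrate local volume '([^']+)': non-migratable snapshot exists"
    ),
    "stale_volume": re.compile(
        r"found stale volume copy '([^']+)' on node '[^']+'"
    ),
}


def find_blockers(log_text):
    """Return (kind, volume) pairs for each known blocker line in a task log."""
    hits = []
    for line in log_text.splitlines():
        for kind, pattern in BLOCKER_PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((kind, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "2025-01-22 07:59:30 can't migrate local volume "
        "'vm_nvme:vm-720-disk-0': non-migratable snapshot exists"
    )
    print(find_blockers(sample))
```

Paste the full task log into `find_blockers` and it reports which volume is affected and whether the cause is a snapshot or a stale copy on the target.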

smueller · Jan 23, 2025

Hi @ggee,

Sorry, my bad, I had a snapshot in there.
Normally we don't make snapshots so I didn't check it right away. xD

ggee · Jan 30, 2025

I looked more closely at the messages and found a very long line that started like this:

Code:

2025-01-30 00:58:27 volume 'storage:subvol-100-disk-0' is 'storage:subvol-100-disk-0' on the target
2025-01-30 00:58:27 mapped: net0 from vmbr0 to vmbr0
tunnel: -> sending command "config" to remote
tunnel: <- got reply
2025-01-30 00:58:27 ERROR: error - tunnel command '{"conf".........

At the very end of that line was this:

Code:

failed to handle 'config' command - 403 Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)

Not sure why it is complaining, since I used root@pam to add the cluster to PDM.
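
Because the useful part sits at the very end of a very long line, it can help to extract just that tail. A minimal sketch, assuming the tail always has the shape shown above ("failed to handle '<command>' command - <status> <reason>"); the name `error_tail` is hypothetical, not part of any Proxmox tool.

```python
import re

# Assumed shape of the tail, based on the single example in this thread.
TAIL = re.compile(
    r"failed to handle '(?P<command>[^']+)' command - "
    r"(?P<status>\d{3}) (?P<reason>.*)$"
)


def error_tail(line):
    """Return (command, status, reason) from a long tunnel error line, or None."""
    match = TAIL.search(line)
    if match is None:
        return None
    return (
        match.group("command"),
        int(match.group("status")),
        match.group("reason").strip(),
    )


if __name__ == "__main__":
    tail = (
        "failed to handle 'config' command - 403 Permission check failed "
        "(changing feature flags (except nesting) is only allowed for root@pam)"
    )
    print(error_tail(tail))
```

Run against the full error line, this returns the tunnel command that failed, the HTTP status, and the reason text, without the long config payload in front of it.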

ggee · Jan 31, 2025

So for this error, what do I need to do to clean up the stale volume?

Code:

2025-01-21 22:23:36 ERROR: found stale volume copy 'storage:subvol-100-disk-0' on node 'pve-ur'

andyjayh · Feb 7, 2025

Can't answer, but came here to say I'm getting exactly the same error when trying to migrate LXCs between nodes.
