Can't migrate container back to node

machone · Feb 4, 2025

Not sure how I managed this, but I have two nodes in a cluster. I tried to migrate a container from one node to the other, unsuccessfully, due to a lack of shared storage between the nodes - fair enough. I tried replication and that didn't work either. I gave up on all that, but left the two nodes as they were. Tonight I powered down the main node to replace one HDD, but somehow while I was doing all that the container in question (my DNS server) migrated over to the 2nd node despite its root storage target being unavailable on that node. Now my main node is back up but the container refuses to move back:

Code:

root@proxmox2:~# ha-manager status
quorum OK
master proxmox (active, Tue Feb  4 00:42:23 2025)
lrm proxmox (idle, Tue Feb  4 00:42:23 2025)
lrm proxmox2 (idle, Tue Feb  4 00:42:24 2025)
service ct:101 (proxmox2, disabled)
root@proxmox2:~# ha-manager crm-command relocate ct:101 proxmox
root@proxmox2:~# journalctl -f
[...]
Feb 04 00:43:04 proxmox2 pve-ha-lrm[998]: successfully acquired lock 'ha_agent_proxmox2_lock'
Feb 04 00:43:04 proxmox2 pve-ha-lrm[998]: watchdog active
Feb 04 00:43:04 proxmox2 pve-ha-lrm[998]: status change wait_for_agent_lock => active
Feb 04 00:43:04 proxmox2 pve-ha-lrm[5948]: <root@pam> starting task UPID:proxmox2:0000173D:000205DD:67A1D318:vzmigrate:101:root@pam:
Feb 04 00:43:04 proxmox2 pve-ha-lrm[5949]: migration aborted
Feb 04 00:43:04 proxmox2 pve-ha-lrm[5948]: <root@pam> end task UPID:proxmox2:0000173D:000205DD:67A1D318:vzmigrate:101:root@pam: migration aborted
Feb 04 00:43:04 proxmox2 pve-ha-lrm[5948]: service ct:101 not moved (migration error)

And this shows up if I try to use replication to somehow trick it into going back:

Code:

101-0: got unexpected replication job error - zfs error: cannot open 'nvme_zfs/subvol-101-disk-0': dataset does not exist

I can't back up the container or boot it to copy files off it, because its root storage is inaccessible from this 2nd node. I'm still baffled about how it managed to migrate given that's the case, but here I am. What must I do?

waltar · Feb 4, 2025

If the image for LXC 101 is still on your first node, "proxmox", then
"mv /etc/pve/nodes/proxmox2/lxc/101.conf /etc/pve/nodes/proxmox/lxc/." will do it; after that, start the container on the first node.
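
The move above can be wrapped in a small helper that checks both paths before touching anything. This is only a sketch, not an official Proxmox tool: the function name `move_ct_conf` and the `PVE_ROOT` override are made up here for illustration (the override exists so the function can be tried against a scratch directory), and it assumes the container's disk really is still on the target node.

```shell
#!/bin/sh
# Sketch: move a container config from one node's directory to another's.
# move_ct_conf and PVE_ROOT are hypothetical names, not part of Proxmox.
move_ct_conf() {
    vmid=$1
    src_node=$2
    dst_node=$3
    root=${PVE_ROOT:-/etc/pve}   # real cluster filesystem unless overridden

    src_conf="$root/nodes/$src_node/lxc/$vmid.conf"
    dst_dir="$root/nodes/$dst_node/lxc"

    # Refuse to continue if the config is not where we expect it.
    if [ ! -f "$src_conf" ]; then
        echo "no config found at $src_conf" >&2
        return 1
    fi
    # The target node's lxc directory must already exist.
    if [ ! -d "$dst_dir" ]; then
        echo "target directory $dst_dir does not exist" >&2
        return 1
    fi

    # mv, not cp: the config must never exist under two nodes at once.
    mv "$src_conf" "$dst_dir/"
}

# Example for this thread (run as root on a cluster node):
#   move_ct_conf 101 proxmox2 proxmox
```

Once the config sits under the first node, the container can be started there, for example with `pct start 101` on "proxmox".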

machone · Feb 4, 2025

waltar said:
If the image for LXC 101 is still on your first node, "proxmox", then
"mv /etc/pve/nodes/proxmox2/lxc/101.conf /etc/pve/nodes/proxmox/lxc/." will do it; after that, start the container on the first node.

Hey, thanks. So I actually tried this with `cp` and it said that 101.conf already exists in proxmox/lxc/, even though I couldn’t see such a file there. I think I got a permission denied error if I tried to copy to a different file name but honestly I can’t remember (yes, I invoked it with sudo). Next step tomorrow is to see if `lsof` or `stat` can show me an inode with that name that might not be showing up using `ls`. Weird stuff.

Anyway if the cp/mv doesn’t work, then what?

waltar · Feb 4, 2025

"mv" works and "cp" does not because a machine can exist only once in the whole cluster, so the error you got from "cp" is expected; it cannot work.
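
To see which node currently holds the config for a given ID (and so why a copy into another node's directory is refused), you can look across all node directories at once. A minimal sketch, assuming the usual /etc/pve/nodes/<node>/lxc/ layout shown in the mv command; `find_ct_conf` and `PVE_ROOT` are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: print the name of every node that holds a config for a container ID.
# find_ct_conf and PVE_ROOT are hypothetical names, not part of Proxmox.
find_ct_conf() {
    vmid=$1
    root=${PVE_ROOT:-/etc/pve}   # real cluster filesystem unless overridden

    for conf in "$root"/nodes/*/lxc/"$vmid".conf; do
        # Skip the unexpanded glob when no node has this config.
        [ -f "$conf" ] || continue
        # Strip the leading path and everything after the node name.
        node=${conf#"$root"/nodes/}
        node=${node%%/*}
        echo "$node"
    done
}

# Example for this thread:
#   find_ct_conf 101
```

In the situation described in this thread, this should list only one node for 101, because the same ID cannot have a config under two nodes at the same time.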

machone · Feb 4, 2025

waltar said:
"mv" works and "cp" does not because a machine can exist only once in the whole cluster, so the error you got from "cp" is expected; it cannot work.

You're right, mv worked. Thank you!
