Live migrate of LXC continues to fail

voarsh

Active Member
Nov 20, 2020
I've tried this multiple times:
It always seems to fail at around 40% with no indication of why.
How can I look into why this happens?

drive-scsi0: transferred: 136469020672 bytes remaining: 198581747712 bytes total: 335050768384 bytes progression: 40.73 % busy: 1 ready: 0
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2020-12-06 16:47:48 ERROR: online migrate failure - mirroring error: drive-scsi0: mirroring has been cancelled
2020-12-06 16:47:48 aborting phase 2 - cleanup resources
2020-12-06 16:47:48 migrate_cancel
2020-12-06 16:47:57 ERROR: migration finished with problems (duration 00:48:17)
TASK ERROR: migration problems

Target VM create:
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "vm-106-disk-0" created.
WARNING: Sum of all thin volume sizes (368.00 GiB) exceeds the size of thin pool pve/data and the size of whole volume group (<232.33 GiB).
migration listens on unix:/run/qemu-server/106.migrate
storage migration listens on nbd:unix:/run/qemu-server/106_nbd.migrate:exportname=drive-scsi0 volume:local-lvm:vm-106-disk-0,format=raw,size=310G
TASK OK
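A quick sanity check on the numbers in the two logs above, as a minimal sketch. It only redoes the arithmetic from the logged values; it assumes the "<232.33 GiB" in the warning can be treated as an upper bound on the target volume group, and it does not prove that lack of space is why the mirror job was cancelled.

```python
# Values copied from the migration log and the target-side warning above.
GIB = 1024 ** 3

transferred_bytes = 136469020672   # "transferred" on the drive-scsi0 line
total_bytes = 335050768384         # "total" on the drive-scsi0 line
vg_size_gib = 232.33               # upper bound from the LVM warning (assumption)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


progress_pct = 100 * transferred_bytes / total_bytes
transferred_gib = to_gib(transferred_bytes)
total_gib = to_gib(total_bytes)

# A fully written disk of this size cannot fit in a volume group
# that is smaller than the disk itself.
disk_fits_in_vg = total_gib <= vg_size_gib

print(f"progress at failure: {progress_pct:.2f} %")
print(f"transferred: {transferred_gib:.1f} GiB of {total_gib:.1f} GiB")
print(f"disk fits in target volume group: {disk_fits_in_vg}")
```

If the target thin pool already held other data when the migration started, it could fill up well before the full disk is copied, so checking the pool usage on the target with lvs during the migration may be worthwhile.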
 
I even tried a different migration (restart):




Task viewer: CT 100 - Migrate



2020-12-06 16:53:17 shutdown CT 100
2020-12-06 16:53:22 starting migration of CT 100 to node 'pvedell' (192.168.100.2)
2020-12-06 16:53:22 found local volume 'local-lvm:vm-100-disk-0' (in current VM config)
2020-12-06 16:55:56 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pvedell' root@192.168.100.2 pvesr set-state 100 \''{}'\'
Logical volume "vm-100-disk-0" successfully removed
2020-12-06 16:55:59 start final cleanup
2020-12-06 16:56:00 start container on target node
2020-12-06 16:56:00 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pvedell' root@192.168.100.2 pct start 100
2020-12-06 16:56:01 unable to open file '/var/lib/lxc/100/rules.seccomp.tmp.3974' - No such file or directory
2020-12-06 16:56:01 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pvedell' root@192.168.100.2 pct start 100' failed: exit code 255
2020-12-06 16:56:01 ERROR: migration finished with problems (duration 00:02:44)
TASK ERROR: migration problems


TASK ERROR: unable to open file '/var/lib/lxc/100/rules.seccomp.tmp.3974' - No such file or directory

--

I also get the same issue with an offline migration.
No migration (VM or LXC) seems to actually work.


--
Digging a bit further, I see that /var/lib/lxc/100/ doesn't contain anything on the target of the migration.
I created these files manually and the LXC starts on the new target.

I do not know what to do about the failing VM that doesn't migrate.
 
Do the logs (syslog/journal) or dmesg say anything (on both nodes)?
What's your pveversion -v?
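A small helper for gathering that information, as a sketch only. The function names and the time window are hypothetical; the window is simply picked around the 16:47:48 failure in the first log. It would need to be run as root on both the source and the target node.

```python
import shutil
import subprocess


def diagnostic_commands(since: str, until: str) -> list[list[str]]:
    """Build the commands to run; the time window limits the journal output."""
    return [
        ["journalctl", "--no-pager", "--since", since, "--until", until],
        ["dmesg", "-T"],        # kernel messages with readable timestamps
        ["pveversion", "-v"],   # package versions, as requested above
    ]


def run_all(commands: list[list[str]]) -> dict[str, str]:
    """Run each command if its binary exists and return output keyed by tool name."""
    results = {}
    for argv in commands:
        if shutil.which(argv[0]) is None:
            results[argv[0]] = "(not installed on this host)"
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[argv[0]] = proc.stdout + proc.stderr
    return results
```

Usage would be something like run_all(diagnostic_commands("2020-12-06 16:45:00", "2020-12-06 16:50:00")), then pasting the relevant parts of each output into the thread.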