First off, here's the output of the failed migration task:
Code:
task started by HA resource agent
2023-07-09 00:32:21 use dedicated network address for sending migration traffic (192.168.10.21)
2023-07-09 00:32:22 starting migration of VM 112 to node 'node2' (192.168.10.21)
2023-07-09 00:32:22 found local, replicated disk 'local-zfs:vm-112-disk-0' (attached)
2023-07-09 00:32:22 replicating disk images
2023-07-09 00:32:22 start replication job
2023-07-09 00:32:22 guest => VM 112, running => 0
2023-07-09 00:32:22 volumes => local-zfs:vm-112-disk-0
2023-07-09 00:32:23 create snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:23 using secure transmission, rate limit: 500 MByte/s
2023-07-09 00:32:23 full sync 'local-zfs:vm-112-disk-0' (__replicate_112-0_1688880742__)
2023-07-09 00:32:23 using a bandwidth limit of 500000000 bytes per second for transferring 'local-zfs:vm-112-disk-0'
2023-07-09 00:32:24 full send of rpool/data/vm-112-disk-0@pre_occ estimated size is 46.8G
2023-07-09 00:32:24 send from @pre_occ to rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__ estimated size is 27.0G
2023-07-09 00:32:24 send from @__replicate_112-0_1681815600__ to rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__ estimated size is 625M
2023-07-09 00:32:24 total estimated size is 74.5G
2023-07-09 00:32:24 volume 'rpool/data/vm-112-disk-0' already exists
2023-07-09 00:32:24 command 'zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2023-07-09 00:32:24 delete previous replication snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:24 end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 ERROR: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 aborting phase 1 - cleanup resources
2023-07-09 00:32:24 ERROR: migration aborted (duration 00:00:03): command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
TASK ERROR: migration aborted
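If I'm reading this right, the "already exists" complaint comes from the receiving side: the pvesm import on node2 refuses to create the target dataset, and the signal 13 on zfs send is just SIGPIPE once that pipe closes. For reference, here's roughly what I plan to check on the replication side (a sketch using the stock pvesr commands and the target IP from the log; corrections welcome):
Code:
# State of the configured replication jobs (last sync, next run, failures)
pvesr status
pvesr list
# Any leftover datasets for VM 112 on the receiving node
ssh root@192.168.10.21 zfs list -t all -r rpool/data | grep 112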
When I look in /rpool/data, I don't see anything related to VM 112, though I do see some unrelated but interesting things, such as ownership differences that I don't understand. It also looks like deleting a replication job doesn't remove the last copy from /rpool/data:
Code:
root@node2:/rpool/data# ls -l
total 137
drwxr-xr-x 18 100000 100000 24 Jul 9 00:20 subvol-101-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-104-disk-0
drwxr-xr-x 17 100000 100000 24 Jul 6 18:23 subvol-105-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-107-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-108-disk-0
drwxr-xr-x 17 root root 23 Dec 31 2022 subvol-109-disk-0
drwxr-xr-x 17 root root 25 Jun 1 21:42 subvol-110-disk-0
drwxrwxr-x 4 1000 1001 4 Dec 11 2022 subvol-110-disk-1
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-111-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 9 00:20 subvol-115-disk-0
drwxr-xr-x 17 root root 23 Dec 31 2022 subvol-116-disk-0
drwxr-xr-x 17 100000 100000 23 May 1 10:47 subvol-117-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 9 00:20 subvol-121-disk-0
drwxr-xr-x 17 100000 100000 24 Jul 6 18:23 subvol-123-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-124-disk-0
drwxr-xr-x 17 root root 26 Jul 6 18:23 subvol-126-disk-0
drwxr-xr-x 4 1000 media 4 Jun 28 15:20 subvol-126-disk-1
drwxr-xr-x 17 root root 24 Jul 6 18:23 subvol-127-disk-0
Not sure why some subvols are owned by root and others by UID 100000, but I think that's just privileged vs. unprivileged containers (unprivileged ones map container root to host UID 100000), so it shouldn't be a factor here.

Why can't I get this VM to migrate? I thought "volume 'rpool/data/vm-112-disk-0' already exists" might be a clue, but I don't actually see where it exists on either node.
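In case it matters, my understanding is that a VM disk on ZFS is a zvol rather than a mounted filesystem, so it wouldn't show up as a directory under /rpool/data even if it did exist; only the container subvols do. Here's a sketch of how I'd look for it on each node instead:
Code:
# The zvol itself, plus any snapshots hanging off it
zfs list -t all -r rpool/data | grep vm-112
# The block device ZFS exposes for a zvol, if one exists
ls /dev/zvol/rpool/data/ 2>/dev/null | grep vm-112
# What the Proxmox storage layer reports for local-zfs
pvesm list local-zfs | grep vm-112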
Thanks