[SOLVED] VM Fails to migrate

LordRatner

Member
Jun 20, 2022
First off, here's the output:

Code:
task started by HA resource agent
2023-07-09 00:32:21 use dedicated network address for sending migration traffic (192.168.10.21)
2023-07-09 00:32:22 starting migration of VM 112 to node 'node2' (192.168.10.21)
2023-07-09 00:32:22 found local, replicated disk 'local-zfs:vm-112-disk-0' (attached)
2023-07-09 00:32:22 replicating disk images
2023-07-09 00:32:22 start replication job
2023-07-09 00:32:22 guest => VM 112, running => 0
2023-07-09 00:32:22 volumes => local-zfs:vm-112-disk-0
2023-07-09 00:32:23 create snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:23 using secure transmission, rate limit: 500 MByte/s
2023-07-09 00:32:23 full sync 'local-zfs:vm-112-disk-0' (__replicate_112-0_1688880742__)
2023-07-09 00:32:23 using a bandwidth limit of 500000000 bytes per second for transferring 'local-zfs:vm-112-disk-0'
2023-07-09 00:32:24 full send of rpool/data/vm-112-disk-0@pre_occ estimated size is 46.8G
2023-07-09 00:32:24 send from @pre_occ to rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__ estimated size is 27.0G
2023-07-09 00:32:24 send from @__replicate_112-0_1681815600__ to rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__ estimated size is 625M
2023-07-09 00:32:24 total estimated size is 74.5G
2023-07-09 00:32:24 volume 'rpool/data/vm-112-disk-0' already exists
2023-07-09 00:32:24 command 'zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2023-07-09 00:32:24 delete previous replication snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:24 end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 ERROR: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 aborting phase 1 - cleanup resources
2023-07-09 00:32:24 ERROR: migration aborted (duration 00:00:03): command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
TASK ERROR: migration aborted

When I look in /rpool/data I do not see anything related to VM-112, though I do see some unrelated interesting things, such as ownership differences that I don't understand. Also, it looks like when a replication job is deleted, it doesn't remove the last copy from /rpool/data.

Code:
root@node2:/rpool/data# ls -l
total 137
drwxr-xr-x 18 100000 100000 24 Jul  9 00:20 subvol-101-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  6 18:23 subvol-104-disk-0
drwxr-xr-x 17 100000 100000 24 Jul  6 18:23 subvol-105-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  6 18:23 subvol-107-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  6 18:23 subvol-108-disk-0
drwxr-xr-x 17 root   root   23 Dec 31  2022 subvol-109-disk-0
drwxr-xr-x 17 root   root   25 Jun  1 21:42 subvol-110-disk-0
drwxrwxr-x  4   1000   1001  4 Dec 11  2022 subvol-110-disk-1
drwxr-xr-x 17 100000 100000 23 Jul  6 18:23 subvol-111-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  9 00:20 subvol-115-disk-0
drwxr-xr-x 17 root   root   23 Dec 31  2022 subvol-116-disk-0
drwxr-xr-x 17 100000 100000 23 May  1 10:47 subvol-117-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  9 00:20 subvol-121-disk-0
drwxr-xr-x 17 100000 100000 24 Jul  6 18:23 subvol-123-disk-0
drwxr-xr-x 17 100000 100000 23 Jul  6 18:23 subvol-124-disk-0
drwxr-xr-x 17 root   root   26 Jul  6 18:23 subvol-126-disk-0
drwxr-xr-x  4   1000 media   4 Jun 28 15:20 subvol-126-disk-1
drwxr-xr-x 17 root   root   24 Jul  6 18:23 subvol-127-disk-0
Not sure why some are owned by root and others by 100000, but I think it has to do with privileged vs. unprivileged containers. Shouldn't be a factor here.

Why can't I get this VM to migrate? I thought "volume 'rpool/data/vm-112-disk-0' already exists" might be a clue, but I don't actually see where it exists on either node.

Thanks
 
I also have a problem at the moment: migrating one VM from one server in the cluster to another takes a lot of time and begins to duplicate the disk several times. I have three nodes in my cluster.
 
Hi,
When I look in /rpool/data I do not see anything related to VM-112, though I do see some unrelated interesting things, such as ownership differences that I don't understand. Also, it looks like when a replication job is deleted, it doesn't remove the last copy from /rpool/data.

Code:
root@node2:/rpool/data# ls -l
This only shows the files in the ZFS filesystem. Use zfs list to also see the zvol devices; those are used as virtual block devices for VMs. In particular, please share the output of zfs list -t all -r rpool/data/vm-112-disk-0 as well as pveversion -v on both the source and the target node.
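For reference, a minimal sketch of the difference (dataset name taken from the migration log above; zvols appear as device nodes under /dev/zvol rather than as files under the pool's mountpoint):

Code:
# container subvols are ZFS filesystems and show up as directories under the mountpoint:
ls -l /rpool/data
# VM disks are zvols; they only show up in the dataset list and as block devices:
zfs list -t volume -r rpool/data
ls -l /dev/zvol/rpool/data/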
 
Hi,
I also have a problem at the moment: migrating one VM from one server in the cluster to another takes a lot of time and begins to duplicate the disk several times. I have three nodes in my cluster.
That sounds like you might have two storages configured that point to the same backing storage? Please share the contents of /etc/pve/storage.cfg and the output of pveversion -v. While the particular issue for migration has been fixed in Proxmox VE 8 (migration no longer scans local storages for disks not in the configuration), it's still problematic to configure such a "duplicate" storage, because of locking, IDs not actually being unique, etc.
 
The reason for my cluster is to update/upgrade the nodes one by one. The last server I included in the cluster (server_3) is on version 7.4-1. This way, I migrate the VMs from server_2 (version 6.4-1) to server_3 and then upgrade server_2 to version 7.4-1. During this move, I noticed that the migration normally created 3 copies of each disk of each VM. Once I upgraded server_2 to version 7.4-1 and moved the VMs back, the disks multiplied further, reaching 9 or 10 copies.

I removed the duplicate HDs using the method proposed in this thread:
Cannot remove image, a guest with VMID 'xxx' exists!

Initially, I want to ensure that all servers are on the same version. At the time I checked, version 7.4 was stable and version 8 was still in beta. Now, I'm a bit concerned about moving to v8. Once the upgrades are finished, I plan to make the important VMs highly available (HA).

Here are the results for the 3 servers:

These are the versions of the nodes:

Code:
root@server_1 ~ # pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-11
pve-kernel-helper: 6.4-11
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2

Code:
root@server_2 ~ # pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-15 (running version: 7.4-15/a5d2a31e)
pve-kernel-5.15: 7.4-4
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1

Code:
root@server_3 ~ # pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-15 (running version: 7.4-15/a5d2a31e)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1

And these are the storage configurations:

Code:
root@server_1 ~ # cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,images,iso,rootdir,snippets
        prune-backups keep-all=1
        shared 0

dir: backup
        path /backup
        content backup
        prune-backups keep-all=1
        shared 1

lvmthin: server_1-lvm
        thinpool data
        vgname vg0
        content images,rootdir

lvmthin: server_2-lvm
        thinpool data
        vgname vg0
        content images,rootdir

pbs: Backup
        datastore backupDir
        server backup.server.info
        content backup
        nodes server_2,server_1
        prune-backups keep-all=1
        username root@pam

lvmthin: server_3-lvm
        thinpool data
        vgname vg0
        content images,rootdir


Code:
root@server_2 ~ # cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,images,iso,rootdir,snippets
        prune-backups keep-all=1
        shared 0

dir: backup
        path /backup
        content backup
        prune-backups keep-all=1
        shared 1

lvmthin: server_1-lvm
        thinpool data
        vgname vg0
        content images,rootdir

lvmthin: server_2-lvm
        thinpool data
        vgname vg0
        content images,rootdir

pbs: Backup
        datastore backupDir
        server backup.server.info
        content backup
        nodes server_2,server_1
        prune-backups keep-all=1
        username root@pam

lvmthin: server3-lvm
        thinpool data
        vgname vg0
        content images,rootdir


Code:
root@server_3 ~ # cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,images,iso,rootdir,snippets
        prune-backups keep-all=1
        shared 0

dir: backup
        path /backup
        content backup
        prune-backups keep-all=1
        shared 1

lvmthin: server_1-lvm
        thinpool data
        vgname vg0
        content images,rootdir

lvmthin: server_2-lvm
        thinpool data
        vgname vg0
        content images,rootdir

pbs: Backup
        datastore backupDir
        server backup.server.info
        content backup
        nodes server_2,server_1
        prune-backups keep-all=1
        username root@pam

lvmthin: server_3-lvm
        thinpool data
        vgname vg0
        content images,rootdir
 
Code:
lvmthin: server_1-lvm
        thinpool data
        vgname vg0
        content images,rootdir

lvmthin: server_2-lvm
        thinpool data
        vgname vg0
        content images,rootdir

lvmthin: server_3-lvm
        thinpool data
        vgname vg0
        content images,rootdir
The storage configuration in Proxmox VE is shared across nodes. You only need one entry in the storage configuration if the same local storage is available on all nodes. If a storage is not available on all nodes, you can restrict it with the nodes property. Since all three entries use the same backing storage (the data thinpool) and are not restricted to a given node, you effectively have the same storage configured three times, and migration in Proxmox VE 7 gets confused.

You can either (see the sketch below the list):
  • Use a single storage entry, but you'll need to adapt references to disks in the guest manually.
  • Restrict each storage to the specific node (but that is not clean, as it's supposed to be a single local storage configuration available on all three nodes).
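As a rough sketch of what either option could look like in /etc/pve/storage.cfg (the storage ID local-lvm is just an example name; adapt IDs and the disk references in the guest configs to your setup):

Code:
# Option 1: a single entry for the thinpool, valid on all nodes
# (guest configs referencing server_X-lvm must be edited to use this ID):
lvmthin: local-lvm
        thinpool data
        vgname vg0
        content images,rootdir

# Option 2: keep per-node entries, but restrict each to its node:
lvmthin: server_1-lvm
        thinpool data
        vgname vg0
        content images,rootdir
        nodes server_1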
 
Hi,

This only shows the files in the ZFS filesystem. Use zfs list to also see the zvol devices; those are used as virtual block devices for VMs. In particular, please share the output of zfs list -t all -r rpool/data/vm-112-disk-0 as well as pveversion -v on both the source and the target node.
Code:
root@node1:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 69.5G  1.45T     41.4G  -
rpool/data/vm-112-disk-0@pre_occ                         22.0G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__  5.12G      -     40.3G  -

Code:
root@node1:~# pveversion -v
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-5.15: 7.4-4
pve-kernel-5.19: 7.2-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.1
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Code:
root@node1:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 69.5G  1.45T     41.4G  -
rpool/data/vm-112-disk-0@pre_occ                         22.0G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__  5.12G      -     40.3G  -
Is this the source or target of the migration? What about the other node?
 
Is this the source or target of the migration? What about the other node?
Code:
root@node2:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0  42.2G  1.36T     42.2G  -

Just delete that and it should work again?

What is the right way to remove the logical volume?

Thanks!
 
Code:
root@node2:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0  42.2G  1.36T     42.2G  -

Just delete that and it should work again?
Yes. The question is how the image got here. Was the VM ever on that node before? Or maybe it's from a previous failed migration where it couldn't be cleaned up correctly. You can use zpool history | grep vm-112-disk-0 to get more info.

What is the right way to remove the logical volume?
You can remove the volume in the web UI by selecting the local-zfs storage on the target node in the VM Disks content tab or via the CLI with pvesm free local-zfs:vm-112-disk-0 (make sure you're on the correct node!).
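If you prefer to double-check from the CLI first, a possible sketch (the grep paths are the standard locations of guest configs on the cluster filesystem):

Code:
# make sure no guest config on any node still references the volume:
grep -r "vm-112-disk-0" /etc/pve/nodes/*/qemu-server/ /etc/pve/nodes/*/lxc/
# then, on the node holding the stray copy (node2 here), remove it:
pvesm free local-zfs:vm-112-disk-0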
 
Yes. The question is how the image got here. Was the VM ever on that node before? Or maybe it's from a previous failed migration where it couldn't be cleaned up correctly. You can use zpool history | grep vm-112-disk-0 to get more info.


You can remove the volume in the web UI by selecting the local-zfs storage on the target node in the VM Disks content tab or via the CLI with pvesm free local-zfs:vm-112-disk-0 (make sure you're on the correct node!).
That fixed it. Here's the history. I don't know how to interpret it, but you can tell where the problem was fixed today after removing the volume on the other node.

Node1, where the VM was a guest:
Code:
2023-07-12.22:17:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689218222__
2023-07-12.22:17:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689218222__
2023-07-12.22:47:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689220022__
2023-07-12.22:47:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689220022__
2023-07-12.23:17:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689221822__
2023-07-12.23:17:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689221822__
2023-07-12.23:47:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689223622__
2023-07-12.23:47:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689223622__
2023-07-13.00:17:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689225422__
2023-07-13.00:17:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689225422__
2023-07-13.00:47:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689227222__
2023-07-13.00:47:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689227222__
2023-07-13.01:18:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689229080__
2023-07-13.01:18:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689229080__
2023-07-13.01:48:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689230880__
2023-07-13.01:48:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689230880__
2023-07-13.02:18:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689232680__
2023-07-13.02:18:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689232680__
2023-07-13.02:48:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689234480__
2023-07-13.02:48:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689234480__
2023-07-13.03:18:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689236280__
2023-07-13.03:18:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689236280__
2023-07-13.03:48:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689238080__
2023-07-13.03:48:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689238080__
2023-07-13.04:18:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689239880__
2023-07-13.04:18:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689239880__
2023-07-13.04:48:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689241680__
2023-07-13.04:48:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689241680__
2023-07-13.05:18:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689243481__
2023-07-13.05:18:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689243481__
2023-07-13.05:48:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689245282__
2023-07-13.05:48:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689245282__
2023-07-13.06:18:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689247082__
2023-07-13.06:18:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689247082__
2023-07-13.06:48:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689248882__
2023-07-13.06:48:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689248882__
2023-07-13.07:18:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689250682__
2023-07-13.07:18:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689250682__
2023-07-13.07:48:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689252482__
2023-07-13.07:48:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689252482__
2023-07-13.08:18:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689254282__
2023-07-13.08:18:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689254282__
2023-07-13.08:48:05 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689256083__
2023-07-13.08:48:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689256083__
2023-07-13.09:08:16 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__
2023-07-13.09:11:14 zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__
2023-07-13.12:00:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__
2023-07-13.12:00:23 zfs recv -F -- rpool/data/vm-112-disk-0
2023-07-13.12:00:24 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__

And node2, where the VM was supposed to migrate and couldn't:
Code:
2023-07-08.17:25:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688855103__
2023-07-08.17:25:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688855103__
2023-07-08.17:55:05 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688856903__
2023-07-08.17:55:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688856903__
2023-07-08.18:25:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688858703__
2023-07-08.18:25:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688858703__
2023-07-08.18:55:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688860503__
2023-07-08.18:55:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688860503__
2023-07-08.19:25:05 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688862303__
2023-07-08.19:25:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688862303__
2023-07-08.19:55:05 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688864103__
2023-07-08.19:55:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688864103__
2023-07-08.20:26:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688865960__
2023-07-08.20:26:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688865960__
2023-07-08.20:56:01 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688867760__
2023-07-08.20:56:02 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688867760__
2023-07-08.21:26:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688869561__
2023-07-08.21:26:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688869561__
2023-07-08.21:56:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688871361__
2023-07-08.21:56:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688871361__
2023-07-08.22:26:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688873161__
2023-07-08.22:26:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688873161__
2023-07-08.22:56:02 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688874961__
2023-07-08.22:56:03 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688874961__
2023-07-08.23:26:03 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688876762__
2023-07-08.23:26:04 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688876762__
2023-07-08.23:56:04 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1688878562__
2023-07-08.23:56:05 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1688878562__
2023-07-13.09:07:19 zfs destroy -r rpool/data/vm-112-disk-0
2023-07-13.09:11:15 zfs recv -F -- rpool/data/vm-112-disk-0
2023-07-13.09:11:16 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__
2023-07-13.12:00:06 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689267604__
2023-07-13.12:00:23 zfs send -Rpv -I __replicate_112-0_1689257295__ -- rpool/data/vm-112-disk-0@__replicate_112-0_1689267604__
2023-07-13.12:00:23 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__
Thanks!
 
Node1, where the VM was a guest:
Code:
2023-07-13.09:08:16 zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__
2023-07-13.09:11:14 zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_1689257295__
2023-07-13.12:00:06 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__
Replication does the following (a rough sketch of the corresponding zfs commands follows the list):
  • create a new snapshot, in this case __replicate_112-0_1689257295__.
  • replicate the delta to the previous replication snapshot incrementally or do a full send if no common snapshot can be found, in this case a full send.
  • remove the previous replication snapshot (if one exists), in this case __replicate_112-0_1681815600__.
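In terms of raw zfs commands, a simplified sketch of what the zpool history in this thread reflects (NEW, OLD and TARGET are placeholders, and the actual replication code goes through pvesm export/import over SSH rather than calling zfs directly):

Code:
# 1. create the new replication snapshot on the source
zfs snapshot rpool/data/vm-112-disk-0@__replicate_112-0_NEW__
# 2a. incremental send if the previous replication snapshot exists on both sides
zfs send -Rpv -I __replicate_112-0_OLD__ -- rpool/data/vm-112-disk-0@__replicate_112-0_NEW__ \
  | ssh root@TARGET zfs recv -F -- rpool/data/vm-112-disk-0
# 2b. otherwise a full send, which Proxmox refuses if the target volume already exists
#     ("volume 'rpool/data/vm-112-disk-0' already exists")
zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_NEW__ \
  | ssh root@TARGET zfs recv -F -- rpool/data/vm-112-disk-0
# 3. remove the previous replication snapshot on both sides
zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_OLD__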

Code:
root@node2:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0  42.2G  1.36T     42.2G  -
Because the target image didn't have any common snapshot (actually, none at all), it always tried to do a full send. That failed because the image was already present (we can't just remove it automatically, because we don't know what that image actually is; that's for the admin to decide).

So the question is why/when the __replicate_112-0_1681815600__ snapshot was deleted. Note that the big number is a Unix timestamp, so the snapshot should have been created at:
Code:
root@pve8a1 ~ # date -d@1681815600
Tue Apr 18 01:00:00 PM CEST 2023

And node2, where the VM was supposed to migrate and couldn't:
Code:
2023-07-13.09:07:19 zfs destroy -r rpool/data/vm-112-disk-0
2023-07-13.09:11:15 zfs recv -F -- rpool/data/vm-112-disk-0
2023-07-13.09:11:16 zfs destroy rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__
Of course it was destroyed once on the new image after the successful full send to clean up, but that's not the one we're interested in. We want to know why it was destroyed on the old image.

You could search the history on the target node further up for that snapshot.
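For example, grepping the history for the snapshot name (the same zpool history | grep approach suggested earlier in the thread):

Code:
# on the target node, list every operation that touched that snapshot:
zpool history | grep __replicate_112-0_1681815600__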
 
Replication does:
  • create a new snapshot, in this case __replicate_112-0_1689257295__.
  • replicate the delta to the previous replication snapshot incrementally or do a full send if no common snapshot can be found, in this case a full send.
  • remove the previous replication snapshot (if one exists), in this case __replicate_112-0_1681815600__.


Because the target image didn't have any common snapshot (actually, none at all), it always tried to do a full send. That failed because the image was already present (we can't just remove it automatically, because we don't know what that image actually is; that's for the admin to decide).

So the question is why/when the __replicate_112-0_1681815600__ snapshot was deleted. Note that the big number is a Unix timestamp, so the snapshot should have been created at:
Code:
root@pve8a1 ~ # date -d@1681815600
Tue Apr 18 01:00:00 PM CEST 2023


Of course it was destroyed once on the new image after the successful full send to clean up, but that's not the one we're interested in. We want to know why it was destroyed on the old image.

You could search the history on the target node further up for that snapshot.
I don't think I'm going to figure that out, based on how many things I've had to do to get this VM to work. But at least you fixed my replication/migration issue, so I thank you!

One last question:
Code:
root@node1:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 64.0G  1.43T     40.9G  -
rpool/data/vm-112-disk-0@pre_occ                         23.1G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1689344161__     0B      -     40.9G  -
Code:
root@node2:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 64.0G  1.32T     40.9G  -
rpool/data/vm-112-disk-0@pre_occ                         23.1G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1689344161__  6.12M      -     40.9G  -
That all looks normal, except that in the web UI there are no snapshots listed for this VM. I don't need the pre_occ snapshot and don't want to waste the space, but how can I get rid of it if it isn't showing?
 
I don't think I'm going to figure that out, based on how many things I've had to do to get this VM to work. But at least you fixed my replication/migration issue, so I thank you!

One last question:
Code:
root@node1:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 64.0G  1.43T     40.9G  -
rpool/data/vm-112-disk-0@pre_occ                         23.1G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1689344161__     0B      -     40.9G  -
Code:
root@node2:~# zfs list -t all -r rpool/data/vm-112-disk-0
NAME                                                      USED  AVAIL     REFER  MOUNTPOINT
rpool/data/vm-112-disk-0                                 64.0G  1.32T     40.9G  -
rpool/data/vm-112-disk-0@pre_occ                         23.1G      -     40.1G  -
rpool/data/vm-112-disk-0@__replicate_112-0_1689344161__  6.12M      -     40.9G  -
That all looks normal, except that in the web UI there are no snapshots listed for this VM. I don't need the pre_occ snapshot and don't want to waste the space, but how can I get rid of it if it isn't showing?
That means the snapshot does not exist in the VM configuration anymore, but only on the disks. You can remove it with zfs destroy rpool/data/vm-112-disk-0@pre_occ.
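For the cautious, zfs destroy supports a dry run; whether the copy of the snapshot on the replication target disappears with the next replication run or needs to be removed there manually can be checked with zfs list afterwards:

Code:
# dry run first (-n prints what would be destroyed, -v is verbose):
zfs destroy -nv rpool/data/vm-112-disk-0@pre_occ
# actually remove it:
zfs destroy rpool/data/vm-112-disk-0@pre_occ
# verify the remaining snapshots on both nodes:
zfs list -t snapshot -r rpool/data/vm-112-disk-0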
 
