First off, here's the output of the failed migration task:
Code:
task started by HA resource agent
2023-07-09 00:32:21 use dedicated network address for sending migration traffic (192.168.10.21)
2023-07-09 00:32:22 starting migration of VM 112 to node 'node2' (192.168.10.21)
2023-07-09 00:32:22 found local, replicated disk 'local-zfs:vm-112-disk-0' (attached)
2023-07-09 00:32:22 replicating disk images
2023-07-09 00:32:22 start replication job
2023-07-09 00:32:22 guest => VM 112, running => 0
2023-07-09 00:32:22 volumes => local-zfs:vm-112-disk-0
2023-07-09 00:32:23 create snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:23 using secure transmission, rate limit: 500 MByte/s
2023-07-09 00:32:23 full sync 'local-zfs:vm-112-disk-0' (__replicate_112-0_1688880742__)
2023-07-09 00:32:23 using a bandwidth limit of 500000000 bytes per second for transferring 'local-zfs:vm-112-disk-0'
2023-07-09 00:32:24 full send of rpool/data/vm-112-disk-0@pre_occ estimated size is 46.8G
2023-07-09 00:32:24 send from @pre_occ to rpool/data/vm-112-disk-0@__replicate_112-0_1681815600__ estimated size is 27.0G
2023-07-09 00:32:24 send from @__replicate_112-0_1681815600__ to rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__ estimated size is 625M
2023-07-09 00:32:24 total estimated size is 74.5G
2023-07-09 00:32:24 volume 'rpool/data/vm-112-disk-0' already exists
2023-07-09 00:32:24 command 'zfs send -Rpv -- rpool/data/vm-112-disk-0@__replicate_112-0_1688880742__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2023-07-09 00:32:24 delete previous replication snapshot '__replicate_112-0_1688880742__' on local-zfs:vm-112-disk-0
2023-07-09 00:32:24 end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 ERROR: command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
2023-07-09 00:32:24 aborting phase 1 - cleanup resources
2023-07-09 00:32:24 ERROR: migration aborted (duration 00:00:03): command 'set -o pipefail && pvesm export local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ | /usr/bin/cstream -t 500000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.10.21 -- pvesm import local-zfs:vm-112-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_112-0_1688880742__ -allow-rename 0' failed: exit code 255
TASK ERROR: migration aborted
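If I'm reading this right, the "already exists" complaint comes from the receiving side: the pvesm import on node2 refuses to create the target dataset, and the signal 13 on zfs send is just SIGPIPE once that pipe closes. For reference, here's roughly what I plan to check on the replication side (a sketch using the stock pvesr commands and the target IP from the log; corrections welcome):
Code:
# State of the configured replication jobs (last sync, next run, failures)
pvesr status
pvesr list
# Any leftover datasets for VM 112 on the receiving node
ssh root@192.168.10.21 zfs list -t all -r rpool/data | grep 112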
When I look in /rpool/data, I don't see anything related to VM 112, though I do see some unrelated but interesting things, such as ownership differences that I don't understand. It also looks like deleting a replication job doesn't remove the last copy from /rpool/data:
Code:
root@node2:/rpool/data# ls -l
total 137
drwxr-xr-x 18 100000 100000 24 Jul 9 00:20 subvol-101-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-104-disk-0
drwxr-xr-x 17 100000 100000 24 Jul 6 18:23 subvol-105-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-107-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-108-disk-0
drwxr-xr-x 17 root root 23 Dec 31 2022 subvol-109-disk-0
drwxr-xr-x 17 root root 25 Jun 1 21:42 subvol-110-disk-0
drwxrwxr-x 4 1000 1001 4 Dec 11 2022 subvol-110-disk-1
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-111-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 9 00:20 subvol-115-disk-0
drwxr-xr-x 17 root root 23 Dec 31 2022 subvol-116-disk-0
drwxr-xr-x 17 100000 100000 23 May 1 10:47 subvol-117-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 9 00:20 subvol-121-disk-0
drwxr-xr-x 17 100000 100000 24 Jul 6 18:23 subvol-123-disk-0
drwxr-xr-x 17 100000 100000 23 Jul 6 18:23 subvol-124-disk-0
drwxr-xr-x 17 root root 26 Jul 6 18:23 subvol-126-disk-0
drwxr-xr-x 4 1000 media 4 Jun 28 15:20 subvol-126-disk-1
drwxr-xr-x 17 root root 24 Jul 6 18:23 subvol-127-disk-0
Not sure why some subvols are owned by root and others by UID 100000, but I think that's just privileged vs. unprivileged containers (unprivileged ones map container root to host UID 100000), so it shouldn't be a factor here.

Why can't I get this VM to migrate? I thought "volume 'rpool/data/vm-112-disk-0' already exists" might be a clue, but I don't actually see where it exists on either node.
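In case it matters, my understanding is that a VM disk on ZFS is a zvol rather than a mounted filesystem, so it wouldn't show up as a directory under /rpool/data even if it did exist; only the container subvols do. Here's a sketch of how I'd look for it on each node instead:
Code:
# The zvol itself, plus any snapshots hanging off it
zfs list -t all -r rpool/data | grep vm-112
# The block device ZFS exposes for a zvol, if one exists
ls /dev/zvol/rpool/data/ 2>/dev/null | grep vm-112
# What the Proxmox storage layer reports for local-zfs
pvesm list local-zfs | grep vm-112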
Thanks