[SOLVED] Offline Migration failing

xmartx

New Member
Nov 28, 2018
Hello,

I'm running a PVE 5.3-9 cluster without any shared filesystem (Ceph, NFS, etc.), just a single LVM device on each node, so it should be pretty much a default config. I want to offline migrate an LXC container from one node to another, and this is what I keep getting:

Code:
root@prox05:~# pct migrate 110 prox06
2019-03-18 16:01:25 starting migration of CT 110 to node 'prox06' (10.10.0.136)
2019-03-18 16:01:25 found local volume 'backup:vm-110-disk-0' (in current VM config)
2019-03-18 16:01:25 found local volume 'vg0:vm-110-disk-0' (via storage)
  Logical volume "vm-110-disk-0" created.
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 57.7679 s, 74.3 MB/s
62+131078 records in
62+131078 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 60.0081 s, 71.6 MB/s
volume vg0/vm-110-disk-0 already exists
command 'dd 'if=/dev/vg0/vm-110-disk-0' 'bs=64k'' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2019-03-18 16:02:26 ERROR: command 'set -o pipefail && pvesm export vg0:vm-110-disk-0 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=prox06' root@10.10.0.136 -- pvesm import vg0:vm-110-disk-0 raw+size - -with-snapshots 0' failed: exit code 255
2019-03-18 16:02:26 aborting phase 1 - cleanup resources
2019-03-18 16:02:26 ERROR: found stale volume copy 'backup:vm-110-disk-0' on node 'prox06'
2019-03-18 16:02:26 ERROR: found stale volume copy 'vg0:vm-110-disk-0' on node 'prox06'
2019-03-18 16:02:26 start final cleanup
2019-03-18 16:02:26 ERROR: migration aborted (duration 00:01:02): command 'set -o pipefail && pvesm export vg0:vm-110-disk-0 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=prox06' root@10.10.0.136 -- pvesm import vg0:vm-110-disk-0 raw+size - -with-snapshots 0' failed: exit code 255
migration aborted

The target node "prox06" was empty before the migration, so the "already exists" / "found stale volume" messages make no sense to me. It also makes no difference whether I use an LXC container or a VM; it's basically the same message.
I'll gladly provide further information and am thankful for any help.
 
can you post your /etc/pve/storage.cfg and the vgs and lvs output from each node?
 
Hello Dominik,

yes of course. I have verified that storage.cfg is the same on all nodes, so I'm only posting it once:

Code:
dir: local
    path /var/lib/vz
    content images,rootdir,vztmpl,iso
    maxfiles 0
    shared 0

lvm: vg0
    vgname vg0
    content rootdir,images
    shared 0

lvm: backup
    vgname vg0
    content images,rootdir
    shared 0

Here is the lvs output of the two nodes I used in the example above. The "vm-110-disk-0" volume was not on prox06 before the migration command. The other nodes look similar; they just have some volumes from production machines.

Code:
root@prox05:~# lvs
  LV            VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root          vg0 -wi-ao---- 30,00g                                                  
  swap          vg0 -wi-ao---- 10,00g                                                  
  vm-110-disk-0 vg0 -wi-a-----  4,00g

Code:
root@prox06:~# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root          vg0 -wi-ao----  15,00g                                                  
  swap          vg0 -wi-ao----   6,00g                                                  
  vm-110-disk-0 vg0 -wi-a-----   4,00g
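
Presumably that vm-110-disk-0 on prox06 is just the leftover from the aborted migration; I assume it could be removed on prox06 before retrying, if nothing else uses it:

Code:
# on prox06: remove the leftover logical volume from the failed migration attempt
# (lvremove asks for confirmation since the LV is active)
lvremove vg0/vm-110-disk-0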
 
you have two LVM storages defined, both with vg0 as the volume group

when our code copies the volume from the source to the target, it already exists there through the other storage definition

only use the volume group in one storage definition
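
so storage.cfg would keep only one LVM entry for vg0, e.g. (a sketch, assuming the "backup" storage really is unused; the dir storage stays as it is):

Code:
dir: local
    path /var/lib/vz
    content images,rootdir,vztmpl,iso
    maxfiles 0
    shared 0

lvm: vg0
    vgname vg0
    content rootdir,images
    shared 0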
 
Thank you for the quick and useful help, this explains it of course. However, this leaves me with another issue: is there a way to remove one of the storages without removing the data? We're not using the "backup" storage, so it could be deleted, but it shows all the data that is in the "vg0" storage, so I fear that data might be deleted as well.

FWIW, I did not configure this manually; this is what the (unofficial) image used by Hetzner's installimage produces when leaving pretty much everything at the defaults.
 
deleting a storage from /etc/pve/storage.cfg does not delete any data, but you may now have disk entries in your VM configs that still reference that storage (these you have to clean up manually)
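
a quick way to find such references and then drop the storage definition could look like this (a sketch; the grep paths assume the standard /etc/pve layout):

Code:
# look for disk entries that still reference the "backup" storage in all guest configs
grep -r 'backup:' /etc/pve/nodes/*/qemu-server/ /etc/pve/nodes/*/lxc/

# once nothing references it anymore, remove only the storage definition
# (this does not touch the logical volumes themselves)
pvesm remove backup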
 
I can check that easily, we don't have that many volumes and I'm already pretty sure we never used the "backup" storage.
 
