[SOLVED] VMs offline migration fails

Matvey

Active Member
Mar 15, 2018
Hi Guys,

I have 2 nodes (prox-01, prox-02) configured in a cluster:
Quorum information
------------------
Date: Thu Mar 15 09:35:20 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/40
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 91.226.124.4 (local)
0x00000002 1 91.226.124.25


Each server has local, non-shared storage (prox-01 -> local-lvm-01, prox-02 -> local-lvm-02):

lvmthin: local-lvm-01
thinpool pvepool
vgname storage
content rootdir,images
nodes prox-01,prox-02

nfs: bkp-nfs-01
disable
export /var/backup_prox
path /mnt/pve/bkp-nfs-01
server 10.10.124.24
content backup
maxfiles 5
options vers=3

lvmthin: local-lvm-02
thinpool pvepool
vgname storage
content rootdir,images
nodes prox-01,prox-02


When I try to migrate a stopped VM (Migrate button), I get this error:

2018-03-15 09:05:21 starting migration of VM 7045 to node 'prox-02' (91.226.124.25)
2018-03-15 09:05:21 found local disk 'local-lvm-01:vm-7045-disk-1' (in current VM config)
2018-03-15 09:05:21 found local disk 'local-lvm-02:vm-7045-disk-1' (via storage)
2018-03-15 09:05:21 copying disk images
Using default stripesize 64.00 KiB.
Logical volume "vm-7045-disk-1" created.
491520+0 records in
491520+0 records out
32212254720 bytes (32 GB, 30 GiB) copied, 278.411 s, 116 MB/s
37+1933393 records in
37+1933393 records out
32212254720 bytes (32 GB, 30 GiB) copied, 280.156 s, 115 MB/s
volume storage/vm-7045-disk-1 already exists
command 'dd 'if=/dev/storage/vm-7045-disk-1' 'bs=64k'' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-03-15 09:10:02 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export local-lvm-02:vm-7045-disk-1 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=prox-02' root@91.226.124.25 -- pvesm import local-lvm-02:vm-7045-disk-1 raw+size - -with-snapshots 0' failed: exit code 255
2018-03-15 09:10:02 aborting phase 1 - cleanup resources
2018-03-15 09:10:02 ERROR: found stale volume copy 'local-lvm-01:vm-7045-disk-1' on node 'prox-02'
2018-03-15 09:10:02 ERROR: found stale volume copy 'local-lvm-02:vm-7045-disk-1' on node 'prox-02'
2018-03-15 09:10:02 ERROR: migration aborted (duration 00:04:42): Failed to sync data - command 'set -o pipefail && pvesm export local-lvm-02:vm-7045-disk-1 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=prox-02' root@91.226.124.25 -- pvesm import local-lvm-02:vm-7045-disk-1 raw+size - -with-snapshots 0' failed: exit code 255
TASK ERROR: migration aborted


I have 2 questions:
  1. How can I fix the migration?
  2. Why does migration use the public network instead of the private one?
Thank you.
 
How can I fix the migration?
Use the same storage name in your config, e.g. "local-lvm".
Storages are cluster-wide, not node-specific.
You can restrict storages to nodes, but if you have the same storage on different nodes, you should use the same name, as in the sketch below.
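
A minimal sketch of what the merged /etc/pve/storage.cfg entry could look like, assuming both thin pools keep the pvepool/storage names shown above (the name "local-lvm" here is just an example):

lvmthin: local-lvm
thinpool pvepool
vgname storage
content rootdir,images
nodes prox-01,prox-02

Before retrying, the stale copies the log reports on prox-02 would also need removing (e.g. lvremove storage/vm-7045-disk-1 on prox-02), otherwise the import will hit "volume storage/vm-7045-disk-1 already exists" again.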

Why does migration use the public network instead of the private one?
You can set the migration network in /etc/pve/datacenter.cfg, for example:
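
A sketch, assuming your private network is 10.10.124.0/24 (inferred from the NFS server address; adjust the CIDR to your actual migration network):

migration: secure,network=10.10.124.0/24

Migration tasks started after this change will use the given network; "secure" keeps tunneling the traffic over SSH as before.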
 