New cluster, migration between nodes fails

CRCinAU

Hi all,

I've just created a new cluster with two nodes, and am trying to migrate running CTs from one system to the other. When I try the migration, I get:

Code:
2020-09-23 15:11:52 shutdown CT 100
2020-09-23 15:11:55 starting migration of CT 100 to node 'cly-pm-1' (192.168.51.1)
2020-09-23 15:11:55 found local volume 'vm-storage:vm-100-disk-0' (in current VM config)
2020-09-23 15:11:56 blockdev: cannot open /dev/vm-storage/vm-100-disk-0: No such file or directory
2020-09-23 15:11:56 command '/sbin/blockdev --getsize64 /dev/vm-storage/vm-100-disk-0' failed: exit code 1
2020-09-23 15:11:56 import: no size found in export header, aborting.
send/receive failed, cleaning up snapshot(s)..
2020-09-23 15:11:56 ERROR: storage migration for 'vm-storage:vm-100-disk-0' to storage 'vm-storage' failed - command 'set -o pipefail && pvesm export vm-storage:vm-100-disk-0 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cly-pm-1' root@192.168.51.1 -- pvesm import vm-storage:vm-100-disk-0 raw+size - -with-snapshots 0 -allow-rename 0' failed: exit code 255
2020-09-23 15:11:56 aborting phase 1 - cleanup resources
2020-09-23 15:11:56 ERROR: found stale volume copy 'vm-storage:vm-100-disk-0' on node 'cly-pm-1'
2020-09-23 15:11:56 start final cleanup
2020-09-23 15:11:56 start container on source node
2020-09-23 15:11:57 ERROR: migration aborted (duration 00:00:05): storage migration for 'vm-storage:vm-100-disk-0' to storage 'vm-storage' failed - command 'set -o pipefail && pvesm export vm-storage:vm-100-disk-0 raw+size - -with-snapshots 0 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cly-pm-1' root@192.168.51.1 -- pvesm import vm-storage:vm-100-disk-0 raw+size - -with-snapshots 0 -allow-rename 0' failed: exit code 255
TASK ERROR: migration aborted

Looking at /etc/pve/storage.cfg:
Code:
dir: local 
    path /var/lib/vz 
    content backup,vztmpl,iso 
 
lvm: vm-storage 
    vgname vm-storage 
    content rootdir,images 
    shared 0

vgdisplay on source:
Code:
  --- Volume group --- 
  VG Name               vm-storage 
  System ID              
  Format                lvm2 
  Metadata Areas        1 
  Metadata Sequence No  7 
  VG Access             read/write 
  VG Status             resizable 
  MAX LV                0 
  Cur LV                4 
  Open LV               4 
  Max PV                0 
  Cur PV                1 
  Act PV                1 
  VG Size               192.88 GiB 
  PE Size               4.00 MiB 
  Total PE              49378 
  Alloc PE / Size       43520 / 170.00 GiB 
  Free  PE / Size       5858 / 22.88 GiB 
  VG UUID               waJYdl-Ckox-UraJ-3cta-eev1-ih36-jENXiQ 
    
  --- Volume group --- 
  VG Name               pve 
  System ID              
  Format                lvm2 
  Metadata Areas        1 
  Metadata Sequence No  9 
  VG Access             read/write 
  VG Status             resizable 
  MAX LV                0 
  Cur LV                2 
  Open LV               2 
  Max PV                0 
  Cur PV                1 
  Act PV                1 
  VG Size               <39.50 GiB 
  PE Size               4.00 MiB 
  Total PE              10111 
  Alloc PE / Size       10111 / <39.50 GiB 
  Free  PE / Size       0 / 0    
  VG UUID               ML3WLS-4tOr-eOjy-MJfE-pIDd-mPxW-WzAw5c

vgdisplay on target:
Code:
  --- Volume group --- 
  VG Name               pve 
  System ID              
  Format                lvm2 
  Metadata Areas        1 
  Metadata Sequence No  10 
  VG Access             read/write 
  VG Status             resizable 
  MAX LV                0 
  Cur LV                2 
  Open LV               2 
  Max PV                0 
  Cur PV                1 
  Act PV                1 
  VG Size               <59.50 GiB 
  PE Size               4.00 MiB 
  Total PE              15231 
  Alloc PE / Size       15231 / <59.50 GiB 
  Free  PE / Size       0 / 0    
  VG UUID               P8M2VG-I4oq-4Zyq-eLfb-k23K-nO27-xiE21q 
    
  --- Volume group --- 
  VG Name               vm-storage 
  System ID              
  Format                lvm2 
  Metadata Areas        1 
  Metadata Sequence No  6 
  VG Access             read/write 
  VG Status             resizable 
  MAX LV                0 
  Cur LV                0 
  Open LV               0 
  Max PV                0 
  Cur PV                1 
  Act PV                1 
  VG Size               <3.64 TiB 
  PE Size               4.00 MiB 
  Total PE              953829 
  Alloc PE / Size       0 / 0    
  Free  PE / Size       953829 / <3.64 TiB 
  VG UUID               VJvyQa-soX8-TKKz-TzX6-KA2K-JlLt-fmaPyS

Running the failing command manually works fine on the source system:
Code:
# /sbin/blockdev --getsize64 /dev/vm-storage/vm-100-disk-0 
21474836480

Anyone come across this before? I'm at a bit of a loss...
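
One extra data point that might help alongside the manual blockdev run is the activation state of that LV at the same moment, since the /dev/<vg>/<lv> symlink only exists while the LV is active. A minimal check with plain LVM tools (nothing Proxmox-specific; in the Attr column the fifth character is 'a' while the LV is active):

Code:
# does the device symlink exist right now?
ls -l /dev/vm-storage/vm-100-disk-0
# activation state of the LV
lvs -o lv_name,lv_attr vm-storage/vm-100-disk-0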
 
Hi,

Is the cluster status healthy? Can you check pvecm status?

Can you also check that you can SSH from one node to the other without any interaction? ssh nodename or ssh IP should log you in directly.
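
Since the migration runs SSH non-interactively, it may also be worth testing with the exact options the failing command in the task log uses (BatchMode, HostKeyAlias and the target IP below are copied from that log, so adjust them for the other direction):

Code:
ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cly-pm-1' root@192.168.51.1 -- true && echo "non-interactive SSH OK"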
 
As best I can tell, everything seems fine:

Code:
root@cly-pm-1:~# pvecm status 
Cluster information 
------------------- 
Name:             proxmoxcluster 
Config Version:   2 
Transport:        knet 
Secure auth:      on 
 
Quorum information 
------------------ 
Date:             Wed Sep 23 19:55:42 2020 
Quorum provider:  corosync_votequorum 
Nodes:            2 
Node ID:          0x00000002 
Ring ID:          1.24 
Quorate:          Yes 
 
Votequorum information 
---------------------- 
Expected votes:   2 
Highest expected: 2 
Total votes:      2 
Quorum:           2   
Flags:            Quorate  
 
Membership information 
---------------------- 
    Nodeid      Votes Name 
0x00000001          1 192.168.51.2 
0x00000002          1 192.168.51.1 (local)

SSH from cly-pm-1 (node ID 2) to cly-pm-tmp (node ID 1):
Code:
root@cly-pm-1:~# ssh 192.168.51.2 
Linux cly-pm-tmp 5.4.60-1-pve #1 SMP PVE 5.4.60-2 (Fri, 04 Sep 2020 10:24:50 +0200) x86_64 
 
The programs included with the Debian GNU/Linux system are free software; 
the exact distribution terms for each program are described in the 
individual files in /usr/share/doc/*/copyright. 
 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Wed Sep 23 14:59:55 2020 from 192.168.13.1 
root@cly-pm-tmp:~#

SSH from cly-pm-tmp (node ID 1) to cly-pm-1 (node ID 2):
Code:
root@cly-pm-tmp:~# ssh 192.168.51.1 
Linux cly-pm-1 5.4.60-1-pve #1 SMP PVE 5.4.60-2 (Fri, 04 Sep 2020 10:24:50 +0200) x86_64 
 
The programs included with the Debian GNU/Linux system are free software; 
the exact distribution terms for each program are described in the 
individual files in /usr/share/doc/*/copyright. 
 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Wed Sep 23 19:57:05 2020 from 192.168.51.1 
root@cly-pm-1:~#

Package versions if it helps:
Code:
proxmox-ve: 6.2-2 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.2-1
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.1.0-2
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
 
Could you check the output of lvs -a and lvscan -a on both nodes?
 
Node ID 1 (cly-pm-tmp):
Code:
root@cly-pm-tmp:~# lvs -a
  LV            VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root          pve        -wi-ao---- 34.62g                                                    
  swap          pve        -wi-ao---- <4.88g                                                    
  vm-100-disk-0 vm-storage -wi-ao---- 20.00g                                                    
  vm-101-disk-0 vm-storage -wi-ao---- 20.00g                                                    
  vm-102-disk-0 vm-storage -wi-ao---- 40.00g                                                    
  vm-103-disk-0 vm-storage -wi-ao---- 90.00g                                                    
root@cly-pm-tmp:~# lvscan -a
  ACTIVE            '/dev/vm-storage/vm-100-disk-0' [20.00 GiB] inherit
  ACTIVE            '/dev/vm-storage/vm-101-disk-0' [20.00 GiB] inherit
  ACTIVE            '/dev/vm-storage/vm-102-disk-0' [40.00 GiB] inherit
  ACTIVE            '/dev/vm-storage/vm-103-disk-0' [90.00 GiB] inherit
  ACTIVE            '/dev/pve/swap' [<4.88 GiB] inherit
  ACTIVE            '/dev/pve/root' [34.62 GiB] inherit
root@cly-pm-tmp:~#

Node ID 2 (cly-pm-1):
Code:
root@cly-pm-1:~# lvs -a
  LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root pve -wi-ao---- 52.12g                                                    
  swap pve -wi-ao---- <7.38g                                                    
root@cly-pm-1:~# lvscan -a
  ACTIVE            '/dev/pve/swap' [<7.38 GiB] inherit
  ACTIVE            '/dev/pve/root' [52.12 GiB] inherit
root@cly-pm-1:~#

On Node ID 2, the vm-storage VG/PV is empty, as there are no guests on it yet:
Code:
  --- Volume group --- 
  VG Name               vm-storage 
  System ID              
  Format                lvm2 
  Metadata Areas        1 
  Metadata Sequence No  6 
  VG Access             read/write 
  VG Status             resizable 
  MAX LV                0 
  Cur LV                0 
  Open LV               0 
  Max PV                0 
  Cur PV                1 
  Act PV                1 
  VG Size               <3.64 TiB 
  PE Size               4.00 MiB 
  Total PE              953829 
  Alloc PE / Size       0 / 0    
  Free  PE / Size       953829 / <3.64 TiB 
  VG UUID               VJvyQa-soX8-TKKz-TzX6-KA2K-JlLt-fmaPyS
 
Can you repeat that after turning off the container? Maybe something deactivates the LV...
 
OK - I've stopped CT 100 on Node ID 1 (cly-pm-tmp):
Code:
root@cly-pm-tmp:~# lvs -a 
  LV            VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert 
  root          pve        -wi-ao---- 34.62g                                                     
  swap          pve        -wi-ao---- <4.88g                                                     
  vm-100-disk-0 vm-storage -wi------- 20.00g                                                     
  vm-101-disk-0 vm-storage -wi-ao---- 20.00g                                                     
  vm-102-disk-0 vm-storage -wi-ao---- 40.00g                                                     
  vm-103-disk-0 vm-storage -wi-ao---- 90.00g                                                     
root@cly-pm-tmp:~# lvscan -a 
  inactive          '/dev/vm-storage/vm-100-disk-0' [20.00 GiB] inherit 
  ACTIVE            '/dev/vm-storage/vm-101-disk-0' [20.00 GiB] inherit 
  ACTIVE            '/dev/vm-storage/vm-102-disk-0' [40.00 GiB] inherit 
  ACTIVE            '/dev/vm-storage/vm-103-disk-0' [90.00 GiB] inherit 
  ACTIVE            '/dev/pve/swap' [<4.88 GiB] inherit 
  ACTIVE            '/dev/pve/root' [34.62 GiB] inherit 
root@cly-pm-tmp:~#
 
@fabian To me, this seems like it might be a Proxmox bug, as the LV does become inactive when the CT is shut down.

This means the /dev/vm-storage/vm-100-disk-0 symlink disappears - which would explain why blockdev can't open it to read the size.

Interestingly, I've noticed this across a lot of Proxmox use: LVs are marked inactive when the guest is shut down. Not a bad thing in itself, but in this case it breaks migration.
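
If that hypothesis is right, re-activating the LV by hand should bring the symlink back; a quick sketch to confirm it (plain LVM commands - whether a subsequent migration then succeeds depends on where the extra deactivation happens, so treat this as a diagnostic rather than a guaranteed workaround):

Code:
# re-activate the logical volume on the source node
lvchange -ay vm-storage/vm-100-disk-0
# the symlink should be back and the size query should work again
ls -l /dev/vm-storage/vm-100-disk-0
/sbin/blockdev --getsize64 /dev/vm-storage/vm-100-disk-0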
 
Yes, please file a bug at https://bugzilla.proxmox.com - there seems to be one deactivate_volume call too many (or one activate_volume call missing). For LVM, we need to deactivate volumes because in some shared deployments they must not be active on two cluster nodes at the same time, otherwise they might get corrupted.
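
For context on the shared case: with a shared LVM storage (e.g. a VG on iSCSI or FC), every node sees the same VG, so an LV should be active on at most one node at a time. Roughly, the hand-over between nodes amounts to the following (a simplified sketch with plain lvchange; the actual calls live in the PVE storage code):

Code:
# on the node giving up the volume: deactivate, which also removes /dev/vm-storage/vm-100-disk-0 there
lvchange -an vm-storage/vm-100-disk-0
# on the node taking over: activate, which creates the device node locally
lvchange -ay vm-storage/vm-100-disk-0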
 
