Proxmox remembers previous local drives somewhere. Migration fails

mainprog

New Member
Sep 6, 2013
Hello All,

I am having issues with migration after I moved disks to shared storage. The issue appears only with new Proxmox clusters.

Here is the issue:
The VM disks were previously on vmbox_lvm-local.
I moved them to a shared iSCSI drive.

root@vmbox:~# grep disk /etc/pve/nodes/vmbox/qemu-server/10085.conf
bootdisk: sata0
sata0: vmbox-lvm-on-scsi:vm-10085-disk-2,size=60G
sata1: vmbox-lvm-on-scsi:vm-10085-disk-3,size=140G

There is no reference to the old vmbox_lvm-local disk.
But migration fails.
If I move the config manually to the other node, the VM starts successfully. So the problem is not on the config layer.


Jul 01 10:10:51 starting migration of VM 10085 to node 'c1-krrblade1' (10.10.0.56)
Jul 01 10:10:51 copying disk images
Jul 01 10:10:52 ERROR: Failed to sync data - storage 'vmbox_lvm-local' is not available on node 'c1-krrblade1'
Jul 01 10:10:52 aborting phase 1 - cleanup resources
Jul 01 10:10:52 ERROR: migration aborted (duration 00:00:01): Failed to sync data - storage 'vmbox_lvm-local' is not available on node 'c1-krrblade1'
TASK ERROR: migration aborted




root@vmbox:~# pveversion -v
proxmox-ve: 4.2-54 (running kernel: 4.4.10-1-pve)
pve-manager: 4.2-15 (running version: 4.2-15/6669ad2c)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.10-1-pve: 4.4.10-54
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-83
pve-firmware: 1.1-8
libpve-common-perl: 4.0-70
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-70
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5.7-pve10~bpo80

 
It must be some node-specific cache or something similar.

I tested the following:
I moved the config manually to the c1-krrblade1 node.
The VM started there with no issues.
I then migrated it back to the vmbox node via the UI, also with no issues.

But when I tried to migrate to c1-krrblade1 again, it failed with the same error.


Bug??
 
Did you delete the source disks/images after moving? Could they still be referenced in snapshots? There is an upcoming update that will be more verbose for this, but on migration we check all disks on all storages owned by this VM (e.g., old currently unused disks that were not deleted) and all disks referenced in the configuration, including snapshots!
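To see what that check would find, one rough way to look for leftover disks from the CLI is to scan every configured storage for volumes named after the VMID. This loop is my own illustration, not PVE's actual check; the `NR>1` assumes `pvesm status` prints a header row (drop it if your version does not):

```shell
# Look for any volume owned by VM 10085 on every configured storage.
vmid=10085
for storage in $(pvesm status | awk 'NR>1 {print $1}'); do
    # Storages unavailable on this node will just produce an error we ignore.
    pvesm list "$storage" 2>/dev/null | grep "vm-${vmid}-" || true
done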
 
Yes, those were deleted.
I moved the disks with deletion enabled.
There is no reference to the old disk in the config.

As you can see, I've posted the grep result from the config.

I am using raw LVM, as you can see. So no snapshots.
 
What does "pvesm list vmbox_lvm-local" output?
 
As expected, it says the storage is not available on blade1/2.

On vmbox it shows the list of VMs on it.
But the thing is that this same LVM storage is shared via iSCSI to the other nodes.
And VM 10085 is actually configured with vmbox-lvm-on-scsi, not with vmbox_lvm-local.
So it looks like the migration logic that determines which storage the VM uses is not accurate.


root@vmbox:~# pvesm list vmbox_lvm-local
vmbox_lvm-local:vm-10019-disk-1 raw 5368709120 10019
vmbox_lvm-local:vm-10085-disk-2 raw 64424509440 10085
vmbox_lvm-local:vm-10085-disk-3 raw 150323855360 10085
vmbox_lvm-local:vm-10102-disk-1 raw 10737418240 10102
vmbox_lvm-local:vm-2023-disk-1 raw 17179869184 2023
 

If you use the same VG twice (once as local storage, once shared over iSCSI), Proxmox does not know about it and treats the local one as local. Could you post the output of "lvs --separator ':' --noheadings --units b --unbuffered --nosuffix --options 'vg_name,lv_name,tags,attr'"?
 
The LVM layout is not an issue here.
I've installed multipath-tools so the user-space tools will not be confused by duplicate PVs.
That is also why I've renamed the VG on each node to its own name instead of just "pve", so the nodes are not confused by multiple "pve" VGs.
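For reference, the rename itself is a single LVM command; "pve_vmbox" here is just this node's chosen name. Note that renaming the VG that holds the root filesystem also requires updating boot configuration, so treat this as a sketch rather than a complete procedure:

```shell
# Rename the default "pve" VG to a node-specific name so that
# iSCSI-exported PVs from other nodes do not collide with it.
vgrename pve pve_vmbox
# After renaming the root VG, also update /etc/fstab and the bootloader
# config (root=/dev/mapper/... entries) and rebuild the initramfs,
# or the node will not boot.
```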

But the thing is that the VM runs from the iSCSI storage.
The migration is not failing on any LVM-related issue; it reports an error about a storage the VM does not actually use. So it does not look at the VM config, but it should.
Why can it not find the disks correctly and proceed with the migration?


lvs --separator ':' --noheadings --units b --unbuffered --nosuffix --options 'vg_name,lv_name,tags,attr'
pve_vmbox:swap::-wi-ao----
pve_vmbox:root::-wi-ao----
pve_vmbox:data::-wi-ao----
pve_vmbox:vm-10085-disk-2:pve-vm-10085:-wi-ao----
pve_vmbox:vm-10085-disk-3:pve-vm-10085:-wi-ao----
 
You have an LVM storage configured for the volume group "pve_vmbox". This volume group contains logical volumes that are named according to the PVE naming conventions. Therefore, PVE (correctly) thinks that VM 10085 has disks on this storage. This is not a bug in PVE, but a consequence of your setup. If you don't want PVE to "see" those volumes (and associate them with the VMs), then don't configure an LVM storage using this volume group.
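That name-based association can be illustrated on the lvs output above: a logical volume is attributed to a VM purely because it is named vm-&lt;vmid&gt;-disk-&lt;n&gt;, regardless of what the VM config references. The awk pipeline below is my own sketch of that matching, not PVE's actual code, run on the pasted output as sample data:

```shell
# Sample data copied from the lvs output above (vg_name:lv_name:tags:attr).
lvs_output='pve_vmbox:swap::-wi-ao----
pve_vmbox:root::-wi-ao----
pve_vmbox:data::-wi-ao----
pve_vmbox:vm-10085-disk-2:pve-vm-10085:-wi-ao----
pve_vmbox:vm-10085-disk-3:pve-vm-10085:-wi-ao----'

vmid=10085
# Any LV named vm-<vmid>-disk-* is treated as owned by that VM,
# even if the VM config never references this storage.
echo "$lvs_output" | awk -F: -v vmid="$vmid" '$2 ~ "^vm-" vmid "-disk-" { print $1 "/" $2 }'
# prints:
#   pve_vmbox/vm-10085-disk-2
#   pve_vmbox/vm-10085-disk-3
```

Both disks of VM 10085 match on the local VG, which is exactly why the migration check complains about vmbox_lvm-local even though the config only mentions vmbox-lvm-on-scsi.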
 
Still, the error "storage 'vmbox_lvm-local' is not available on node 'c1-krrblade1'" is confusing, because the VM does not use that storage directly.
I understand that this is specific to my setup.
I need to access vmbox's local storage from the other nodes somehow, which is why it is shared via iSCSI.

It looks like the only workaround for me is manual migration: stop the VM, move its config to the required node, and start it there.
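For completeness, the manual offline migration described above boils down to three commands. This is a sketch assuming VMID 10085, source node vmbox, and target node c1-krrblade1, and it relies on /etc/pve (pmxcfs) being mounted cluster-wide so the moved config appears on the target node:

```shell
# On the source node (vmbox): stop the VM and hand its config to the target node.
qm stop 10085
mv /etc/pve/nodes/vmbox/qemu-server/10085.conf \
   /etc/pve/nodes/c1-krrblade1/qemu-server/10085.conf

# On the target node (or via ssh): start the VM there.
ssh c1-krrblade1 qm start 10085
```

This bypasses the migration task's storage scan entirely, which is why it works even though the GUI migration aborts.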