Unable to migrate virtual machines on drbd-storage

dmp

New Member
Jun 8, 2016
Good morning from Germany,

we're using Proxmox in a simple three-node cluster, where one node is just a quorum and control node (px1, px2, px3). On px1 and px2 we're using DRBD9 as shared storage and an LVM volume group as local storage for non-roaming virtual machines. The VG used for DRBD9 has the same name as the local LVM storage (VG1).

Our problem is that we're unable to migrate virtual machines stored on the DRBD storage from one host to the other unless we either share or disable the LVM storage in the storage settings.

Code:
Jun 08 08:27:00 starting migration of VM 100 to node 'px2' (172.19.80.102)
Jun 08 08:27:00 copying disk images
Jun 08 08:27:01 ERROR: Failed to sync data - can't migrate 'vg1:vm-100-disk-1_00' - storage type 'lvm' not supported
Jun 08 08:27:01 aborting phase 1 - cleanup resources
Jun 08 08:27:01 ERROR: migration aborted (duration 00:00:01): Failed to sync data - can't migrate 'vg1:vm-100-disk-1_00' - storage type 'lvm' not supported
TASK ERROR: migration aborted

When we either share or disable the LVM storage, it works flawlessly.

Code:
Jun 08 08:28:02 starting migration of VM 100 to node 'px2' (172.19.80.102)
Jun 08 08:28:02 copying disk images
Jun 08 08:28:03 migration finished successfully (duration 00:00:01)
TASK OK

We don't think this is normal behaviour, even though the same volume group name is used twice.

pveversion -v
Code:
proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
drbdmanage: 0.95-1

/etc/pve/storage.cfg
Code:
dir: local
        path /var/lib/vz
        content rootdir,backup,images,vztmpl,iso
        maxfiles 0

drbd: drbd1
        content rootdir,images
        redundancy 2
        nodes px2,px1

lvm: vg1
        vgname VG1
        content images,rootdir
        shared
        nodes px1,px2

/etc/drbdmanaged.cfg
Code:
[GLOBAL]
storage-plugin = drbdmanage.storage.lvm.Lvm

[LOCAL]
drbdctrl-vg = VG1

Code:
root@px1:/etc# drbdmanage list-volumes
+---------------+--------+--------+-------+-------+
| Name          | Vol ID |   Size | Minor | State |
|---------------+--------+--------+-------+-------|
| vm-102-disk-1 |      0 | 409600 |   101 |    ok |
| vm-103-disk-1 |      0 | 409600 |   102 |    ok |
| vm-105-disk-1 |      0 |  32768 |   100 |    ok |
+---------------+--------+--------+-------+-------+
root@px1:/etc# drbdmanage list-resources
+---------------+-------+
| Name          | State |
|---------------+-------|
| vm-102-disk-1 |    ok |
| vm-103-disk-1 |    ok |
| vm-105-disk-1 |    ok |
+---------------+-------+
root@px1:/etc# drbdmanage list-assignments
+------+---------------+--------+-------+
| Node | Resource      | Vol ID | State |
|------+---------------+--------+-------|
| px1  | vm-102-disk-1 |      * |    ok |
| px1  | vm-103-disk-1 |      * |    ok |
| px1  | vm-105-disk-1 |      * |    ok |
| px2  | vm-102-disk-1 |      * |    ok |
| px2  | vm-103-disk-1 |      * |    ok |
| px2  | vm-105-disk-1 |      * |    ok |
+------+---------------+--------+-------+
 
VM 100 is configured to use the LVM storage ("Jun 08 08:27:01 ERROR: Failed to sync data - can't migrate 'vg1:vm-100-disk-1_00' - storage type 'lvm' not supported"), so naturally DRBD is not used for it - as you can see in the drbdmanage output, only the disks belonging to VMs 102, 103 and 105 are managed by DRBD.

When you set the LVM storage to "shared", Proxmox believes you and does not copy anything ("shared" means the storage is available on all nodes with the same content). But the LVM storage itself is not shared (only those volumes on it which are replicated by DRBD), so while the migration seems to work (the "shared" disks are simply skipped), starting the VM on the target node won't work, because the disk is not available there. Migrating the DRBD-managed VMs (102, 103, 105) should work as expected.
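
A quick way to cross-check which storage a disk actually lives on (just a sketch using the standard PVE CLI; the VM and storage IDs are the ones from this thread):

Code:
# the part before the colon in each disk line is the storage the disk lives on
qm config 100 | grep -E '^(ide|sata|scsi|virtio)[0-9]'
# list the volumes each storage definition actually sees
pvesm list vg1
pvesm list drbd1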

Do you need to use the volume group as plain LVM storage? If not, I would remove the "vg1" storage after moving all the disks to the "drbd1" storage. I would not recommend such a mixed setup in any case.
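
If you go that route, the steps could look roughly like this (a sketch only - the disk slot "ide0" is an assumption, check "qm config 100" for the real one, and make sure you have a working backup first):

Code:
# move the LVM-backed disk of VM 100 onto the DRBD storage;
# --delete 1 removes the old LVM volume after a successful move
qm move_disk 100 ide0 drbd1 --delete 1
# once no VM references the vg1 storage any more, drop its definition
pvesm remove vg1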

Also please note that you deviate from Proxmox's regular DRBD9 default by using plain LVM instead of LVM-thin. Furthermore, DRBD9 is only a technology preview in Proxmox, so please don't rely on it for production use.
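
For reference, the plugin is selected by the storage-plugin line in /etc/drbdmanaged.cfg; a thin-LV based setup would swap that line for the thin plugin, roughly like this (the exact module path is an assumption based on drbdmanage 0.9.x, please verify it against your installed version):

Code:
# current setup from this thread (plain LVM)
[GLOBAL]
storage-plugin = drbdmanage.storage.lvm.Lvm
# thin-LV variant (assumed module path, verify with your drbdmanage version)
#storage-plugin = drbdmanage.storage.lvm_thinlv.LvmThinLv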
 
Thanks for your answer, Fabian.

I'm so sorry, I pasted the wrong drbdmanage output. Disk vm-100-disk-1 is not on LVM, it is on DRBD. That was my fault, and I'm going to get the correct drbdmanage output soon.

But even when the disk is on DRBD (which it is), a migration to host px2 is not possible due to that error, which makes no sense to us at all.

I think the problem is that the DRBD pool uses the same name as the big volume group (VG1) instead of the default "drbdpool". We know that this is not a normal setup, but it should still be possible to migrate even with a non-shared LVM storage present.
 
The migration output that I quoted ("Jun 08 08:27:01 ERROR: Failed to sync data - can't migrate 'vg1:vm-100-disk-1_00' - storage type 'lvm' not supported") explicitly says that the migration does not work because the disk is on LVM storage, which is not yet supported for migration (the patches are currently being discussed on pve-devel). I think you mixed something up when setting up the VM, which is also why I asked for the VM config (it will probably list 'vg1:vm-100-disk-1_00' as one of its disks). You need to select the 'drbd1' storage if you want the disks to be created via DRBD/drbdmanage.
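
To illustrate: the storage ID in front of the colon in the VM config is what decides where a disk lives. For example (both lines are only illustrative, reconstructed from the volume names seen in this thread; slot and size are assumptions):

Code:
# disk on the plain LVM storage - cannot be migrated yet
ide0: vg1:vm-100-disk-1_00,size=400G
# disk on the DRBD storage - migration is supported
ide0: drbd1:vm-105-disk-1,size=32G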

Can you please post the configuration of a DRBD VM ("qm config ID") and the output of the migration for that VM? If you have changed any of the files you posted earlier, please also post the updated versions.
 
Let's just use VM 105 for this example. As shown in the output above, the disk of VM 105 is on DRBD - right?

So when I try to migrate VM 105 from px1 to px2, it gives me this error:
Code:
Jun 08 13:27:34 starting migration of VM 105 to node 'px2' (172.19.80.102)
Jun 08 13:27:34 copying disk images
Jun 08 13:27:34 ERROR: Failed to sync data - unable to migrate 'testVG:vm-105-disk-1_00' to 'testVG:vm-105-disk-1_00' on host '172.19.80.102' - source type 'lvm' not implemented
Jun 08 13:27:34 aborting phase 1 - cleanup resources
Jun 08 13:27:34 ERROR: found stale volume copy 'testVG:vm-105-disk-1_00' on node 'px2'
Jun 08 13:27:34 ERROR: migration aborted (duration 00:00:00): Failed to sync data - unable to migrate 'testVG:vm-105-disk-1_00' to 'testVG:vm-105-disk-1_00' on host '172.19.80.102' - source type 'lvm' not implemented
TASK ERROR: migration aborted

Unfortunately "qm config 105" just gives me an error.
Code:
root@px1:/etc# qm config 105
"my" variable $volid masks earlier declaration in same statement at /usr/share/perl5/PVE/QemuMigrate.pm line 258.
"my" variable $parent masks earlier declaration in same statement at /usr/share/perl5/PVE/QemuMigrate.pm line 258.
"my" variable $parent masks earlier declaration in same statement at /usr/share/perl5/PVE/QemuMigrate.pm line 258.
syntax error at /usr/share/perl5/PVE/QemuMigrate.pm line 256, near ") {"
Global symbol "$volhash" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 262.
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 264.
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 265.
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 265.
syntax error at /usr/share/perl5/PVE/QemuMigrate.pm line 267, near "}"
syntax error at /usr/share/perl5/PVE/QemuMigrate.pm line 269, near "}"
Can't use global @_ in "my" at /usr/share/perl5/PVE/QemuMigrate.pm line 272, near "= @_"
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 274.
Global symbol "$vmid" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 274.
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 274.
Global symbol "$self" requires explicit package name at /usr/share/perl5/PVE/QemuMigrate.pm line 274.
syntax error at /usr/share/perl5/PVE/QemuMigrate.pm line 284, near "}"
/usr/share/perl5/PVE/QemuMigrate.pm has too many errors.
Compilation failed in require at /usr/share/perl5/PVE/API2/Qemu.pm line 18.
BEGIN failed--compilation aborted at /usr/share/perl5/PVE/API2/Qemu.pm line 18.
Compilation failed in require at /usr/share/perl5/PVE/CLI/qm.pm line 20.
BEGIN failed--compilation aborted at /usr/share/perl5/PVE/CLI/qm.pm line 20.
Compilation failed in require at /usr/sbin/qm line 6.
BEGIN failed--compilation aborted at /usr/sbin/qm line 6.

Here's the actual storage.cfg:
Code:
dir: local
        path /var/lib/vz
        maxfiles 0
        content images,rootdir,vztmpl,iso

drbd: drbd1
        nodes px2,px1
        content rootdir,images
        redundancy 2

lvm: vg1
        vgname VG1
        shared
        content rootdir,images

lvm: testVG
        vgname VG1
        content images,rootdir

testVG was just a test; the disk of VM 105 is only on drbd1 (see the attached screenshot).

Sorry for the confusion.
 

Attachments

  • storage_105.PNG (19.4 KB)

I am not sure how you ended up where you did - did you manually edit the source code of the Proxmox modules, or did you run into upgrade errors that you did not correct? The non-working "qm config" points to a rather broken installation.

Anyhow: your migration log (again) clearly shows that your VM uses a disk on the storage "testVG", not "drbd1" (this is not something that we set or change anywhere unless explicitly asked to).
 
Code:
root@px1:/etc# qm config 105
bootdisk: ide0
cores: 1
ide0: drbd1:vm-105-disk-1,size=32G
ide2: none,media=cdrom
memory: 512
name: test2
net0: bridge=vmbr1,e1000=36:30:64:37:35:37
numa: 0
ostype: l26
smbios1: uuid=689e9200-06a2-4347-abe9-9a09e55176a2
sockets: 1

Fixed that - it was indeed my fault: I had /usr/share/perl5/PVE/QemuMigrate.pm open in an editor in the background. Not sure why that causes this error, though.

Anyway: as you can see, vm-105-disk-1 is definitely stored on drbd1, not on testVG, and that's the main problem in this case: Proxmox nevertheless thinks it is stored on testVG, presumably because drbd1 and vg1/testVG use the same volume group (VG1).
 
I have a suspicion where this originates from, but I will have to do some tests to confirm. Like I said, it is definitely not recommended to share a VG in this way (and the usual caveat about DRBD9 still applies ;)).
 
Thanks! Glad you now understand the error, as it wasn't easy to explain.

Of course it's neither supported nor recommended, but it isn't normal behaviour either, so it should be fixed, I guess. :)
 
I am pretty sure where this originates from, but just to get a complete picture: could you post the output of "lvs" and, for each VM ID, note whether it is configured to use the DRBD storage or the LVM storage?

I think the fix for the mix-up should be straightforward (but on the other hand I am still not convinced such a setup is a good idea, so if possible I would recommend using separate volume groups for the DRBD backing store and for direct LVM usage).
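
For reference, a separated layout could look roughly like this (a sketch only - it assumes the drbdmanage storage plugin is also pointed at the dedicated VG, which is its default "drbdpool" anyway, and of course carving a second VG out of the existing disks is not a trivial change):

Code:
# /etc/drbdmanaged.cfg - control volume in a dedicated VG instead of VG1
[LOCAL]
drbdctrl-vg = drbdpool

# /etc/pve/storage.cfg - VG1 then stays a purely local, non-shared LVM storage
lvm: vg1
        vgname VG1
        content images,rootdir
        nodes px1,px2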
 
Sure!

Code:
root@px1:~# lvs
  LV               VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  .drbdctrl_0      VG1  -wi-ao----   4.00m
  .drbdctrl_1      VG1  -wi-ao----   4.00m
  SYSTEM           VG1  -wi-ao----  93.13g
  vm-100-disk-1    VG1  -wi-a----- 400.00g
  vm-102-disk-1_00 VG1  -wi-ao---- 400.09g
  vm-103-disk-1_00 VG1  -wi-ao---- 400.09g

VM 100 is configured to use LVM.
VM 102 is configured to use DRBD.
VM 103 is configured to use DRBD.

Yup, I'm sure it's not the best approach in this case, but for us it's the only one we can work with. Long story... ;)
 
