Previous upgrade seems to have broken online and offline migration with lvm

mir

Hi all,

The previous upgrade seems to have broken both online and offline migration with shared LVM storage over iSCSI. This is the upgrade:
Start-Date: 2020-02-11 01:24:26
Commandline: apt upgrade
Requested-By: mir (1000)
Install: pve-kernel-5.3.18-1-pve:amd64 (5.3.18-1, automatic)
Upgrade: pve-kernel-5.3:amd64 (6.1-3, 6.1-4), corosync:amd64 (3.0.2-pve4, 3.0.3-pve1), libcmap4:amd64 (3.0.2-pve4, 3.0.3-pve1), pve-firmware:amd64 (3.0-4, 3.0-5), libquorum5:amd64 (3.0.2-pve4, 3.0.3-pve1), libvotequorum8:amd64 (3.0.2-pve4, 3.0.3-pve1), libpve-common-perl:amd64 (6.0-11, 6.0-12), libcfg7:amd64 (3.0.2-pve4, 3.0.3-pve1), libknet1:amd64 (1.13-pve1, 1.14-pve1), pve-kernel-helper:amd64 (6.1-3, 6.1-4), libcpg4:amd64 (3.0.2-pve4, 3.0.3-pve1), libpve-apiclient-perl:amd64 (3.0-2, 3.0-3), libcorosync-common4:amd64 (3.0.2-pve4, 3.0.3-pve1)
End-Date: 2020-02-11 01:25:03

The latest upgrade, to pve-kernel-5.3.18-2, has not fixed it. Moving the disk to another storage type resolves the problem.
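As a workaround sketch, the disk can also be moved from the CLI with qm move_disk; the disk slot (scsi0) and the target storage (local-lvm) below are placeholders, not taken from this setup:
Code:
# move the disk of VM 153 to another storage; slot and target are example values
qm move_disk 153 scsi0 local-lvm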

The error is the following:
2020-02-23 00:16:34 starting migration of VM 153 to node 'esx1' (10.0.0.1)
2020-02-23 00:16:36 starting VM 153 on remote node 'esx1'
2020-02-23 00:16:38 [esx1] can't activate LV '/dev/qnap/vm-153-disk-0': device-mapper: create ioctl on qnap-vm--153--disk--0 LVM-RCevIXI8i5huDYro1QZ0fdlcZWWqYxm7DIe1JfkZcJK5iskK4TCa7rX8g5Kvwi3c failed: Device or resource busy
2020-02-23 00:16:38 ERROR: online migrate failure - remote command failed with exit code 255
2020-02-23 00:16:38 aborting phase 2 - cleanup resources
2020-02-23 00:16:38 migrate_cancel
2020-02-23 00:16:39 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

When the VM runs as an HA resource:
task started by HA resource agent
2020-02-21 23:24:23 starting migration of VM 153 to node 'esx1' (10.0.0.1)
2020-02-21 23:24:24 starting VM 153 on remote node 'esx1'
2020-02-21 23:24:26 [esx1] can't activate LV '/dev/qnap/vm-153-disk-0': device-mapper: create ioctl on qnap-vm--153--disk--0 LVM-RCevIXI8i5huDYro1QZ0fdlcZWWqYxm7DIe1JfkZcJK5iskK4TCa7rX8g5Kvwi3c failed: Device or resource busy
2020-02-21 23:24:26 ERROR: online migrate failure - remote command failed with exit code 255
2020-02-21 23:24:26 aborting phase 2 - cleanup resources
2020-02-21 23:24:26 migrate_cancel
2020-02-21 23:24:27 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems
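Side note: since an upgrade is the suspected cause, it is worth confirming that both nodes are actually running the same kernel and package versions, e.g.:
Code:
# run on each node and compare the output
uname -r
pveversion -v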
 
Just tried it with both kernels, 5.3.18-1-pve and 5.3.18-2-pve, with and without HA, and did not encounter any problems.


Maybe it depends on the specific iSCSI or LVM configuration.

Are the LVs visible properly on all nodes?
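For example, a quick check on each node could look like this:
Code:
# volume groups visible on this node
vgs
# logical volumes with attributes and activation state
lvs -o vg_name,lv_name,lv_attr,lv_active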
 
This specific setup has been working unchanged since Proxmox 1.9. Did you test with an LVM marked 'Shared'?

Code:
esx1:~# lvs |grep qnap
  vm-153-disk-0 qnap -wi-------  8.00g
esx2:~# lvs |grep qnap
  vm-153-disk-0 qnap -wi-ao----  8.00g

esx1:~# lvdisplay qnap
  --- Logical volume ---
  LV Path                /dev/qnap/vm-153-disk-0
  LV Name                vm-153-disk-0
  VG Name                qnap
  LV UUID                DIe1Jf-kZcJ-K5is-kK4T-Ca7r-X8g5-Kvwi3c
  LV Write Access        read/write
  LV Creation host, time esx2, 2020-02-21 22:49:27 +0100
  LV Status              NOT available
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

esx2:~# lvdisplay qnap
  --- Logical volume ---
  LV Path                /dev/qnap/vm-153-disk-0
  LV Name                vm-153-disk-0
  VG Name                qnap
  LV UUID                DIe1Jf-kZcJ-K5is-kK4T-Ca7r-X8g5-Kvwi3c
  LV Write Access        read/write
  LV Creation host, time esx2, 2020-02-21 22:49:27 +0100
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

I just noticed that lvm2-lockd is not installed. Should it not be installed to provide a locking mechanism for accessing shared storage?
 
Yes, the Proxmox storage used for the test is marked as 'Shared'.
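For reference, a shared LVM-over-iSCSI setup in /etc/pve/storage.cfg looks roughly like the sketch below; the storage IDs, portal, target and base volume are placeholders, only the VG name qnap is taken from this thread:
Code:
iscsi: qnap-iscsi
        portal 10.0.0.250
        target iqn.2004-04.com.qnap:example-target
        content none

lvm: qnap
        vgname qnap
        base qnap-iscsi:0.0.0.scsi-example
        shared 1
        content images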
Does it work if you try to activate it directly on esx1? E.g.:
Code:
lvchange -ay qnap/vm-153-disk-0
etc.


As for lvm2-lockd: Proxmox uses its own locking mechanism.
 
No, activating it directly on esx1 gives the same error:
Code:
lvchange -ay qnap/vm-153-disk-0
device-mapper: create ioctl on qnap-vm--153--disk--0 LVM-RCevIXI8i5huDYro1QZ0fdlcZWWqYxm7DIe1JfkZcJK5iskK4TCa7rX8g5Kvwi3c failed: Device or resource busy
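For anyone else hitting this: a create ioctl failing with 'Device or resource busy' usually means a device-mapper mapping with that name already exists on the node, e.g. left over from an earlier run. It can be inspected with dmsetup and removed only if it is confirmed stale; the mapping name below is taken from the error message above.
Code:
# is there already a mapping for this disk on esx1?
dmsetup ls | grep 'vm--153--disk--0'
# show its state and open count
dmsetup info qnap-vm--153--disk--0
# remove it ONLY if the open count is 0 and the VM is not running on this node
dmsetup remove qnap-vm--153--disk--0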
 
Does the problem still occur if you boot an older kernel? Which was the last one that worked for you?
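The installed PVE kernels and the currently booted one can be listed with, for example:
Code:
# kernels installed on this node
dpkg -l 'pve-kernel-*' | grep '^ii'
# kernel currently running
uname -r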
 
I cannot replicate it now. It was most likely a combination of the VM having been started under different kernel and package versions, which for some reason prevented it from starting again ;-(
 
