[SOLVED] Random migration errors

Discussion in 'Proxmox VE: Installation and configuration' started by NdK73, Apr 16, 2019.

  1. NdK73

    NdK73, Member (joined Jul 19, 2012; messages: 69; likes received: 3)

    Hello.

    After the last upgrade, I sometimes get migration errors.
    I'm using a shared storage system (Dell MD3200).

    This one looks like a race condition:
    Code:
    2019-04-16 12:29:34 migration status: completed
    can't deactivate LV '/dev/DataBox1_r6/vm-116-disk-0': Logical volume DataBox1_r6/vm-116-disk-0 is used by another device.
    2019-04-16 12:29:37 ERROR: volume deactivation failed: DataBox1_r6:vm-116-disk-0 at /usr/share/perl5/PVE/Storage.pm line 1087.
    2019-04-16 12:29:38 ERROR: migration finished with problems (duration 00:00:40)
    TASK ERROR: migration problems
    At least it leaves the VM running on the target node.

    Other times I'm left with the VM running but locked on the source node and I have to use
    Code:
    qm unlock VMID
    then wait a bit, after which I can run
    Code:
    qm migrate VMID DEST --online
    and it usually works.

    I once even saw a "no quorum" message after migrating a machine with quite intense activity.

    Code:
    # pveversion -v
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
    pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
    pve-kernel-4.15: 5.3-3
    pve-kernel-4.15.18-12-pve: 4.15.18-35
    pve-kernel-4.15.18-10-pve: 4.15.18-32
    pve-kernel-4.15.18-7-pve: 4.15.18-27
    pve-kernel-4.15.18-1-pve: 4.15.18-19
    pve-kernel-4.15.17-1-pve: 4.15.17-9
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-8
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-50
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-41
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-25
    pve-cluster: 5.0-36
    pve-container: 2.0-37
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-19
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 2.12.1-3
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-50
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2
    
    The servers only have two 1 Gb network interfaces (the only PCIe slot is used by the HBA that accesses the MD3200). The network is configured as a balance-alb bond of the two interfaces, with VLANs for the different networks.

    What other information should I collect to better pin down the issues and possibly have them resolved in the next version?

    Tks,
    Diego
     
  2. Stoiko Ivanov

    Stoiko Ivanov, Proxmox Staff Member (joined May 2, 2018; messages: 1,119; likes received: 91)

    Hmm, after a quick search, most results hint at another device-mapper mapping sitting on top of that device. As a quick guess: what is vm-116, and does it by any chance contain an LVM VG (created inside the guest)?

    Please post the output of:
    * `pvs`
    * `vgs`
    * `lvs -a`
    * `dmsetup ls`

    If my assumption is correct, it can probably be fixed by adding the device nodes beneath DataBox1_r6 to the blacklist (global_filter) in lvm.conf.

    Is the MD3200 connected via SAS/FC/iSCSI? Do you use multipath?

    Hope this helps!
     
  3. NdK73

    Code:
    # pvs
      PV                             VG             Fmt  Attr PSize    PFree
      /dev/DataBox1_r6/vm-116-disk-0 Dati           lvm2 a--  1024.00g     0
      /dev/Ricerca/vm-104-disk-1     str957-cluster lvm2 a--    40.02t     0
      /dev/mapper/mp_MD3200_Ricerca  Ricerca        lvm2 a--    40.02t     0
      /dev/mapper/mp_MD3200_r6_0     DataBox1_r6    lvm2 a--    36.38t  1.04t
      /dev/mapper/mp_MD3800i         Databox2_r6    lvm2 a--    32.74t 28.60t
      /dev/sde3                      pve            lvm2 a--   297.84g 16.00g
    Code:
    # vgs
      VG             #PV #LV #SN Attr   VSize    VFree
      DataBox1_r6      1  29   0 wz--n-   36.38t  1.04t
      Databox2_r6      1   8   0 wz--n-   32.74t 28.60t
      Dati             1   1   0 wz--n- 1024.00g     0
      Ricerca          1   1   0 wz--n-   40.02t     0
      pve              1   3   0 wz--n-  297.84g 16.00g
      str957-cluster   1   1   0 wz--n-   40.02t     0
    Code:
    # lvs -a
      LV              VG             Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      vm-100-disk-1   DataBox1_r6    -wi-ao----   32.00g                                                   
      vm-100-disk-2   DataBox1_r6    -wi-ao----  512.00g                                                   
      vm-101-disk-1   DataBox1_r6    -wi-------   32.00g                                                   
      vm-101-disk-2   DataBox1_r6    -wi-------  512.00g                                                   
      vm-102-disk-1   DataBox1_r6    -wi-a-----  100.00g                                                   
      vm-102-disk-2   DataBox1_r6    -wi-a-----    1.00t                                                   
      vm-102-disk-3   DataBox1_r6    -wi-a-----    1.86t                                                   
      vm-102-disk-4   DataBox1_r6    -wi-a-----  500.00g                                                   
      vm-104-disk-1   DataBox1_r6    -wi-ao----   32.00g                                                   
      vm-104-disk-2   DataBox1_r6    -wi-ao----    5.00t                                                   
      vm-105-disk-1   DataBox1_r6    -wi-ao----   32.00g                                                   
      vm-105-disk-2   DataBox1_r6    -wi-ao----  500.00g                                                   
      vm-106-disk-1   DataBox1_r6    -wi-a-----  100.00g                                                   
      vm-107-disk-1   DataBox1_r6    -wi-a-----   50.00g                                                   
      vm-107-disk-2   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-108-disk-1   DataBox1_r6    -wi-a-----   50.00g                                                   
      vm-110-disk-1   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-110-disk-2   DataBox1_r6    -wi-a-----  400.00g                                                   
      vm-110-disk-3   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-115-disk-1   DataBox1_r6    -wi-ao----    4.88t                                                   
      vm-116-disk-0   DataBox1_r6    -wi-ao----    1.00t                                                   
      vm-116-disk-1   DataBox1_r6    -wi-ao----    1.00t                                                   
      vm-116-disk-2   DataBox1_r6    -wi-a-----    9.77t                                                   
      vm-120-disk-1   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-120-disk-2   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-125-disk-1   DataBox1_r6    -wi-a-----   50.00g                                                   
      vm-126-disk-1   DataBox1_r6    -wi-a-----   32.00g                                                   
      vm-127-disk-1   DataBox1_r6    -wi-ao----    1.95t                                                   
      vm-200-disk-1   DataBox1_r6    -wi-a-----    5.86t                                                   
      vm-103-disk-0   Databox2_r6    -wi-------   32.00g                                                   
      vm-103-disk-1   Databox2_r6    -wi-------   32.00g                                                   
      vm-103-disk-2   Databox2_r6    -wi-------    1.00t                                                   
      vm-109-disk-0   Databox2_r6    -wi-------   32.00g                                                   
      vm-109-disk-1   Databox2_r6    -wi-------    1.46t                                                   
      vm-113-disk-0   Databox2_r6    -wi-------    1.17t                                                   
      vm-113-disk-1   Databox2_r6    -wi-------  100.00g                                                   
      vm-131-disk-0   Databox2_r6    -wi-------  320.00g                                                   
      Cloud           Dati           -wi-a----- 1024.00g                                                   
      vm-104-disk-1   Ricerca        -wi-ao----   40.02t                                                   
      data            pve            twi-a-tz--  195.59g             0.00   0.05                           
      [data_tdata]    pve            Twi-ao----  195.59g                                                   
      [data_tmeta]    pve            ewi-ao----    2.00g                                                   
      [lvol0_pmspare] pve            ewi-------    2.00g                                                   
      root            pve            -wi-ao----   74.25g                                                   
      swap            pve            -wi-ao----    8.00g                                                   
      home            str957-cluster -wi-a-----   40.02t
    Code:
    # dmsetup ls
    DataBox1_r6-vm--102--disk--1   (253:8)
    DataBox1_r6-vm--116--disk--2   (253:24)
    Dati-Cloud   (253:37)
    DataBox1_r6-vm--116--disk--1   (253:22)
    mp_MD3800i   (253:39)
    DataBox1_r6-vm--100--disk--2   (253:6)
    pve-data_tdata   (253:35)
    DataBox1_r6-vm--116--disk--0   (253:33)
    DataBox1_r6-vm--115--disk--1   (253:26)
    DataBox1_r6-vm--100--disk--1   (253:5)
    DataBox1_r6-vm--120--disk--2   (253:29)
    pve-data_tmeta   (253:34)
    DataBox1_r6-vm--120--disk--1   (253:21)
    DataBox1_r6-vm--108--disk--1   (253:12)
    DataBox1_r6-vm--107--disk--2   (253:30)
    str957--cluster-home   (253:38)
    DataBox1_r6-vm--107--disk--1   (253:11)
    pve-swap   (253:2)
    pve-root   (253:3)
    DataBox1_r6-vm--200--disk--1   (253:27)
    pve-data   (253:36)
    Ricerca-vm--104--disk--1   (253:4)
    DataBox1_r6-vm--127--disk--1   (253:7)
    DataBox1_r6-vm--106--disk--1   (253:18)
    DataBox1_r6-vm--105--disk--2   (253:19)
    DataBox1_r6-vm--110--disk--3   (253:20)
    DataBox1_r6-vm--126--disk--1   (253:31)
    DataBox1_r6-vm--105--disk--1   (253:17)
    DataBox1_r6-vm--110--disk--2   (253:15)
    mp_MD3200_Ricerca   (253:1)
    DataBox1_r6-vm--104--disk--2   (253:16)
    mp_MD3200_r6_0   (253:0)
    DataBox1_r6-vm--102--disk--4   (253:28)
    DataBox1_r6-vm--125--disk--1   (253:23)
    DataBox1_r6-vm--110--disk--1   (253:13)
    DataBox1_r6-vm--104--disk--1   (253:10)
    DataBox1_r6-vm--102--disk--3   (253:32)
    DataBox1_r6-vm--102--disk--2   (253:14)
    You're probably right. And I'll have to do the same for the other shared storage (MD3800i, iSCSI).

    The MD3200 is connected via SAS and uses multipath.
     
  4. Stoiko Ivanov

    * The two PVs in the `pvs` output that sit on LVs (/dev/DataBox1_r6/vm-116-disk-0 and /dev/Ricerca/vm-104-disk-1) are guest disks and should not be active on the PVE node itself.
    * I'm pretty sure this is the root of the 'is used by another device' migration error (the lost quorum is probably a separate issue).

    * Please add all shared storages containing guest images to the blacklist in lvm.conf (global_filter; there is a quite descriptive comment above it), and report back if the issue still persists.
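
A minimal Python sketch of the check described above: it scans `pvs` output for PVs whose device path is itself an LV of another VG on the host, which is the situation suspected here. The function name `nested_pvs` and the parsing approach are illustrative assumptions, not a Proxmox or LVM tool. The sample text is the `pvs` output posted earlier in this thread.

```python
import re

# `pvs` output copied from the thread.
PVS_OUTPUT = """\
  PV                             VG             Fmt  Attr PSize    PFree
  /dev/DataBox1_r6/vm-116-disk-0 Dati           lvm2 a--  1024.00g     0
  /dev/Ricerca/vm-104-disk-1     str957-cluster lvm2 a--    40.02t     0
  /dev/mapper/mp_MD3200_Ricerca  Ricerca        lvm2 a--    40.02t     0
  /dev/mapper/mp_MD3200_r6_0     DataBox1_r6    lvm2 a--    36.38t  1.04t
  /dev/mapper/mp_MD3800i         Databox2_r6    lvm2 a--    32.74t 28.60t
  /dev/sde3                      pve            lvm2 a--   297.84g 16.00g
"""


def nested_pvs(pvs_output):
    """Return (pv_path, vg) pairs for PVs that live on an LV of another VG."""
    # Skip the header row and split each remaining row into columns.
    rows = [line.split() for line in pvs_output.splitlines()[1:] if line.strip()]
    vg_names = {row[1] for row in rows}
    found = []
    for pv_path, vg, *_ in rows:
        # A path of the form /dev/<vg>/<lv>, where <vg> is a VG known to the
        # host, means this PV was created inside a guest disk.
        m = re.fullmatch(r"/dev/([^/]+)/([^/]+)", pv_path)
        if m and m.group(1) in vg_names:
            found.append((pv_path, vg))
    return found


if __name__ == "__main__":
    for pv_path, vg in nested_pvs(PVS_OUTPUT):
        print(f"{pv_path} carries guest VG {vg}")
```

On this data the function reports `/dev/DataBox1_r6/vm-116-disk-0` (VG Dati) and `/dev/Ricerca/vm-104-disk-1` (VG str957-cluster).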

    Hope this helps!
     
  5. NdK73

    I'll have to do some more tests, but you're almost certainly right.
    I have now added "r|/dev/mapper/.*-vm--[0-9]+--disk--[0-9]+|" to the global_filter line.
    It hides the contents of all disks created by Proxmox from the host's LVM. I think it should be safe enough to be included in the default lvm.conf shipped with Proxmox.

    Thanks a lot!
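
A small Python sketch of why the pattern uses doubled hyphens: device-mapper names join VG and LV with a single hyphen and double any hyphen inside either name, as the `dmsetup ls` output in this thread shows. The helper names `dm_path` and `is_rejected` are made up for illustration. Python's `re` only approximates LVM's own regex engine, and LVM also evaluates a device's other path aliases, so treat this as a sanity check of the pattern and not as a simulation of LVM.

```python
import re

# Pattern taken from the "r|...|" filter entry quoted above. LVM filter
# patterns are unanchored unless ^ or $ is used, hence re.search below.
REJECT = re.compile(r"/dev/mapper/.*-vm--[0-9]+--disk--[0-9]+")


def dm_path(vg, lv):
    """Build the /dev/mapper path for vg/lv; hyphens inside names are doubled."""
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"), lv.replace("-", "--"))


def is_rejected(path):
    """True if the reject pattern matches anywhere in the path."""
    return REJECT.search(path) is not None


# Guest disks that showed up as PVs on the host in this thread.
guest_disks = [("DataBox1_r6", "vm-116-disk-0"), ("Ricerca", "vm-104-disk-1")]

# Host-side devices that must stay visible to LVM.
host_devices = [
    "/dev/mapper/mp_MD3200_r6_0",
    "/dev/mapper/mp_MD3800i",
    "/dev/mapper/pve-root",
]

if __name__ == "__main__":
    for vg, lv in guest_disks:
        print(dm_path(vg, lv), "rejected:", is_rejected(dm_path(vg, lv)))
    for path in host_devices:
        print(path, "rejected:", is_rejected(path))
```

The multipath devices and the `pve` volumes do not contain `-vm--N--disk--N`, so the pattern leaves them alone while matching every guest disk name.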
     
  6. NdK73

    Confirmed: with the given filter there are no more migration errors caused by failed deactivations.
    Note that a node reboot is required (unless you want to meddle with the devices "wrongly" created by LVM)!
     
  7. Stoiko Ivanov

    Hmm, could you open an enhancement request at https://bugzilla.proxmox.com so that we can discuss the potential implications? Thanks!
     
  8. NdK73

    Done, tks.
    Bug 2184.
     
  9. alexskysilk

    alexskysilk, Active Member (joined Oct 16, 2015; messages: 536; likes received: 57)

    balance-alb is an imperfect hammer and can cause you grief of the kind you describe. You'd get better results by either using the interfaces individually or making multiple active-backup bonds with alternating preferred masters for your various VLANs. Most important of all is to leave a path as dedicated as you can to cluster traffic.
     