Migration from ESXi to Proxmox: Shared LVM issues

notworking

Good morning.

In a recent migration project we moved from ESXi to a 3-node Proxmox cluster connected to an HPE MSA SAN via Fibre Channel (with 2 LUNs).
We did it as follows (a rough command sketch follows the list):
Move all VMs to one ESXi server (esxi1) and onto LUN A on the MSA
Clear LUN B
Install Proxmox on the other 2 servers (esxi2 and esxi3)
Set up multipath (esxi2 and esxi3)
Create PV and VG on LUN B (/dev/mapper/...) (only on esxi2)
Create cluster (on esxi2) and join cluster (on esxi3)
In Proxmox -> Datacenter -> Storage -> Add -> LVM (checked enable, shared, wipe removed volumes) on the newly created VG (LUN B)
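For reference, a minimal sketch of what the multipath/LVM/storage steps amount to on the command line. The device name /dev/mapper/mpatha and the names slow_vg/msa_slow are taken from the outputs further down; treat the exact commands as an assumption, not a transcript of what we typed.

Code:
# on esxi2 and esxi3: enable multipathing for the FC LUNs
apt install multipath-tools
multipath -ll                      # both LUNs should appear as mpath devices

# on esxi2 only: put LVM on the multipath device of LUN B
pvcreate /dev/mapper/mpatha
vgcreate slow_vg /dev/mapper/mpatha

# on one cluster node: add the VG as shared LVM storage (same as the GUI step)
pvesm add lvm msa_slow --vgname slow_vg --content images --shared 1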
---
This seemed to work: we could see the storage on both nodes, could migrate between nodes 2 and 3, and could set up/remove VMs (removing took a while because of the zeroing, but that's fine).
We restored some backups from esxi1 to the Proxmox cluster, and with the required drives all VMs work.
---
Next we did the following:
Got all VMs running on the 2 Proxmox servers
Took a last backup of esxi1
Cleared LUN A from esxi1
Created PV and VG on LUN A (again only on esxi2)
Went to Datacenter -> Storage -> Add -> LVM (again checked enable, shared, wipe removed volumes) on the newly created VG (LUN A); the expected storage.cfg result is sketched below
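For clarity, this is what we expect the two storage definitions to look like in /etc/pve/storage.cfg after those GUI steps. The storage IDs msa_slow/msa_fast and the VG names slow_vg/fast_vg come from the outputs below; the rest is an assumption about the resulting config, not a copy of our file.

Code:
# /etc/pve/storage.cfg (expected shared LVM entries, sketch)
lvm: msa_slow
        vgname slow_vg
        content images
        shared 1
        saferemove 1

lvm: msa_fast
        vgname fast_vg
        content images
        shared 1
        saferemove 1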
---
This is where it went wrong: Proxmox no longer "sees" the first LVM storage and cannot create/migrate/start VMs on either node.
All VMs are still running and, as far as we can tell, are still writing data to that LVM.
We can still take backups of the VMs on the first LVM using Veeam.

Does anyone have an idea what went wrong?

[Screenshot: view of storage on esxi2]

[Screenshot: view of storage on esxi3]
Command outputs:
Code:
# pvesm status
  Command failed with status code 5.
command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Name             Type     Status           Total            Used       Available        %
local             dir     active        98497780         4635104        88813128    4.71%
local-lvm     lvmthin     active       449970176        51836564       398133611   11.52%
msa_fast          lvm     active      3906244608               0      3906244608    0.00%
msa_slow          lvm   inactive               0               0               0    0.00%
Code:
# lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                              8:0    0 558.9G  0 disk
├─sda1                           8:1    0  1007K  0 part
├─sda2                           8:2    0     1G  0 part  /boot/efi
└─sda3                           8:3    0 557.9G  0 part
  ├─pve-swap                   252:0    0     8G  0 lvm   [SWAP]
  ├─pve-root                   252:1    0    96G  0 lvm   /
  ├─pve-data_tmeta             252:2    0   4.4G  0 lvm
  │ └─pve-data-tpool           252:4    0 429.1G  0 lvm
  │   ├─pve-data               252:5    0 429.1G  1 lvm
  │   ├─pve-vm--100--disk--0   252:6    0     4M  0 lvm
  │   ├─pve-vm--100--cloudinit 252:7    0     4M  0 lvm
  │   ├─pve-vm--100--disk--1   252:8    0   100G  0 lvm
  │   ├─pve-vm--121--disk--0   252:31   0     4M  0 lvm
  │   └─pve-vm--121--disk--1   252:32   0    90G  0 lvm
  └─pve-data_tdata             252:3    0 429.1G  0 lvm
    └─pve-data-tpool           252:4    0 429.1G  0 lvm
      ├─pve-data               252:5    0 429.1G  1 lvm
      ├─pve-vm--100--disk--0   252:6    0     4M  0 lvm
      ├─pve-vm--100--cloudinit 252:7    0     4M  0 lvm
      ├─pve-vm--100--disk--1   252:8    0   100G  0 lvm
      ├─pve-vm--121--disk--0   252:31   0     4M  0 lvm
      └─pve-vm--121--disk--1   252:32   0    90G  0 lvm
sdb                              8:16   1     0B  0 disk
sdc                              8:32   0   7.7T  0 disk
└─mpatha                       252:9    0   7.7T  0 mpath
  ├─slow_vg-vm--115--disk--1   252:11   0    90G  0 lvm
  ├─slow_vg-vm--115--disk--0   252:12   0     4M  0 lvm
  ├─slow_vg-vm--112--disk--1   252:13   0    80G  0 lvm
  ├─slow_vg-vm--106--disk--1   252:14   0   150G  0 lvm
  ├─slow_vg-vm--106--disk--0   252:15   0     4M  0 lvm
  ├─slow_vg-vm--112--disk--0   252:16   0     4M  0 lvm
  ├─slow_vg-vm--107--disk--1   252:17   0    60G  0 lvm
  ├─slow_vg-vm--107--disk--0   252:18   0     4M  0 lvm
  ├─slow_vg-vm--116--disk--1   252:19   0    60G  0 lvm
  ├─slow_vg-vm--116--disk--0   252:20   0     4M  0 lvm
  ├─slow_vg-vm--118--disk--0   252:21   0    30G  0 lvm
  ├─slow_vg-vm--110--disk--1   252:22   0   120G  0 lvm
  ├─slow_vg-vm--110--disk--2   252:23   0    60G  0 lvm
  ├─slow_vg-vm--110--disk--0   252:24   0     4M  0 lvm
  ├─slow_vg-vm--111--disk--1   252:25   0    60G  0 lvm
  ├─slow_vg-vm--111--disk--0   252:26   0     4M  0 lvm
  ├─slow_vg-vm--117--disk--0   252:27   0    60G  0 lvm
  ├─slow_vg-vm--119--disk--0   252:28   0    45G  0 lvm
  ├─slow_vg-vm--120--disk--1   252:29   0    60G  0 lvm
  └─slow_vg-vm--120--disk--0   252:30   0     4M  0 lvm
sdd                              8:48   0   3.6T  0 disk
└─mpathb                       252:10   0   3.6T  0 mpath
sde                              8:64   0   7.7T  0 disk
└─mpatha                       252:9    0   7.7T  0 mpath
  ├─slow_vg-vm--115--disk--1   252:11   0    90G  0 lvm
  ├─slow_vg-vm--115--disk--0   252:12   0     4M  0 lvm
  ├─slow_vg-vm--112--disk--1   252:13   0    80G  0 lvm
  ├─slow_vg-vm--106--disk--1   252:14   0   150G  0 lvm
  ├─slow_vg-vm--106--disk--0   252:15   0     4M  0 lvm
  ├─slow_vg-vm--112--disk--0   252:16   0     4M  0 lvm
  ├─slow_vg-vm--107--disk--1   252:17   0    60G  0 lvm
  ├─slow_vg-vm--107--disk--0   252:18   0     4M  0 lvm
  ├─slow_vg-vm--116--disk--1   252:19   0    60G  0 lvm
  ├─slow_vg-vm--116--disk--0   252:20   0     4M  0 lvm
  ├─slow_vg-vm--118--disk--0   252:21   0    30G  0 lvm
  ├─slow_vg-vm--110--disk--1   252:22   0   120G  0 lvm
  ├─slow_vg-vm--110--disk--2   252:23   0    60G  0 lvm
  ├─slow_vg-vm--110--disk--0   252:24   0     4M  0 lvm
  ├─slow_vg-vm--111--disk--1   252:25   0    60G  0 lvm
  ├─slow_vg-vm--111--disk--0   252:26   0     4M  0 lvm
  ├─slow_vg-vm--117--disk--0   252:27   0    60G  0 lvm
  ├─slow_vg-vm--119--disk--0   252:28   0    45G  0 lvm
  ├─slow_vg-vm--120--disk--1   252:29   0    60G  0 lvm
  └─slow_vg-vm--120--disk--0   252:30   0     4M  0 lvm
sdf                              8:80   0   3.6T  0 disk
└─mpathb                       252:10   0   3.6T  0 mpath
 

Thanks for the reply,

On the SAN, all nodes have access to the 2 LUNs; I verified this by running lsblk on all nodes (and seeing the storage).
The multipath config also seems to be in order:
The multipath -ll output is the same on all 3 nodes (our multipath.conf roughly matches the sketch after the output below).
Code:
# multipath -ll
mpatha (3600c0ff00051b70a7c588d6801000000) dm-5 HPE,MSA 2050 SAN
size=7.7T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 3:0:0:0 sde 8:64 active ready running
mpathb (3600c0ff00051b3b0ae588d6801000000) dm-6 HPE,MSA 2050 SAN
size=3.6T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 3:0:0:1 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:1 sdd 8:48 active ready running
 
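For completeness, the multipath configuration amounts to something like the following; this is a sketch rather than a verbatim copy of our /etc/multipath.conf, with the WWIDs taken from the multipath -ll output above.

Code:
# /etc/multipath.conf (sketch, not a verbatim copy)
defaults {
        user_friendly_names yes
        find_multipaths yes
}
blacklist {
        wwid .*
}
blacklist_exceptions {
        # WWIDs of the two MSA LUNs as reported by multipath -ll
        wwid 3600c0ff00051b70a7c588d6801000000
        wwid 3600c0ff00051b3b0ae588d6801000000
}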
Can you post the output of the pvs, vgs and lvs commands?
I have a suspicion that lvscan has found the LVM signature on the /dev/sd* drives and that LVM is trying to use them directly (i.e. not via the /dev/mapper multipath devices).
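If that suspicion turns out to be right, the usual remedy (not something confirmed in this thread yet, so take it as a hedged sketch) is to restrict LVM scanning to the local boot disk and the multipath maps with a global_filter in /etc/lvm/lvm.conf, for example:

Code:
# /etc/lvm/lvm.conf (sketch: accept the local boot disk and multipath maps, reject everything else)
devices {
        global_filter = [ "a|^/dev/sda3$|", "a|^/dev/mapper/mpath.*|", "r|.*|" ]
}

After changing the filter, pvs/vgs (and pvesm status) should be re-checked on every node to confirm the shared VGs are only seen via the /dev/mapper/... devices.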
 
Seems like we're looking in the right direction, but I'm not sure how to proceed now.
Are the VMs running but not actually (correctly) on the LUN?
Is it possible to remove the LVM signature from the drives and "migrate" the VMs to /dev/mapper/...?

Code:
# pvs
  PV                 VG      Fmt  Attr PSize    PFree
  /dev/mapper/mpathb fast_vg lvm2 a--    <3.64t <3.64t
  /dev/sda3          pve     lvm2 a--  <557.88g 16.00g


Code:
# vgs
  VG      #PV #LV #SN Attr   VSize    VFree
  fast_vg   1   0   0 wz--n-   <3.64t <3.64t
  pve       1   8   0 wz--n- <557.88g 16.00g


Code:
# lvs
  LV               VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data             pve twi-aotz-- 429.12g             11.52  0.69                          
  root             pve -wi-ao----  96.00g                                                  
  swap             pve -wi-ao----   8.00g                                                  
  vm-100-cloudinit pve Vwi-a-tz--   4.00m data        10.94                                
  vm-100-disk-0    pve Vwi-a-tz--   4.00m data        14.06                                
  vm-100-disk-1    pve Vwi-a-tz-- 100.00g data        4.20                                  
  vm-121-disk-0    pve Vwi-a-tz--   4.00m data        14.06                                
  vm-121-disk-1    pve Vwi-a-tz--  90.00g data        50.24

-- and on esxi3 it's all empty (apart from the local pve VG), but I assume that this is the expected behaviour, since the VGs were only created on esxi2 and then shared to the rest?

Code:
# pvs
  PV         VG  Fmt  Attr PSize    PFree
  /dev/sda3  pve lvm2 a--  <557.88g 16.00g


Code:
# vgs
  VG  #PV #LV #SN Attr   VSize    VFree
  pve   1   3   0 wz--n- <557.88g 16.00g

Code:
# lvs
  LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data pve twi-a-tz-- 429.12g             0.00   0.40                          
  root pve -wi-ao----  96.00g                                                  
  swap pve -wi-ao----   8.00g
 
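As a way to double-check whether the running slow_vg volumes are really layered on the multipath map rather than on a bare /dev/sd* path (a hedged suggestion, not output from this thread), the device-mapper dependencies can be inspected directly:

Code:
# show which underlying devices each slow_vg LV depends on
dmsetup deps -o devname | grep slow_vg

# the multipath map itself should resolve to the raw SAN paths (sdc/sde here)
dmsetup deps -o devname mpatha

If each slow_vg LV lists only the mpatha map as its dependency (as the lsblk output above suggests), the data path is going through multipath as intended and only the LVM scanning configuration needs fixing.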