No storage options on node 1 - pvesm hangs

Flavio Moringa

Aug 14, 2018
Hello,

I had a single Proxmox 5.2 node and all was fine. I created some machines and everything worked perfectly.

Then I added a second node and enabled cluster mode, and everything still seemed fine. Both nodes show up in the web interface, and I created more machines on node 2.

Now I wanted to create a new machine on node 1 and discovered that I cannot: when I get to the storage screen, there are no storage options at all. On node 2 I get the NFS directories and the local storages; on node 1 the list is empty. The currently running VMs are fine, though.

On the CLI, a "pvesm status" on node 2 returns immediately; on node 1 it takes a really long time, but it does eventually show the storages.
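For comparison, this is roughly what I ran on both nodes (a sketch; node 2's hostname is assumed to be pve02 here, and the exact timings vary):

Code:
# node 2: returns almost instantly
root@pve02:~# time pvesm status

# node 1: hangs for minutes before printing the same storage list
root@pve01:~# time pvesm status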

I ran an strace, and the problem seems to be when it tries to read an LVM local storage that I have:

Code:
select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [7], left {tv_sec=0, tv_usec=964320})
read(7, "  ssd:959115689984:465194450944\n", 4096) = 32
select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [7], left {tv_sec=0, tv_usec=997773})
read(7, "  PGData:214744170496:0\n", 4096) = 24
select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[... the same Timeout line keeps repeating ...]

It seems to be having some issues reading the LVM info, but I really don't know what to do.

It keeps repeating that "select(16, [7 9], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)" line for a long time until it finally gives the results, but in the web GUI I just don't see them.
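For reference, a trace like the one above can be captured with something along these lines (a sketch, not necessarily the exact flags I used):

Code:
# trace pvesm status with timestamps, following child processes, into a file
strace -f -tt -o /tmp/pvesm.trace pvesm status

# count how many one-second select() timeouts happened while it hung
grep -c 'Timeout' /tmp/pvesm.trace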

Can someone please help me? If you need any additional info, just ask.

Thanks
 
what does
Code:
vgs
lvs
show?
 
Hi,

First of all, thanks for your help.

vgs:
Code:
root@pve01:~# time vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  PGData   1   1   0 wz--n- 200.00g      0
  PGData   1   1   0 wz--n-  32.00g      0
  pve      1   3   0 wz--n- 930.75g  16.00g
  ssd      1  11   0 wz--n- 893.25g 401.25g

real    3m21.504s
user    3m20.902s
sys    0m0.585s

vgs takes a really long time (more than 3 minutes), but in the end it gives the correct results.
Maybe it has something to do with that "PGData" volume group appearing twice?
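To tell the two PGData entries apart, something like this should show which physical volume each one sits on (a sketch; vg_uuid and pv_name are standard LVM report fields):

Code:
# list each VG together with its UUID and the PV that backs it
vgs -o vg_name,vg_uuid,pv_name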



lvs:
Code:
root@pve01:~# time lvs
  LV            VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  main          PGData -wi------- 200.00g                                                  
  main          PGData -wi-------  32.00g                                                  
  data          pve    twi-aotz-- 794.53g             0.00   0.04                          
  root          pve    -wi-ao----  96.00g                                                  
  swap          pve    -wi-ao----   8.00g                                                  
  vm-100-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-101-disk-1 ssd    -wi-ao---- 100.00g                                                  
  vm-102-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-103-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-104-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-105-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-106-disk-1 ssd    -wi-ao----   8.00g                                                  
  vm-106-disk-2 ssd    -wi-ao----  32.00g                                                  
  vm-107-disk-1 ssd    -wi-ao----  20.00g                                                  
  vm-107-disk-2 ssd    -wi-ao---- 200.00g                                                  
  vm-113-disk-1 ssd    -wi-ao----  32.00g                                                  

real    0m0.027s
user    0m0.000s
sys    0m0.014s

lvs executes super fast, and all seems well.


Doing an lvdisplay, I do see two logical volumes with the same path:
Code:
 --- Logical volume ---
  LV Path                /dev/PGData/main
  LV Name                main
  VG Name                PGData
  LV UUID                JHAXwg-SZpR-eAyp-UG1Y-Ql62-dLDV-0H0uby
  LV Write Access        read/write
  LV Creation host, time oauthdb03, 2018-07-18 17:38:14 +0100
  LV Status              NOT available
  LV Size                200.00 GiB
  Current LE             51199
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
   
  --- Logical volume ---
  LV Path                /dev/PGData/main
  LV Name                main
  VG Name                PGData
  LV UUID                5wKDj0-H73E-USJG-BEiY-Tc81-aQUd-Emwdrl
  LV Write Access        read/write
  LV Creation host, time ticketsdb, 2018-07-17 16:45:23 +0100
  LV Status              NOT available
  LV Size                32.00 GiB
  Current LE             8191
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto


Maybe a node reboot would fix this issue, but I have a really important database running on this node, and I cannot have downtime on it.
 
Ok...

Now I searched for that issue and got this thread:
https://forum.proxmox.com/threads/lvm-volume-groups-with-same-name.45658/

The issue probably arose because we copied a qemu backup image from an older Proxmox (v3) and restored it on this new Proxmox v5. Two of the VMs have multiple disks with LVM inside them, so that their partitions can easily be extended.

Can I change the volume group name without downtime?
 
I have this:

Code:
root@pve01:~# pvs
  PV                     VG     Fmt  Attr PSize   PFree 
  /dev/sda3              pve    lvm2 a--  930.75g  16.00g
  /dev/sdb               ssd    lvm2 a--  893.25g 401.25g
  /dev/ssd/vm-106-disk-2 PGData lvm2 a--   32.00g      0 
  /dev/ssd/vm-107-disk-2 PGData lvm2 a--  200.00g      0
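
So the duplicate PGData VG is coming from LVM metadata inside two guest disks, which are themselves LVs of the host's "ssd" VG. A quick way to list those device paths (a sketch):

Code:
# show the LVs of the host 'ssd' VG together with their /dev paths
lvs -o lv_name,vg_name,lv_path ssd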

Doing a vgdisplay, after a few minutes I get these two entries:
Code:
 --- Volume group ---
  VG Name               PGData
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               200.00 GiB
  PE Size               4.00 MiB
  Total PE              51199
  Alloc PE / Size       51199 / 200.00 GiB
  Free  PE / Size       0 / 0   
  VG UUID               cKWEg0-gFFO-MHCF-JnRY-RVwZ-Y1mJ-PEdEtN
   
  --- Volume group ---
  VG Name               PGData
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               32.00 GiB
  PE Size               4.00 MiB
  Total PE              8191
  Alloc PE / Size       8191 / 32.00 GiB
  Free  PE / Size       0 / 0   
  VG UUID               tKzpXG-a8Fr-vL0h-6G2d-YQpA-GSsQ-N1ITQM


So can I do a:
Code:
vgrename tKzpXG-a8Fr-vL0h-6G2d-YQpA-GSsQ-N1ITQM PGDATATickets

And Proxmox and the guests will be OK, with no downtime? This change won't affect anything? Proxmox won't need something else updated somewhere else?

Sorry if I'm being annoying, but it's a production system and I want to make sure I don't screw anything up.
 
As the VGs are inside your VMs, you would need to rename them there.
Also, I would advise you to exclude your 'ssd' VG from being scanned by LVM (see the other thread where I explain this).
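For reference, the exclusion meant here is a global_filter in /etc/lvm/lvm.conf, something along these lines (a sketch only; merge it with whatever filter line is already there, such as the stock zvol exclusion, and check with vgs afterwards):

Code:
# /etc/lvm/lvm.conf (devices section) -- reject the guest disks under
# /dev/ssd/ so the host never scans the LVM metadata inside them,
# keep rejecting ZFS zvols, and accept everything else
global_filter = [ "r|/dev/ssd/.*|", "r|/dev/zd.*|", "a|.*|" ]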
 
Thank you very much, Dominik.

The rename inside the guest worked like a charm...
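For anyone else hitting this, what that rename looks like inside the guest is roughly this (a sketch; the new name is the one from my earlier post, and whether /etc/fstab or the initramfs need updating depends on how the guest mounts the volume):

Code:
# inside the VM that owns the smaller PGData VG
vgrename PGData PGDATATickets

# update anything that references /dev/PGData/main (e.g. /etc/fstab) to
# /dev/PGDATATickets/main; then, if the VG is needed at boot, rebuild the initramfs
update-initramfs -u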

I will also exclude the ssd VG from the host LVM scan list as you indicated.
 
