LVM Failed to update pool pve/data.

chrigiboy

Well-Known Member
Nov 6, 2018
Hello,

A few weeks ago we installed and configured a new Proxmox node for our cluster.
The new server has an LVM-thin pool called "local-lvm", the same as on every other node in our cluster.
After the server was added, we started transferring a few existing VMs to the new node.
After the first few live migrations, LVM got stuck at some point, printing "error: Failed to update pool pve/data." for every change involving a disk.
We were able to move the VMs back to the original node and reinstalled this node (increasing "data_tmeta" this time, hoping that would help).
However, after transferring a few test VMs it happened again.

The LVM-thin pool is running on a "HighPoint SSD7505" RAID controller.

Code:
~# uname -a
Linux px 5.11.22-3-pve #2 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux

Code:
~# lvs -a
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data            pve twi-cotzM-  <7.25t             20.39  2.13                           
  [data_tdata]    pve Twi-ao----  <7.25t                                                   
  [data_tmeta]    pve ewi-ao----  10.00g                                                   
  [lvol0_pmspare] pve ewi-------  10.00g                                                   
  vm-159-disk-0   pve Vwi-a-tz--   1.46t data        100.00                                 
  vm-363-disk-0   pve Vwi---tz-- 500.00g data                                               
  vm-363-disk-1   pve Vwi-a-tz-- 400.00g data        1.95                                   
  vm-364-disk-0   pve Vwi---tz-- 500.00g data                                               
  vm-364-disk-1   pve Vwi-a-tz-- 400.00g data        1.07                                   
  vm-365-disk-1   pve Vwi-a-tz-- 400.00g data        0.34

Code:
~# lvdisplay
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  LV UUID                RJLpvw-O0MM-O3kP-zh8z-zVCb-UH6F-IXnf7t
  LV Write Access        read/write (activated read only)
  LV Creation host, time px, 2022-02-17 16:13:28 +0100
  LV Pool metadata       data_tmeta
  LV Pool data           data_tdata
  LV Status              available
  # open                 0
  LV Size                <7.25 TiB
  Allocated pool data    20.39%
  Allocated metadata     2.13%
  Current LE             1899999
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:3
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-159-disk-0
  LV Name                vm-159-disk-0
  VG Name                pve
  LV UUID                3vpJrP-2Z8y-2BtB-pxLV-1bPp-iPsT-thyDWe
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 09:13:18 +0100
  LV Pool name           data
  LV Status              available
  # open                 0
  LV Size                1.46 TiB
  Mapped size            100.00%
  Current LE             384000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:4
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-363-disk-0
  LV Name                vm-363-disk-0
  VG Name                pve
  LV UUID                MhTssx-hRxM-1pPh-ABxW-bojg-fBdo-1gjG3X
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 10:30:31 +0100
  LV Pool name           data
  LV Status              NOT available
  LV Size                500.00 GiB
  Current LE             128000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-363-disk-1
  LV Name                vm-363-disk-1
  VG Name                pve
  LV UUID                y21xUL-MaJR-ltbH-zPzw-5Q9x-uIpr-2QI0n9
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 10:31:06 +0100
  LV Pool name           data
  LV Status              available
  # open                 0
  LV Size                400.00 GiB
  Mapped size            1.95%
  Current LE             102400
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:6
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-364-disk-0
  LV Name                vm-364-disk-0
  VG Name                pve
  LV UUID                5LSxWN-tsEU-4fMD-QcID-hVit-FDf2-egEe4g
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 10:31:58 +0100
  LV Pool name           data
  LV Status              NOT available
  LV Size                500.00 GiB
  Current LE             128000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-364-disk-1
  LV Name                vm-364-disk-1
  VG Name                pve
  LV UUID                aBeqia-g9GE-aBc3-B6Bi-rtt8-5iaN-4nVMWt
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 10:32:04 +0100
  LV Pool name           data
  LV Status              available
  # open                 0
  LV Size                400.00 GiB
  Mapped size            1.07%
  Current LE             102400
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:8
  
  --- Logical volume ---
  LV Path                /dev/pve/vm-365-disk-1
  LV Name                vm-365-disk-1
  VG Name                pve
  LV UUID                c7do2B-mj4Z-8qAB-JD0D-oryv-HkmS-oyvzy9
  LV Write Access        read/write
  LV Creation host, time px, 2022-02-18 10:40:59 +0100
  LV Pool name           data
  LV Status              available
  # open                 0
  LV Size                400.00 GiB
  Mapped size            0.34%
  Current LE             102400
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:9

Code:
~# vgdisplay
  --- Volume group ---
  VG Name               pve
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  19
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                7
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <7.27 TiB
  PE Size               4.00 MiB
  Total PE              1905119
  Alloc PE / Size       1905119 / <7.27 TiB
  Free  PE / Size       0 / 0   
  VG UUID               L7Amtm-FkcZ-e2cq-mIw3-Tl0r-JeJZ-PMyOiv
 
Code:
  data            pve twi-cotzM-  <7.25t             20.39  2.13

the attributes indicate that this thin pool needs checking ('c') and that its metadata is read-only ('M') so no writes/changes are possible.
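for reference, a quick way to query just those fields (a sketch; lv_attr and lv_health_status are standard lvs report fields, pve/data taken from your output):

Code:
lvs -o lv_name,lv_attr,lv_health_status pve/data   # health should spell out e.g. "metadata read only"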

I'd suggest the following (rough command sketch after the list):
- !!ensure you have backups!!
- stop all guests using the storage
- disable the storage
- deactivate all the thin volumes (with 'lvchange')
- run thin_check on metadata volume and post output here
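a rough sketch of those steps, assuming the storage ID is "local-lvm" (LV names taken from your lvs output, adjust as needed):

Code:
pvesm set local-lvm --disable 1    # disable the storage in PVE
lvchange -an pve/vm-159-disk-0     # deactivate each thin volume (repeat for every vm-*-disk-* LV)
lvchange -an pve/data              # deactivate the pool itself
lvs -a pve                         # nothing in the pool should show as active/open anymore
# thin_check itself needs direct access to the metadata, e.g. by swapping it out of the
# inactive pool into a temporary LV first (see lvmthin(7), "Metadata check and repair")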

since it happened twice on just this node, with a reinstall in-between, I'd also check memory and the disk(s) and controller for hardware issues.
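a minimal sketch for the hardware side, assuming the NVMe devices are visible to the host as /dev/nvme* (behind the HighPoint card the device names and options may differ):

Code:
smartctl -a /dev/nvme0    # SMART/health data per NVMe device (repeat for each device)
# plus a memtest86+ pass from the boot menu to rule out bad RAM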
 
Hi,

Sorry to revive an old thread, but the same thing has happened to me.

Some VMs got corrupted because the LVM suddenly switched to read-only.

pvesm status showed "activating LV 'pve/data' failed" and that a manual repair was required.

I fixed it by running the following in the shell while Proxmox was running:

Code:
lvconvert --repair pve/data

It then printed something like "truncating metadata device" and a warning that a backup of the unrepaired metadata had been kept, which could be removed with lvremove.
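For reference, that backup is kept as a visible LV in the same VG (typically named something like pve/data_meta0, though the exact name may differ). Something along these lines lists and removes it once the pool is confirmed working again:

Code:
lvs -a pve                 # look for the *_meta0 LV left behind by lvconvert --repair
lvremove pve/data_meta0    # remove it only once the pool is confirmed healthy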

However, even after rebooting, I cannot write to the LVM-thin pool.

Running lvs -a shows the same attributes as the original poster:

Code:
data pve twi-cotzM-

How can we fix it?
 
did you take raw backups of the disks before you ran those destructive commands? I hope you do have backups of your guests at least..
 
Yes, I have all backups; I restored them and am now running on a new LVM-thin pool on a new disk. But no raw backups, I think. I am ready to delete the pve/data LVM and recreate it if required. How do I do so? I don't want my existing VM/CT guest configs to be affected, but I can stop the guests if required.
 
the thin pool should be removable using lvremove
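a rough sketch, assuming the storage ID is "local-lvm" and that you really want to discard the pool and everything on it (the sizing is a placeholder, check your free space first):

Code:
pvesm set local-lvm --disable 1          # keep PVE away from the storage while you work
lvremove pve/data                        # removes the pool and every thin volume in it (asks for confirmation)
lvcreate -l 90%FREE --thinpool data pve  # recreate the pool (placeholder sizing)
pvesm set local-lvm --disable 0          # re-enable the storage
# guest configs live in /etc/pve and are untouched, but any disks that were on the pool are gone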
 
If something similar happens in the future, how would I go about fixing it without destroying the underlying lvm?

I know I have to do the following:

- disable the storage
- deactivate all the thin volumes (with 'lvchange')
- run thin_check on metadata volume and post output here

But could you guide me by telling me which exact commands to run?
 
well first you'd need to find out what the cause is..

there is a common pattern of exhausting metadata space and breaking a thin pool that way, but it could also be memory or disk failure, or bugs in LVM..
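a quick way to check for that pattern (data_percent and metadata_percent are standard lvs report fields):

Code:
lvs -o lv_name,data_percent,metadata_percent pve/data   # metadata_percent close to 100 would point at exhaustion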
 
I am sure it was not due to exhausting metadata disk space, because even after the repair it showed low metadata usage on the pve/data LVM-thin pool.

The SSD is a fairly new 2 TB Samsung NVMe SSD (less than a year old) and all SMART data shows it is still OK. It is still running the Proxmox host install without any problems, errors or freezes.

What happened may be a bug, but I'm not sure. I had a 500 GB disk (only 60 GB used) attached to a Windows 11 Pro VM; this disk was stored on the pve/data LVM-thin pool. I resized this disk to 1000 GB (a +500 GB increment), checked in Windows, and there was no problem.

I took the disk offline in the guest's Windows Disk Management. Then in Proxmox I simply detached the disk. As soon as I clicked detach in Proxmox, almost all other guests stored on the pool started reporting I/O errors or freezing.

I tried shutting down all guests and most succeeded, but this Windows guest did not (it was stuck on the shutting-down spinner), so I force-stopped it in Proxmox.

Then I restarted Proxmox through the GUI. It came back up normally, but the pve/data LVM-thin storage showed the grey question mark (unavailable) and none of the guests stored on it would start.

I should also say that less than 50% of the NVMe was in actual use. So, do you have any guess as to what had happened?
 
you haven't shown any actual pvs or lvs output or errors so far, so I can't really say anything more than what I said above..
 
Hi fabian, I appreciate you taking the time to help.

I have attached all the relevant output and errors. Any information would be appreciated.
 

Attachments

  • Pasted image 20240829172346.png
  • Pasted image 20240829172853.png
  • Pasted image 20240829175310.png
  • Pasted image 20240829175348.png
  • Pasted image 20240829175530.png

the lvs output would indicate that you still have plenty of volumes on the thin pool (even though the pool is read-only atm) - I would advise saving those before destroying (and potentially recreating) the thin pool. is there anything LVM related in the journal on bootup?
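something like this right after a reboot would show that (plain grep over the boot journal, adjust the patterns as needed):

Code:
journalctl -b | grep -iE 'lvm|thin'   # LVM / thin pool related messages from the current boot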
 
Is there any chance of repairing the LVM-thin pool? Which commands do I need to run? The volumes are detached from the VMs/guests but cannot be removed from the GUI because the thin pool is read-only.
 
thin_check and thin_repair would be the next steps, but for that the thin pool cannot be active.

edit: there also seems to be an updated/rewritten version of those tools, if the packaged ones fail I'd give those a shot:

https://github.com/jthornber/thin-provisioning-tools
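a rough sketch of the manual route, assuming the pool is pve/data and the VG has enough free space for a temporary LV (names and the 10G size are placeholders, see lvmthin(7) for the full procedure):

Code:
lvchange -an pve/data                                        # the pool and all its thin LVs must be inactive
lvcreate -L 10G -n meta_dump pve                             # temporary LV to swap the metadata into
lvconvert --thinpool pve/data --poolmetadata pve/meta_dump   # swap: the old metadata ends up in pve/meta_dump
lvchange -ay pve/meta_dump
thin_check /dev/pve/meta_dump                                # read-only check, post its output
# if thin_check finds repairable damage: thin_repair -i /dev/pve/meta_dump -o <new LV/device>
# the repaired metadata then has to be swapped back the same way before re-activating the pool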
 
