Hi,
today one of my VMs (backed by LVM thin, which in turn sits on a RAID) ran out of disk space. The only VM without disk monitoring :-((
It seems this has taken down the other VMs in my thin pool as well; several I/O errors occurred.
My LVM thin pool is not overprovisioned: the total space allocated to the VMs is 871 GB and the pool size is 892 GB.
When I look at the storage in the Proxmox GUI, the used VM space is only 501 GB (56%) of the 892 GB.
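To double-check the fill level outside the GUI, I think it can be read directly with lvs (a sketch using my VG/pool names, ssd/ssd, taken from the dmsetup output below):
Code:
# data_percent / metadata_percent show how full the pool and each thin LV really are
lvs -a -o lv_name,lv_size,data_percent,metadata_percent ssd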
In my syslog I found a "switching pool to out-of-data-space" message, and this was the beginning of the end...
Code:
Nov 25 05:38:37 wpx3 kernel: [85437823.532713] device-mapper: thin: 251:3: switching pool to out-of-data-space (queue IO) mode
Nov 25 05:38:37 wpx3 lvm[1052]: Thin ssd-ssd-tpool is now 100% full.
After that, several I/O errors occurred on the other VMs. They start exactly 60 seconds after the mode switch, which matches dm-thin's default no_space_timeout for queued I/O.
Code:
Nov 25 05:39:37 wpx3 kernel: [85437883.566354] Buffer I/O error on device dm-12, logical block 23020159
Nov 25 05:39:37 wpx3 kernel: [85437883.566614] Buffer I/O error on device dm-6, logical block 22077699
Nov 25 05:39:37 wpx3 kernel: [85437883.567171] EXT4-fs warning (device dm-6): ext4_end_bio:330: I/O error -28 writing to inode 6947209 (offset 343932928 size 8388608 starting block 22078720)
Nov 25 05:39:40 wpx3 kernel: [85437887.228228] JBD2: Detected IO errors while flushing file data on dm-6-8
Nov 25 05:39:41 wpx3 kernel: [85437888.145104] JBD2: Detected IO errors while flushing file data on dm-6-8
...
Nov 25 06:24:03 wpx3 kernel: [85440550.222865] dm-9: rw=0, want=35262332145712, limit=16777216
Nov 25 06:24:03 wpx3 kernel: [85440550.226850] dm-9: rw=0, want=35262332145712, limit=16777216
Nov 25 06:24:03 wpx3 kernel: [85440550.230841] attempt to access beyond end of device
The error reports "ssd-ssd-tpool (251:3)"; that means the thin pool itself is full, right? (Error -28 in the EXT4 warning is ENOSPC, which seems to confirm it.)
Code:
root@wpx3:/var/log# dmsetup ls --tree
ssd-vm--111--disk--1 (251:12)
└─ssd-ssd-tpool (251:3)
├─ssd-ssd_tdata (251:2)
│ └─ (8:17)
└─ssd-ssd_tmeta (251:1)
└─ (8:17)
ssd-vm--126--disk--1 (251:9)
└─ssd-ssd-tpool (251:3)
├─ssd-ssd_tdata (251:2)
│ └─ (8:17)
└─ssd-ssd_tmeta (251:1)
└─ (8:17)
...
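To verify that, I believe the pool's own counters can be read with dmsetup status (a sketch; the used/total pairs are the thin-pool target's documented status fields):
Code:
# thin-pool status line: <transaction id> <used meta>/<total meta> <used data>/<total data> ...
# used data == total data would mean the pool is completely full
dmsetup status ssd-ssd-tpool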
Any ideas? I don't know what to do here...
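My first thought would be to grow the pool so that pending writes can complete (a sketch, assuming the pool is ssd/ssd as in the tree above and the VG still has free extents; the sizes are placeholders):
Code:
# extend the thin pool's data LV (placeholder size)
lvextend -L +20G ssd/ssd
# give the pool's metadata LV some headroom too (placeholder size)
lvextend --poolmetadatasize +1G ssd/ssd
Is that the right approach, or does more need to happen once the pool has already thrown I/O errors?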
Luckily I didn't lose any data, because the VMs are all part of clusters and sync the current state when they start up again.
Thx for your help.