Hi, I have a question and am curious if anyone has insight or comments.
- Doing a P2V migration of a physical host to Proxmox, using a temporary staging host to facilitate the transfer.
- Both the staging host and the clobbered old host were installed from the stock Proxmox 7.latest ISO.
- The old host is Supermicro hardware with an LSI hardware RAID controller.
- On that box, the appliance-style ISO install carved the main disk up into (mostly) an LVM-thin pool for VM storage, per the defaults.
-- The staging server was a small NUC with a ZFS-mirror Proxmox install (stock Proxmox ISO installer, no hardware RAID).
- Did a P2V migration of the Windows server off the physical host by generating a server backup, converting it to VHDX, and importing that into QEMU/Proxmox on the staging server (rough command sketch after this list).
- The P2V step went smoothly and the VM booted and ran fine.
- Detail to note: the source physical host had a ~2.6 TB C: drive for Windows, with only about 90 GB of actual disk-used footprint inside Windows.
- After the VM image transfer, the VM disk had a footprint of around 130 GB.
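For reference, the import on the staging server was roughly like this (sketch; the VM ID, path and storage name here are illustrative, not the exact ones used):
Code:
# convert/import the server-backup VHDX as a Proxmox disk
# (qm importdisk runs qemu-img underneath and accepts VHDX directly)
qm importdisk 102 /mnt/transfer/windows-c.vhdx local-zfs
# then attach the imported disk to the guest
qm set 102 --scsi0 local-zfs:vm-102-disk-0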
After a few days of patient waiting, the old original Windows host was clobbered and reinstalled as Proxmox, and then the VM was migrated off the staging server to the re-christened Supermicro:
-- Set up temporary local USB storage on the staging server as a backup storage target for a vanilla Proxmox backup dump.
-- Powered down the Windows VM and ran a normal backup.
-- This spits out a VM backup file with a footprint of approximately 130 GB.
-- Copied this over to the Supermicro, then did a standard CLI restore of the VM onto this host, designating the local LVM-thin as the storage to use (see the command sketch after this list).
-- It moved along fine and did the bulk of the work in ~45 minutes.
-- The weird part: at the start it warned that LVM-thin had no protection against the pool filling up, and at the end, once it had 'finished' the restore, it got stuck on "LVM Rescan".
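For reference, the restore itself was just the standard CLI tool, roughly (sketch; the archive name and storage label here are illustrative):
Code:
# restore the vzdump archive onto the local LVM-thin storage
qmrestore /mnt/usb-backup/vzdump-qemu-102.vma.zst 102 --storage local-lvm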
Warning message text:
Code:
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
-- I was under the impression that LVM-thin would tolerate a VM disk that exceeds the physical storage available (which is the case here: we have about 2 TB of space in the LVM-thin pool, and the VM in question has a nominal disk size of ~2.6 TB but only ~100 GB of actual footprint).
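As far as I understand it, that warning is only about the autoextend knobs in /etc/lvm/lvm.conf, i.e. whether LVM will grow the thin pool into free VG space before it fills up; something like this (values are examples only, not what I have configured):
Code:
# /etc/lvm/lvm.conf
activation {
    # start autoextending the thin pool once it reaches 80% full...
    thin_pool_autoextend_threshold = 80
    # ...growing it by 20% each time (needs unallocated space in the VG)
    thin_pool_autoextend_percent = 20
}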
What appears to have happened:
-- LVM was pissed off, and letting it sit overnight did not help.
-- In dmesg we had lots of entries logged like this:
Code:
[14585.042747] systemd-journald[653]: Failed to open system journal: No space left on device
[95748.704550] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[96598.611255] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
In this state, if I tried to run 'lvs' or 'vgs' the command would just hang / time out.
Hitting Ctrl-C would then tell me it "cannot get lock".
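If this happens again, a few read-only things I would try before rebooting, to see what is actually wedged (sketch; nothing here changes state):
Code:
# device-mapper view of the thin pool and volumes
dmsetup info -c
dmsetup ls --tree
# LVM's file locks live here; a stale lock can make lvs/vgs block
ls -la /run/lock/lvm/
# check for lvm commands stuck in uninterruptible sleep (D state)
ps axo pid,stat,wchan:30,cmd | grep '[l]v'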
Anyhow, I rebooted the box this morning... and after that I was able to get output from vgs and lvs:
Code:
root@proxmox:/etc/pve/qemu-server# vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  pve   1   5   0 wz--n- <1.31t <16.00g
root@proxmox:/etc/pve/qemu-server# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz--  <1.17t             21.96  1.25
  root          pve -wi-ao----  96.00g
  swap          pve -wi-ao----   8.00g
  vm-100-disk-0 pve Vwi-a-tz--   8.00g data        0.00
  vm-102-disk-0 pve Vwi-a-tz--   2.18t data        11.75
The other added fun: there was also a tiny VM (VMID 100, about 2 GB) that I tried to restore in parallel. It could not make any progress while LVM was 'hung' on the large VM (VMID 102) restore, and after the reboot VM 100 is in a very strange state.
- Cannot delete it; it shows in the GUI as 'locked'.
- Cannot unlock it from the CLI; it tells me the conf file does not exist.
- Cannot create/touch the conf file; it tells me the file exists and cannot be created.
- Cannot see or remove the conf file that supposedly exists.
- I was able to delete the LV for vm-100, but that did not free up this mystery mess either.
Some relevant command output:
Code:
root@proxmox:~# qm unlock 100
Configuration file 'nodes/proxmox/qemu-server/100.conf' does not exist
root@proxmox:~# cd /etc/pve/qemu-server
root@proxmox:/etc/pve/qemu-server# touch 100.conf
touch: cannot touch '100.conf': File exists
root@proxmox:/etc/pve/qemu-server# ls -la
total 1
drwxr-xr-x 2 root www-data 0 Nov 22 13:34 .
drwxr-xr-x 2 root www-data 0 Nov 22 13:34 ..
-rw-r----- 1 root www-data 429 Nov 24 07:16 102.conf
root@proxmox:/etc/pve/qemu-server# rm 100.conf
rm: cannot remove '100.conf': No such file or directory
root@proxmox:/etc/pve/qemu-server# qm destroy 100
Configuration file 'nodes/proxmox/qemu-server/100.conf' does not exist
root@proxmox:/etc/pve/qemu-server#
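For context, /etc/pve is a FUSE mount provided by pmxcfs (the pve-cluster service), with the real data kept in /var/lib/pve-cluster/config.db, so my next idea, untested and noted here only as a sketch for a single-node box, is roughly:
Code:
# keep a copy of the pmxcfs database before touching anything
cp /var/lib/pve-cluster/config.db /root/config.db.bak
# restart the service that provides the /etc/pve mount
systemctl restart pve-cluster
# then see whether the ghost 100.conf has cleared
ls -la /etc/pve/nodes/proxmox/qemu-server/
# if the conf reappears instead, unlock and remove the leftover VM
qm unlock 100 && qm destroy 100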
So, I am not sure if anyone has seen anything like this. I did reboot the box again, and that did not 'release' anything for this locked-but-absent-yet-somehow-still-present VM 100. Worst case I just blow away the whole box and do a clean install, but that seems like a sad solution.
Regarding the LVM:
- I am curious whether LVM-thin is 'meant' to behave this way and I just don't understand how LVM-thin works (i.e. it cannot tolerate a VM whose virtual disk size exceeds the pool's capacity?), or whether there is some other weird behaviour here that is a real, unexpected problem.
- I am guessing the 'cleanest' solution is something like:
(1) shrink the disk inside the Windows VM to a smaller, saner size, maybe 500 GB;
(2) do some work to shrink the Proxmox disk allocation for this guest. I haven't done this before on a ZFS-backed VM, so there will be a bit of learning here (rough sketch of what I think is involved after this list);
(3) once the VM has a smaller, saner disk allocation, do a backup dump; then the transfer and restore onto the other host should be fine, with no issues around LVM-thin or space overcommit.
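For step (2), my rough understanding, untested and risky unless Windows has already shrunk its partitions well below the new size, is that on the ZFS side it's something like this (the dataset name is the usual default and may differ):
Code:
# shrink the zvol that backs the guest disk on the staging (ZFS) host
zfs set volsize=500G rpool/data/vm-102-disk-0
# have Proxmox pick up the new size in the VM config
qm rescan --vmid 102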
Big picture, maybe I am a stick in the mud, but I kind of liked the good old days when /var/lib/vz Proxmox storage was a local filesystem (even if sitting on top of LVM) rather than an actual LVM-thin pool, which you can't see inside via the filesystem and manipulate the qcow2/raw files directly. Possibly I will just redo my install, allocate only a tiny bit of space to LVM-thin, and carve off a vanilla local directory to use as VM storage on top of a filesystem, since I know I can do this sort of thin/overcommitted disk with a raw or qcow2 file with zero drama. This LVM layer just seems to be giving me multiple headaches and no big benefits.
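If I do go that route, my understanding is it's just a matter of registering a plain directory as storage and letting qcow2 handle the thin provisioning, roughly (the path and storage ID here are made up for illustration):
Code:
# create a directory on an existing filesystem and register it as VM storage
mkdir -p /srv/vmstore
pvesm add dir vmstore --path /srv/vmstore --content images,backup
# qcow2 images on directory storage are sparse, so a 2.6T virtual disk
# only consumes what the guest has actually written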
yay.
Anyhow, if anyone has any feedback or comments, it would be appreciated.
Tim