Moving container on LVM thin volume to smaller storage

wbk

Hi all,

I have a container on a thinly provisioned LVM volume. I want to extract the disk that backs the thin volume out of the system running Proxmox, but run into the issue that thin volumes don't support shrinking.

After moving stuff out of the way, there is a lot of air in the container, but the thin volume is, of course, still the original size. Some (approximate) numbers:
  • PV / VG backing the thin pool: 7,3 TiB
    • Thin pool: 4,5 TiB
    • The container's thin volume: 3,6 TiB
  • Container:
    • Size of the file system: 3,6 TiB
    • Actual storage used: 0,5 TiB
  • Available target for moving: 3,5 TiB disk
It's quite close, but the target disk is a few tens of GB short.
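To put a rough number on the gap, a back-of-the-envelope calculation with the rounded sizes from the list above:

```shell
# Gap between the 3.6 TiB thin volume and the 3.5 TiB target, in GiB.
# Sizes are the approximate figures from the list above.
vol_gib=$(( 36 * 1024 / 10 ))   # 3.6 TiB expressed in GiB
tgt_gib=$(( 35 * 1024 / 10 ))   # 3.5 TiB expressed in GiB
echo "short by $(( vol_gib - tgt_gib )) GiB"
```

With these rounded figures the shortfall is on the order of 100 GiB.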

I ran a short test to see whether Proxmox refuses to move a container to a target that is too small. It does not: it diligently starts moving.

What are my options, while also minimizing downtime?

  • I have backups, but I suspect restoring such a backup will eventually run into the same problem as moving the volume (to be tested, once I have a backup of the new, smaller size)
  • The container used about 3,3 TiB before moving data elsewhere. The last 0,3 TiB are, I presume, never used or claimed by LVM. I would not lose extents that hold data if the container got cut off at 3,5 TiB, but LVM will probably stress over losing integrity
  • File level move to a new container may work
    • create a new 0,7 TiB container on target storage
    • shutdown both containers
    • mount the filesystems on the host
    • rsync from the existing container to the new one
    • comes with downtime
  • I don't really have an option at the moment to put larger disks in the system, nor is the container expected to grow to its old size anymore

My plan is to:
  • shutdown the container, resize2fs to shrink the filesystem
  • create a backup of the container with the large amount of unused space
  • restore to the smaller HDD on a regular logical volume
  • if the restore does not fail, continue
  • fsck.ext4 the file system (it's probably confused about the missing tail of the filesystem)
  • check whether the container boots, shutdown
  • resize2fs the file system (shrink to some 0,7 TiB)
  • lvresize -r the volume to some 0,7 TiB
  • see whether the container still boots, shutdown
  • rsync the delta accumulated since creating the backup from the still-running original container
  • shutdown both containers, rsync once more
  • boot the new container
Can it be said upfront that this is not going to work? If it might work, which part carries the highest risk? Suggestions or improvements?
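The delta-sync steps at the end of the plan could look roughly like this. The paths and flags are placeholders and one reasonable choice, not a tested procedure:

```shell
# Sketch of the two-pass rsync from the plan. SRC/DST are hypothetical
# mountpoints for the old and new container root filesystems on the host.
sync_ct_root() {
    src=$1; dst=$2
    # bulk pass while the original container is still running
    rsync -aHAX --numeric-ids "$src/" "$dst/"
    # short final pass after both containers have been shut down
    # (run "pct shutdown <ctid>" for both CTs first)
    rsync -aHAX --numeric-ids --delete "$src/" "$dst/"
}
# example: sync_ct_root /mnt/old-ct-root /mnt/new-ct-root
```

`-aHAX --numeric-ids` preserves hardlinks, ACLs, xattrs, and raw UIDs/GIDs, which matters when copying a container rootfs from outside the container.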
 
Update: unexpected failure at step 1.

My plan is to:
  • shutdown the container, resize2fs to shrink the filesystem

While logged in to the host, with the container shut down, I resized the FS in the LV to 600 GB:

Bash:
# fsck.ext4 /dev/mapper/allerlei-vm--104--disk--0
e2fsck 1.47.0 (5-Feb-2023)
/dev/mapper/allerlei-vm--104--disk--0: clean, 5198085/244908032 files, 113790776/979632128 blocks
# resize2fs -p -z 20260311_resize2fs-device.e2undo /dev/mapper/allerlei-vm--104--disk--0 600G
resize2fs 1.47.0 (5-Feb-2023)
Overwriting existing filesystem; this can be undone using the command:
    e2undo 20260311_resize2fs-device.e2undo /dev/mapper/allerlei-vm--104--disk--0

Please run 'e2fsck -f /dev/mapper/allerlei-vm--104--disk--0' first.
# e2fsck -f /dev/mapper/allerlei-vm--104--disk--0
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Inode 46012422 extent tree (at level 2) could be narrower.  Optimize<y>? no
Inode 46039626 extent tree (at level 2) could be narrower.  Optimize<y>? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Booting the container afterwards and checking the size of the filesystem, I was in for a surprise:

Code:
# # (in the container)
# df -h
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/allerlei-vm--104--disk--0  3.6T  374G  3.1T  11% /

It is still 3.6 T, with only 374 G used. I had expected the size of the filesystem to be 600 G.

I did not know what to expect as values for the LV, but it has not changed; the current Data% value from `lvs` is the same as it was before resizing the filesystem:

Code:
# lvs
  LV            VG       Attr       LSize  Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vm-104-disk-0 allerlei Vwi-aot--- <3.65t dunnedata        79.30
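One possible explanation for the unchanged size: the resize2fs output above ends with "Please run 'e2fsck -f ...' first", which suggests resize2fs exited before shrinking anything, and the subsequent e2fsck was never followed by a second resize2fs run. A sketch of the re-run, assuming that reading is correct:

```shell
# Re-run sequence for when resize2fs bails out asking for a forced fsck:
# run the forced check, then invoke resize2fs again with the same target.
# Device name and 600G target are taken from the post above.
shrink_after_fsck() {
    dev=$1; size=$2
    e2fsck -f "$dev" && resize2fs -p "$dev" "$size"
}
# example: shrink_after_fsck /dev/mapper/allerlei-vm--104--disk--0 600G
```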
 
The following step was unexpectedly successful,

  • create a backup of the container with the large amount of unused space
I had expected to wait a day for 3 TB to be backed up, but to my surprise the backup finished quickly. It had only processed the data actually present in the file system, not the full size of the volume. That's a bonus in my eyes ;-)

Step after that,

  • restore to the smaller HDD on a regular logical volume

not so much:

Code:
Recovering backed-up configuration from 'mt_pbs_localbaks_online:backup/ct/104/2026-03-11T17:09:27Z'
TASK ERROR: unable to restore CT 101 - lvcreate 'online/vm-101-disk-0' error:   Volume group "online" has insufficient free space (953861 extents): 956672 required.
Not _totally_ to my surprise, the restore still expects an LV of the original size.
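The extent counts in that error line check out against the sizes involved, assuming the LVM default extent size of 4 MiB:

```shell
# Sanity-check the extent counts from the lvcreate error, assuming the
# LVM default extent size of 4 MiB.
ext_mib=4
required=$(( 956672 * ext_mib ))   # extents required for the full-size LV
free=$((    953861 * ext_mib ))   # extents free in VG "online"
echo "required: $(( required / 1024 )) GiB, free: $(( free / 1024 )) GiB"
```

3737 GiB required is exactly the <3.65t the earlier `lvs` output reported, so the restore asks for the original volume size, not the size of the data in the backup.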

I'll try the `rsync` route.

Thanks for reading. As before: I'm open to thoughts and suggestions!
 
I didn't read all of it, but you can shrink thin volumes, just not the thin pool itself. Here are some outputs from when I demonstrated this to someone:
Bash:
# lvresize -r -L 8G pve/vm-101-disk-0
  File system ext4 found on pve/vm-101-disk-0.
  File system size (20.00 GiB) is larger than the requested size (8.00 GiB).
  File system reduce is required using resize2fs.
  File system fsck will be run before reduce.
  Reducing file system ext4 to 8.00 GiB (8589934592 bytes) on pve/vm-101-disk-0...
e2fsck /dev/pve/vm-101-disk-0
/dev/pve/vm-101-disk-0: Inode 1688 extent tree (at level 1) could be shorter.  IGNORED.
/dev/pve/vm-101-disk-0: Inode 15990 extent tree (at level 1) could be shorter.  IGNORED.
/dev/pve/vm-101-disk-0: 22805/1310720 files (0.3% non-contiguous), 297383/5242880 blocks
e2fsck done
resize2fs /dev/pve/vm-101-disk-0 8388608k
resize2fs 1.47.2 (1-Jan-2025)
Resizing the filesystem on /dev/pve/vm-101-disk-0 to 2097152 (4k) blocks.
The filesystem on /dev/pve/vm-101-disk-0 is now 2097152 (4k) blocks long.

resize2fs done
  Reduced file system ext4 on pve/vm-101-disk-0.
  Size of logical volume pve/vm-101-disk-0 changed from 20.00 GiB (5120 extents) to 8.00 GiB (2048 extents).
  Logical volume pve/vm-101-disk-0 successfully resized.

# pct fsck 101
fsck from util-linux 2.41
/dev/mapper/pve-vm--101--disk--0: clean, 22805/524288 files, 245989/2097152 blocks
Before doing this, make a backup, run `pct fstrim CTID` and `pct fsck CTID`, then shut the CT down. After resizing, fsck again and issue a `pct rescan`.
This depends a lot on the storage so the recommended/universal way is to restore a backup while setting the size.
I also described how to do this with ZFS here.
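Condensed into one sequence, the advice above might look like this. The CT ID, VG/LV name, and target size are example values, and the fsck is moved after the shutdown since it needs the container stopped:

```shell
# Condensed sketch of the shrink procedure described above.
# CTID, VG/LV, and size below are illustrative, not from a real system.
shrink_ct_disk() {
    ctid=$1; lv=$2; newsize=$3
    pct fstrim "$ctid"                # works on the running CT
    pct shutdown "$ctid"
    pct fsck "$ctid"                  # fsck needs the CT stopped
    lvresize -r -L "$newsize" "$lv"   # -r also shrinks the ext4 FS first
    pct fsck "$ctid" && pct rescan
}
# example: shrink_ct_disk 104 allerlei/vm-104-disk-0 700G
```

Make a backup before running anything like this; a failed filesystem shrink is not recoverable in place.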
 
Hi Impact,

Great! Thank you for mentioning it; I was under the impression (as you probably guessed) that the LV itself was grow-only as well.

I'm not able to test it right away, but I'll post feedback after I've tried!
 
Shortly after, I had the opportunity to log in and kick off the fstrim, then realized it would be handy to chain fsck:

Code:
# pct fstrim 104
^C
# pct fstrim 104 && pct fsck 104
CT is locked (fstrim)

It's nine hours later now. I realize that fsck wouldn't have worked out on the running container anyway, but I'd expected the container to be unlocked by now.

Belatedly, from `man pct`, for fstrim:

Code:
pct fstrim <vmid> [OPTIONS]

Run fstrim on a chosen CT and its mountpoints, except bind or read-only mountpoints.

Apart from the root volume, there are two NFS mounts. Presumably root on the host has no say over the FS underlying these mountpoints on another server (they're not on SSDs or thinly provisioned volumes, so a trim wouldn't have an effect either way).

Reading the man description though, it may have gotten stuck on exactly that: they're plain mountpoints, not bind or read-only mountpoints.

There's not any visible disk activity. With SSDs I would be less surprised, as the command could have been kicked down the line until it is executed by the SSD itself. On my thin pool, it's all in the open and involves (I suppose) direct disk actions with visible I/O and at least some CPU impact. 

There was an ever so slight rise in server load around the time I executed the command. As a check, I temporarily mounted the container volume directly on the host and ran `fstrim /mnt/container` on it directly. That showed a similar but lower and shorter increase in load, and it finished within a minute. This leads me to conclude that the lock on the container is stale, perhaps caused by me starting the trim twice and stacking an fsck task on top.
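If the lock really is stale, it can normally be cleared by hand; a sketch, assuming no fstrim or fsck task is actually still running on the volume:

```shell
# Clear a stale container lock with the standard pct subcommand.
# Only do this after confirming nothing is still working on the CT.
clear_stale_ct_lock() { pct unlock "$1"; }
# example: clear_stale_ct_lock 104
```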

This depends a lot on the storage so the recommended/universal way is to restore a backup while setting the size.
I can't quite parse this sentence; could you elaborate? Nice set of hints and tips, by the way!
 