LVM over shared iSCSI (MSA 2050) – Deleted VMs do not free up space – Any remediation?

ugo

New Member
Nov 14, 2024

Infrastructure Overview​


I’m running a production cluster based on Proxmox VE 8.x with the following setup:

  • Cluster: 4 × HPE DL380 Gen10 nodes
  • Shared Storage: HPE MSA 2050 configured in Virtual Storage mode
  • A single iSCSI LUN is presented to all nodes using multipath
  • Proxmox storage is configured as LVM (non-thin) on top of this iSCSI LUN, named storage-msa (a CLI sketch of this definition follows the list)
  • All VM disks are stored on this shared volume
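
For reference, this is roughly what that storage definition looks like on the CLI; the VG name vg-msa is an assumption, since only the storage name is given above:

```
# hypothetical equivalent of the shared LVM storage described above;
# "vg-msa" is an assumed VG name created on the multipath device of the MSA LUN
pvesm add lvm storage-msa --vgname vg-msa --content images,rootdir --shared 1
```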

⚠️ Issue​


When I delete a VM from the Proxmox GUI, the allocated space is not reclaimed on the MSA 2050. After investigation, I realized:

  • The LVM volume is not thin-provisioned, so discard/trim is ineffective
  • The iSCSI LUN does not receive any space reclamation commands
  • Over time, this causes the LUN to fill up permanently, even though VMs are deleted

This is starting to become a problem from a capacity planning and scalability perspective.
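
One way to confirm that nothing in this stack ever sends UNMAP down to the MSA is to check the discard limits the kernel exposes for the multipath device, and to compare LVM's view of the VG with what the array reports for the LUN. A rough check, with placeholder device and VG names:

```
# discard/UNMAP limits as seen by the host; DISC-GRAN/DISC-MAX of 0 means no discard
# can ever reach the MSA through this path ("mpatha" is a placeholder)
lsblk -D /dev/mapper/mpatha

# LVM's view of the shared VG ("vg-msa" is an assumed VG name)
vgs vg-msa
lvs -o lv_name,lv_size vg-msa
```

If vgs shows plenty of free extents while the MSA still reports the LUN as nearly full, that matches the behaviour above: the space is free inside the VG, but the array was never told about it.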

What I’ve tried / considered​


  • Enabling discard=on on the VM disks: has no effect (not relevant on thick LVM; see the sketch after this list)
  • lvchange --discard passdown: not supported on non-thin LVs
  • fstrim inside the guests: ineffective
  • A script to wipe orphaned volumes: works partially, but does not free space on the LUN itself
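
For completeness, this is roughly what the discard/fstrim attempt looks like; the VM ID and volume name are placeholders:

```
# re-attach the existing virtual disk with discard enabled
# (VM ID 100 and the volume name are hypothetical)
qm set 100 --scsi0 storage-msa:vm-100-disk-0,discard=on

# then, from inside the guest:
fstrim -av
```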

❓ My Question


Is there any way to reclaim space on a shared iSCSI LUN used with LVM (non-thin), without having to destroy and recreate the volume group?

If not, what would be the best approach going forward to avoid this kind of trap?
  • Should I switch to LVM-thin (not shareable)?
  • Migrate to Ceph or ZFS over iSCSI?
  • Other recommendations?

Thanks a lot for your insights — really looking to align with best practices.
 
---

Hi there, I am facing exactly the same problem. If this is not a bug, then a solution is needed!

---

I tried everything: ZFS, LVM, and LVM-Thin on top of an iSCSI LUN, with the discard option enabled on LVM-Thin. None of it worked.

ZFS did not work.

LVM-Thin with the discard option enabled did not work either.

Only LVM with "Wipe Removed Volumes" enabled gets reflected on the LUN side, and only after a very long wipe.
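
As far as I can tell, that GUI checkbox corresponds to the saferemove option of the LVM storage, so it can also be enabled from the CLI (storage name taken from the first post):

```
# enable "Wipe Removed Volumes" (zero-out of removed LVs) on the shared LVM storage
pvesm set storage-msa --saferemove 1
```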

Although

```
pvesm set <LVM> --saferemove_throughput 1048576000
```
improves the wipe throughput, it is still slow when the LV is large!

Furthermore, if a VM has more than one disk on this LVM storage, removing the VM starts multiple wipe tasks against the same LUN at the same time, which eventually crashes the LUN:

```
...
33533382656 B 31.2 GB 523.9 s (8:43 min) 64001764 B/s 61.04 MB/s
zero out finished (note: 'No space left on device' is ok here): write: No space left on device
Volume group "pve1vm22x-lp" not found <------------ HERE
TASK ERROR: lvremove 'pve1vm22x-lp/del-vm-22000-cloudinit' error: Cannot process volume group pve1vm22x-lp
```

At the line marked HERE, the LUN source has already disappeared.

I tested removing the VM disks one by one, and that works fine. VM -> Remove, which wipes all the disks at the same time, crashes 100% of the time!
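
A possible workaround along those lines until this is fixed: remove the extra disks from the CLI one at a time and wait for each wipe task to finish, so that only a single wipe ever runs against the LUN. A sketch, where VM ID 22000 comes from the log above and the disk IDs are assumed:

```
# detach and physically remove one disk; this starts a single wipe task
qm disk unlink 22000 --idlist scsi1 --force
# ...wait for that wipe task to finish, then do the next disk...
qm disk unlink 22000 --idlist scsi2 --force
# finally remove the VM itself, which wipes the remaining volumes
qm destroy 22000
```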

---

If this is not a bug, then a solution is needed!

If there is a requirement on the LUN side for discard to be passed down, I hope Proxmox will document it; the allocated size being unequal on the two sides is not acceptable.

Still, this should be a simple and common setup. Why do so few people try to debug it? Do people not care?