How to Trim Win Server 2022 with RBD CephFS Backed Storage

ErkDog

Member
May 10, 2020
I've got a problem that's going to turn into a big problem real fast.

I have a Ceph cluster set up: 3 systems, each with 3 x 18TB mechanical drives.

I have a 49TB volume in the pool for storage.

It says that 11TB is in use, but I formatted the drive and only 4TB is actually in use.

I have tried every trim method I can find:

Optimize-Volume / Defrag -L / qm agent VMID fstrim

NOTHING frees up the space.

I get "Can not perform operation on Thinly provisioned volumes with slab size less than 8MB."

Which would be fine if it somehow, otherwise, cleared the unused space, but it doesn't.

How do I fix this? This data is pretty important. If I nuke it again, I'll have to spend about a week waiting for systems with slow internet uploads to reseed their data, and in the meantime those systems will have no backup protection. (This is a Veeam B&R controller ingesting backup data from far-end agents.)

Thanks,
ErkDog
 
Your problem description is a bit confusing, which is probably why no one has answered.

Please describe your setup and your problem in full detail.
 
I feel like I described it pretty fully, but let me see if I can provide additional detail:

Only 7.3TB is in use on this volume: https://ss.ecansol.com/uploads/2023/03/01/ncplayer_1677660784.png

Ceph -> Pools says it's using 23.63 TB : https://ss.ecansol.com/uploads/2023/03/01/chrome_1677660813.png

OSD says each disk is using 16.37 TB : https://ss.ecansol.com/uploads/2023/03/01/chrome_1677660863.png

It should only be using the 7.3TB plus about 500G of ISOs on a CephFS, or 7.8TB in total.

I have discard on: https://ss.ecansol.com/uploads/2023/03/01/chrome_1677660914.png

I formatted the drive at one point as I needed a different file system.

It did NOT appear to free any of that space. When I try to run Trim or Optimize-Volume in the OS to pass up the free blocks and reclaim space I get: https://ss.ecansol.com/uploads/2023/03/01/ncplayer_1677660986.png

Or "Neither slab consolidation nor slab analysis can run if slabs are less than 8 MB"

If I try to run qm agent 102 fstrim, it times out.

My concern is that as data is created and deleted on this volume, the Ceph RBD pool is going to fill up because trim/discards are not getting properly passed up to PVE.

How do I rectify/fix/correct this?

Thanks,
Matt
 
What does it say in the summary panel of the storage for the pool? With Ceph you have to be careful, because the storage summary panel shows how much usable data is stored there, while the main Ceph panel (and, I think, the pool overview) shows how much data is stored in total. That total should be about 3 times the used space from the storage summary panel, because the pool has size=3, i.e. 3 replicas.
 
But as illustrated above I am only using 7TB but Ceph thinks 24 TB is in use.
And I think here is the misunderstanding. These panels show you the raw used space in the cluster that the 3 replicas take up.
In the tree view, you should have a storage "RBDceph" shown for each node. What does it show if you navigate there and then to the "Summary" page, where you have the space usage graph?

It should be at ~7 TB or 1/3 of the ~24 TB raw used space.
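
To make the arithmetic concrete, here is a rough sketch of the raw-to-usable conversion, assuming a replicated pool with size=3 and ignoring any metadata or allocation overhead. The function name is made up for illustration; the numbers are the ones quoted earlier in this thread.

```python
def usable_from_raw(raw_tb: float, replicas: int = 3) -> float:
    """Approximate usable data in a replicated pool from its raw usage.

    Assumption: plain replication (size=replicas), no overhead accounted for.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


raw_used_tb = 23.63      # raw figure shown under Ceph -> Pools in this thread
expected_tb = 7.3 + 0.5  # guest usage plus ISOs on CephFS, as stated above

usable_tb = usable_from_raw(raw_used_tb)
print(f"approx. usable data: {usable_tb:.2f} TB (expected about {expected_tb:.1f} TB)")
```

With those figures the result comes out close to the 7.8 TB that was expected, which suggests the panels are showing replicated raw usage rather than leaked space.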
 
Hmmmm, well, I've been tied up for a few days, and yes, the summary page says 9.61 TB, which lines up somewhat with the current usage.

I'll let the data build up for another week and there is a subset I can safely delete to see if the usage goes down proportionately
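
A minimal sketch of how that proportionality check could be done, assuming 3 replicas so that each TB deleted and trimmed in the guest should free roughly 3 TB of raw space. The function names, the 10% tolerance, and the before/after figures are hypothetical placeholders, not measurements from this cluster.

```python
def expected_raw_drop(deleted_tb: float, replicas: int = 3) -> float:
    """Raw space that should be freed if discards reach Ceph (assumes plain replication)."""
    return deleted_tb * replicas


def reclaimed_as_expected(raw_before_tb: float, raw_after_tb: float,
                          deleted_tb: float, replicas: int = 3,
                          tolerance: float = 0.1) -> bool:
    """True if the observed raw drop is within `tolerance` (a fraction) of the expected drop."""
    if deleted_tb <= 0:
        raise ValueError("deleted_tb must be positive")
    expected = expected_raw_drop(deleted_tb, replicas)
    observed = raw_before_tb - raw_after_tb
    return abs(observed - expected) <= tolerance * expected


# Hypothetical example values: 1 TB deleted in the guest, raw usage read before and after.
print(reclaimed_as_expected(raw_before_tb=30.0, raw_after_tb=27.1, deleted_tb=1.0))
```

If the raw usage barely moves after the delete and a trim, that would point to discards not being passed through.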

Thanks,
Matt
 
