Reclaim free space in CephFS

Apr 29, 2016
Hey guys

We are using ceph-fuse to mount a CephFS volume for Proxmox backups at `/srv/proxmox/backup/`.

Recently, I noticed that the backup volume kept running out of free space, causing the backup jobs to fail (we had a 2 TB Ceph quota in place on the pool for safety reasons). Here is what struck me:

Code:
root@proxmox-a:~# du -sh /srv/proxmox/backup/
969G    /srv/proxmox/backup/

root@proxmox-a:~# df -h /srv/proxmox/backup/
Filesystem      Size  Used Avail Use% Mounted on
ceph-fuse       4.5T  2.2T  2.3T  49% /srv/proxmox/backup

root@proxmox-a:~# ceph df detail
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
    19442G     9172G       10269G         52.82        871k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE      RAW USED
    backup-data         1      N/A               3000G           2176G     48.14         2344G      558031      544k       324M       129M        6528G
[...]

As you can see, I increased the Ceph quota from 2 TB to 3 TB. But that's not a sustainable solution.
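
In case it helps anyone following along: the pool-level quota can be adjusted with `ceph osd pool set-quota`; the bump to 3 TB was done roughly like this (the byte value assumes the 3000G shown above is GiB):

Code:
# raise the max_bytes quota on the backup-data pool to 3000 GiB (3000 * 1024^3 bytes)
ceph osd pool set-quota backup-data max_bytes 3221225472000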

This constant "growth" has been going on for a few days now: every night, while the backup job is running, the disk usage reported by ceph-fuse increases, and the free space never seems to be reclaimed.

The stats reported by `du` seem about right.

Any ideas?

Here is how `/srv/proxmox/backup/` is mounted:

Code:
root@proxmox-a:~# mount | grep ceph-fuse
ceph-fuse on /srv/proxmox/backup type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
 
This constant "growth" has been going on for a few days now: every night, while the backup job is running, the disk usage reported by ceph-fuse increases, and the free space never seems to be reclaimed.
Not quite sure what you mean by that. Backups of VMs tend to grow, not shrink, so isn't usage growth normal? If you copy a backup file and remove it, does the usage change?

You could also try to use the kernel client (no quota support) to see if it has to do with fuse.
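
A kernel-client mount would look roughly like this (monitor address and secret file are placeholders, adjust to your cluster):

Code:
# mount CephFS with the kernel client instead of ceph-fuse
mount -t ceph 192.168.0.1:6789:/ /srv/proxmox/backup -o name=admin,secretfile=/etc/ceph/admin.secret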
 
Hey Alwin

Thanks for your reply.

Not quite sure what you mean by that. Backups of VMs tend to grow, not shrink, so isn't usage growth normal? If you copy a backup file and remove it, does the usage change?

Sorry for not being more specific in my initial post. We are only keeping x days of backups, and the total capacity needed for those backups stays more or less constant at around 1 TB.

When I add a 100 GB file and remove it again, `du` shows the correct values throughout the process. `df`, however, shows the increase and never goes back down...

You could also try to use the kernel client (no quota support) to see if it has to do with fuse.

We were seeing issues (slow requests) when using the kernel client, which is why we moved to fuse in the first place. Furthermore, quota support was a must.
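
For context, the quota enforcement ceph-fuse gives us is the CephFS directory quota, which is set as an extended attribute on the directory, along these lines (the 2 TiB value here is just an example):

Code:
# set a 2 TiB directory quota (2 * 1024^4 bytes), enforced by ceph-fuse
setfattr -n ceph.quota.max_bytes -v 2199023255552 /srv/proxmox/backup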
 
Thanks for your input. `sync` seems to fix the problem for a testfile:

Code:
root@proxmox-b:/srv/proxmox/backup# du -sh /srv/proxmox/backup
967G    /srv/proxmox/backup

root@proxmox-b:/srv/proxmox/backup# ceph df detail
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
    19442G     9178G       10263G         52.79        870k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE      RAW USED
    backup-data         1      N/A               3000G           2173G     48.08         2347G      557358      544k       324M       129M        6521G

root@proxmox-b:/srv/proxmox/backup# dd if=/dev/zero of=./testfile bs=1G count=10 oflag=dsync

root@proxmox-b:/srv/proxmox/backup# du -sh /srv/proxmox/backup
977G    /srv/proxmox/backup

root@proxmox-b:/srv/proxmox/backup# ceph df detail
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
    19442G     9147G       10294G         52.95        873k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE      RAW USED
    backup-data         1      N/A               3000G           2183G     48.31         2336G      559918      546k       324M       130M        6551G

root@proxmox-b:/srv/proxmox/backup# rm testfile; sync

root@proxmox-b:/srv/proxmox/backup# ceph df detail
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
    19442G     9178G       10263G         52.79        870k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE      RAW USED
    backup-data         1      N/A               3000G           2173G     48.08         2347G      557358      544k       324M       130M        6521G

Now the question that remains is: where does the existing difference between `du` and `df` come from? Is there a service I can restart or a journal I can rebuild that would fix this (mis)behavior?
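
For cross-checking, the recursive statistics the MDS keeps for a directory can be read as virtual extended attributes on the mount point; that should be the number CephFS itself believes is used below that directory:

Code:
# recursive byte and file counts as tracked by the MDS
getfattr -n ceph.dir.rbytes /srv/proxmox/backup
getfattr -n ceph.dir.rfiles /srv/proxmox/backup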
 
Which 'df' do you mean?

Sparse files propagate incorrectly to the stat(2) st_blocks field. Because CephFS does not explicitly track which parts of a file are allocated/written, the st_blocks field is always populated by the file size divided by the block size. This will cause tools like du(1) to overestimate consumed space. (The recursive size field, maintained by CephFS, also includes file “holes” in its count.)
http://docs.ceph.com/docs/luminous/cephfs/posix/
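
A quick way to see this effect on a CephFS mount (the file name is just an example):

Code:
# create a 10 GB sparse file; on CephFS, du counts the full 10G
# because st_blocks is derived from the file size, not from allocated blocks
truncate -s 10G /srv/proxmox/backup/sparse-test
du -sh /srv/proxmox/backup/sparse-test
rm /srv/proxmox/backup/sparse-test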
 
