Ceph thin provisioning for LXCs not working as expected?

lifeboy

Renowned Member
I have an LXC container provisioned with a 100 GB boot drive on Ceph RBD storage. However, see the following:

Code:
~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd10       98G  8.8G   85G  10% /

This is in the running container.

Checking the disk usage in Ceph, however, shows the volume as almost entirely used.

Code:
FT1-NodeA:~# rbd du speedy/vm-192-disk-2
NAME           PROVISIONED  USED 
vm-192-disk-2      100 GiB  97 GiB

Why is this? There doesn't seem to be a tool to reclaim the unused blocks for an LXC container...?
 
Is the container issuing TRIM commands? Can you try running pct fstrim <CTID> and see if that helps?
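For example, on the node that currently hosts the container (CT 192 in your case):

Code:
pct fstrim 192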
 
You could try running fstrim -a from within the container - does that work? Not sure if that would change anything though. Could you post your container config?
 
Of course that gives the same result. For some reason the container behaves as if the storage doesn't support trimming, i.e. as if it weren't thin-provisioned. However, some other volumes on the same Ceph storage pool trim just fine.

Could there be something set in the container config that prevents this? Config below, and a way to check for discard support sketched after it.

Code:
~# cat /etc/pve/lxc/192.conf
arch: amd64
cores: 4
features: nesting=1
hostname: productive
memory: 8192
mp0: speedy:vm-192-disk-3,mp=/home/user-data/owncloud,backup=1,size=1900G
nameserver: 8.8.8.8
net0: name=eth0,bridge=VLAN11,firewall=1,gw=192.168.142.254,hwaddr=86:6B:32:CE:F0:D3,ip=192.168.142.101/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: speedy:vm-192-disk-2,size=100G
searchdomain: co.za
swap: 0
unprivileged: 1
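For what it's worth, one way to check whether the mapped RBD device advertises discard support at all would be something like this on the node the container runs on (just a sketch - it assumes the device really is /dev/rbd10, as in the df output above):

Code:
lsblk --discard /dev/rbd10
# nonzero DISC-GRAN / DISC-MAX means the kernel reports discard support for this device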
 
Of course that gives the same result.
Yeah, I figured as much.

I think the issue here is that either your container is unprivileged, or that you have a mountpoint (or both).
 
Does it mean that if you have a mountpoint (over and above the boot drive), thin-provisioning doesn't work?

Code:
~# cat /etc/pve/lxc/192.conf
arch: amd64
cores: 4
features: nesting=1
hostname: productive
memory: 8192
nameserver: 8.8.8.8
net0: name=eth0,bridge=VLAN11,firewall=1,gw=192.168.142.254,hwaddr=86:6B:32:CE:F0:D3,ip=192.168.142.101/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: speedy:vm-192-disk-2,size=100G
mp0: speedy:vm-192-disk-3,mp=/home/user-data/owncloud,backup=1,size=1900G
searchdomain: co.za
swap: 0
unprivileged: 1

In the above, the mp0 shows as thin-provisioned, but the rootfs does not. Can I do this differently to make both properly thin provisioned?
 
I tried it locally now; I could not get the container to issue TRIM commands as long as it was unprivileged, but as soon as I changed it to a privileged container it worked. I think it is the kernel that doesn't let an unprivileged container issue TRIM commands to block devices (which a Ceph RBD volume is).
 
I don't think it's a good idea to run privileged containers for clients, is it? If a UID in the container matches one of the host's UIDs that has rights to locations a client should not have access to, it could create a big problem...
 
Yes, it should also be run from the host, on second thought. I just thought they were somehow related, but they shouldn't be - a bit of confusion on my part, sorry.

fstrim: /: FITRIM ioctl failed: Operation not permitted
This output seems a bit weird to me. Is this the whole output of pct fstrim <CTID> running on the host? Because it should try to trim the directory /var/lib/lxc/<CTID>/rootfs/ and not /.

edit: Could you also post your storage.cfg ?
 

Of course, the command has to run on the node on which the container is running...!

Code:
~# pct fstrim 192
/var/lib/lxc/192/rootfs/: 88.9 GiB (95446147072 bytes) trimmed
/var/lib/lxc/192/rootfs/home/user-data/owncloud: 1.6 TiB (1795599138816 bytes) trimmed

However, when I ask rbd for the stats, I get:

Code:
~# rbd du speedy/vm-192-disk-2
NAME           PROVISIONED  USED 
vm-192-disk-2      100 GiB  75 GiB

In the container, however, I still see:

Code:
/dev/rbd10       98G  8.9G   84G  10% /

Does it take time to reflect correctly?

With QEMU guests the trimming happens periodically (e.g. in Windows it runs once a week). I suppose if one wants this to run regularly, it has to be scripted, right? How would that work with HA then, since the container could be running on a different node at some point in the future?
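Something like this is what I have in mind (untested sketch; it assumes pct is available on each node, and since pct list only shows the containers on the local node, a container that HA has moved elsewhere would simply be skipped there):

Code:
#!/bin/sh
# run from cron on every node; trims only the containers currently running on this node
for ctid in $(pct list | awk 'NR > 1 && $2 == "running" {print $1}'); do
    pct fstrim "$ctid"
done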

edit: Could you also post your storage.cfg ?

I guess it doesn't matter anymore now.
 
Ceph RBD has some peculiarities with regard to trimming, especially if the sector size of your disk isn't aligned with the RBD object size.

Could you try using the --exact flag with rbd du? It should give a more accurate number for the used space:
Code:
rbd du --exact speedy/vm-192-disk-2
 
