Ceph monitor space usage suddenly much higher

Jun 8, 2016
We have a small 3-node cluster where the Ceph monitor store is suddenly many times larger than it was previously. We typically run 'systemctl restart ceph.target' when we observe that Ceph packages have been updated, and only schedule node restarts to apply newer kernels or Intel microcode updates.

There is relatively little I/O on this cluster, and we do operate an RBD SSD cache. The monitor's store database grows aggressively during OSD rebuilds, where it now rapidly grows from 1.6 GB to 8 GB. We subsequently grew the file system holding '/var/lib/ceph/mon/ceph-kvm6b/store.db' to avoid a situation where the monitor runs out of space.
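Until the cause is found, a small watchdog can give early warning before the monitor's file system fills up. The following is a minimal sketch; the store path is taken from this cluster, but the 4096 MB threshold is an assumption to adjust to your own headroom.

```shell
# Sketch: warn before a monitor's store.db fills its file system.
# The 4096 MB limit in the example below is an assumption; adjust it.

mon_store_mb() {
    # Size of the given directory in MB (du rounds up).
    du -sm "$1" | awk '{print $1}'
}

check_mon_store() {
    # $1 = store.db path, $2 = warning threshold in MB
    size=$(mon_store_mb "$1")
    if [ "$size" -gt "$2" ]; then
        echo "WARNING: $1 uses ${size} MB (limit: $2 MB)"
    else
        echo "OK: $1 uses ${size} MB"
    fi
}

# Example (run from cron or a monitoring agent):
# check_mon_store /var/lib/ceph/mon/ceph-kvm6b/store.db 4096
```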


Are others also seeing this? Have there been recent changes anywhere that could account for this sudden increase?

Ceph cluster summary:
[admin@kvm6b ~]# du -sh /var/lib/ceph/mon/ceph-kvm6b/store.db
1.6G /var/lib/ceph/mon/ceph-kvm6b/store.db

[admin@kvm6b ~]# ceph -s
cluster:
id: 2a554db9-5d56-4d6a-a1e2-e4f98ef1052f
health: HEALTH_OK

services:
mon: 3 daemons, quorum kvm6a,kvm6b,kvm6c
mgr: kvm6b(active), standbys: kvm6a, kvm6c
mds: cephfs-1/1/1 up {0=kvm6a=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in

data:
pools: 5 pools, 420 pgs
objects: 273.69k objects, 972GiB
usage: 2.87TiB used, 32.3TiB / 35.2TiB avail
pgs: 420 active+clean

io:
client: 2.97KiB/s rd, 1.36MiB/s wr, 62op/s rd, 138op/s wr

[admin@kvm6b ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
35.2TiB 32.3TiB 2.87TiB 8.14
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
cephfs_data 1 3.08GiB 0.03 9.50TiB 791
cephfs_metadata 2 517KiB 0 9.50TiB 21
rbd_ssd 3 0B 0 687GiB 0
rbd_hdd 4 867GiB 8.18 9.50TiB 244616
rbd_hdd_cache 5 102GiB 12.92 687GiB 28267



As a comparison, the monitor storage requirements on a larger cluster are many times smaller:
[admin@kvm5b ~]# du -sh /var/lib/ceph/mon/ceph-1/store.db
57M /var/lib/ceph/mon/ceph-1/store.db

[admin@kvm5b ~]# ceph -s
cluster:
id: a3f1c21f-f883-48e0-9bd2-4f869c72b17d
health: HEALTH_OK

services:
mon: 3 daemons, quorum 1,2,3
mgr: kvm5d(active), standbys: kvm5b, kvm5c
mds: cephfs-1/1/1 up {0=kvm5b=up:active}, 2 up:standby
osd: 18 osds: 18 up, 18 in

data:
pools: 8 pools, 752 pgs
objects: 3.84M objects, 13.7TiB
usage: 34.3TiB used, 52.7TiB / 87.0TiB avail
pgs: 752 active+clean

io:
client: 4.72MiB/s rd, 28.1MiB/s wr, 650op/s rd, 1.11kop/s wr

[admin@kvm5b ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
87.0TiB 52.7TiB 34.3TiB 39.42
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd_ssd 0 8.34TiB 40.50 12.3TiB 2400064
cephfs_data 2 10.0GiB 0.08 12.3TiB 2574
cephfs_metadata 3 3.07MiB 0 12.3TiB 58
ec_nvme 16 486GiB 10.37 4.10TiB 130439
rbd_nvme 17 289GiB 11.02 2.28TiB 73966
ec_compr_nvme 19 3.79TiB 48.08 4.10TiB 1015897
ec_ssd 20 0B 0 24.5TiB 0
ec_compr_ssd 21 841GiB 3.24 24.5TiB 221010


Herewith a graphical representation since restarting kvm6a last week Friday, after which the monitor store databases suddenly grew by 1.6 GB:
[Attachment: mon-usage.jpg]


PS: We are in the process of increasing the available volume sizes of the monitors in the second cluster and will thereafter restart the Ceph monitor services, and ultimately the whole host, to establish what causes the sudden increase.


For reference, herewith the Ceph configuration file from the affected cluster:
[admin@kvm6b ~]# cat /etc/ceph/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.254.1.0/24
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug osd = 0/0
filestore xattr use omap = true
fsid = 2a554db9-5d56-4d6a-a1e2-e4f98ef1052f
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd deep scrub interval = 1209600
osd pool default min size = 2
osd scrub begin hour = 22
osd scrub end hour = 5
osd scrub sleep = 0.1
public network = 10.254.1.0/24
rbd default features = 7

[mds]
mds data = /var/lib/ceph/mds/$cluster-$id
keyring = /var/lib/ceph/mds/$cluster-$id/keyring

[mds.kvm6a]
host = kvm6a

[mds.kvm6b]
host = kvm6b

[mds.kvm6c]
host = kvm6c

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon]
mon compact on start = true

[mon.kvm6a]
host = kvm6a
mon addr = 10.254.1.2:6789

[mon.kvm6b]
host = kvm6b
mon addr = 10.254.1.3:6789

[mon.kvm6c]
host = kvm6c
mon addr = 10.254.1.4:6789

[admin@kvm6b ~]# dir -h /var/lib/ceph/mon/ceph-kvm6b/store.db
total 1.6G
-rw-r--r-- 1 ceph ceph 544K Jun 13 12:11 260002.log
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260004.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260005.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260006.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260007.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260008.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260009.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260010.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260011.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260012.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260013.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260014.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260015.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260016.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260017.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260018.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260019.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260020.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260021.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260022.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260023.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260024.sst
-rw-r--r-- 1 ceph ceph 69M Jun 13 12:11 260025.sst
-rw-r--r-- 1 ceph ceph 13M Jun 13 12:11 260026.sst
-rw-r--r-- 1 ceph ceph 16 Jun 11 10:06 CURRENT
-rw-r--r-- 1 ceph ceph 37 Mar 21 09:33 IDENTITY
-rw-r--r-- 1 ceph ceph 0 Mar 21 09:33 LOCK
-rw-r--r-- 1 ceph ceph 3.0M Jun 13 12:11 MANIFEST-214523
-rw-r--r-- 1 ceph ceph 4.1K Jun 10 22:48 OPTIONS-207912
-rw-r--r-- 1 ceph ceph 4.1K Jun 11 10:06 OPTIONS-214526

 
There is relatively little I/O on this cluster and we do operate a RBD SSD cache. The monitor's store database grows aggressively during OSD rebuilds where the size now rapidly grows from 1.6 GB to 8 GB. We subsequently grew the file system which holds '/var/lib/ceph/mon/ceph-kvm6b/store.db' to avoid a situation where the monitor runs out of space.
This seems like normal behavior to me for a cluster doing recovery/rebalance, but I couldn't say how much the DB should or shouldn't grow. The MON store usually gets compacted on startup, or you can trigger compaction manually.
https://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/

Another thing to check is the setting below: when the monitor trims its old states, it should compact as well.
mon compact on trim

Description: Compact a certain prefix (including paxos) when we trim its old states.
Type: Boolean
Default: True
http://docs.ceph.com/docs/luminous/rados/configuration/mon-config-ref/#miscellaneous
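To confirm what the running monitor actually uses (rather than the documented default), the value can be read over the monitor's admin socket. A sketch, assuming a host without jq; the tiny sed parser below is my own helper, not a Ceph tool:

```shell
# Sketch: read an option from a monitor's admin socket.
# 'ceph daemon mon.<id> config get <option>' prints one-line JSON such as
# {"mon_compact_on_trim":"true"}; json_value extracts the quoted value.
json_value() {
    sed -n 's/.*: *"\(.*\)".*/\1/p'
}

# On a monitor host (not executed here), something like:
#   ceph daemon mon.kvm6a config get mon_compact_on_trim | json_value
# should print: true
```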

Anything in the ceph-mon logs?
 
Hi Alwin,

The 'mon compact on trim' setting appears to be enabled by default already. I can understand the monitor database growing during rebuild operations, but the size of the database went from xx MB to 1600 MB after restarting nodes last week Friday.

The cluster is virtually idle, yet the mon database directory doubles in size during a manual compact (presumably because RocksDB writes the new compacted SST files before removing the old ones):
[admin@kvm6a ~]# ceph -s
cluster:
id: 2a554db9-5d56-4d6a-a1e2-e4f98ef1052f
health: HEALTH_OK

services:
mon: 3 daemons, quorum kvm6a,kvm6b,kvm6c
mgr: kvm6b(active), standbys: kvm6a, kvm6c
mds: cephfs-1/1/1 up {0=kvm6a=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in

data:
pools: 5 pools, 420 pgs
objects: 273.76k objects, 972GiB
usage: 2.87TiB used, 32.3TiB / 35.2TiB avail
pgs: 420 active+clean

io:
client: 1.32KiB/s rd, 460KiB/s wr, 20op/s rd, 38op/s wr


[admin@kvm6a ~]# ceph tell mon.kvm6a compact
compacted rocksdb in 6.437636 seconds


Concurrently in another session:

[admin@kvm6a ~]# while [ 1 -eq 1 ]; do du -sh /var/lib/ceph/mon/ceph-kvm6a/store.db; sleep 1; done
1.6G /var/lib/ceph/mon/ceph-kvm6a/store.db
1.6G /var/lib/ceph/mon/ceph-kvm6a/store.db
1.8G /var/lib/ceph/mon/ceph-kvm6a/store.db
2.0G /var/lib/ceph/mon/ceph-kvm6a/store.db
2.3G /var/lib/ceph/mon/ceph-kvm6a/store.db
2.5G /var/lib/ceph/mon/ceph-kvm6a/store.db
2.7G /var/lib/ceph/mon/ceph-kvm6a/store.db
3.0G /var/lib/ceph/mon/ceph-kvm6a/store.db
1.6G /var/lib/ceph/mon/ceph-kvm6a/store.db
1.6G /var/lib/ceph/mon/ceph-kvm6a/store.db


We've scheduled node restarts in our larger cluster to apply kernel updates and will report back on whether or not those monitor databases also grow thereafter...
 
