Ceph-Cluster RAM runs full

Patrik

Hi all,

I have a Ceph cluster running with the following package versions:

Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.6-pve1
ceph-fuse: 14.2.6-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-14
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-2
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

Hardware:
3 nodes, each node has:
1 SSD for the OS
16 GB RAM
3 x 2 TB SSDs (3 OSDs)

The problem I have is that the RAM fills up to about 96% after a while.

[Attached screenshot: Screenshot from 2020-05-06 07-03-38.png]

So, is that normal?
I thought 16 GB of RAM would be enough for 3 OSDs.


thanks
best regards
 
Hi!

You can find the memory recommendations in the reference documentation.
What is the output of free -h?

Best
Dominic
 
Hi!

Thanks for the answer!
The output of free -h looks like this:


Code:
              total        used        free      shared  buff/cache   available
Mem:           15Gi        13Gi       837Mi        88Mi       1.0Gi       1.3Gi
Swap:         8.0Gi       307Mi       7.7Gi


best
patrik
 
You can check exactly which processes are using the memory with htop or top.
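For example, a quick non-interactive way to see the biggest memory consumers (just one option; any process viewer will do):

Code:
ps aux --sort=-%mem | head -n 15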

The documentation linked above also contains a link to the Ceph website which has even more information about memory requirements. Check it out if you have not seen it yet.
 
Using the information from the links:
I thought 16 GB of RAM would be enough for 3 OSDs.
As a (useful) default you need about 12 GB of RAM for the BlueStore backend for your 3 OSDs alone. This does not take into account memory for the managers, monitors, and recovery/rebalancing. Thus, it would be beneficial to have more RAM.
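For reference, the per-OSD share comes from the osd_memory_target option, which defaults to 4 GiB per BlueStore OSD. If adding RAM is not possible right away, you can inspect and lower it at runtime; osd.0 and the 3 GiB value below are only examples:

Code:
ceph config show osd.0 osd_memory_target
ceph config set osd osd_memory_target 3221225472

Keep in mind that a lower target just shrinks the BlueStore caches, so more RAM remains the better fix.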
 
OK, thanks for the information. I have now upgraded each node to 48 GB of RAM.

But now I have a new problem: after restarting the server, one monitor won't start.
All OSDs are online, UP, and IN.

Code:
Started Ceph cluster monitor daemon.
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fa51141e280 time 2020-05-07 18:58:31.592455
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: 1211: ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 10 in /var/lib/ceph/mon/ceph-pve-storage01/store.db/099273.sst")
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x7fa512fa24de]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list*)+0x3ce) [0x55dcfb39f8ee]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  3: (main()+0x111e) [0x55dcfb1408de]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  4: (__libc_start_main()+0xeb) [0x7fa51193409b]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  5: (_start()+0x2a) [0x55dcfb1712fa]
May  7 18:58:31 pve-storage01 ceph-mon[3355]: *** Caught signal (Aborted) **
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  in thread 7fa51141e280 thread_name:ceph-mon
May  7 18:58:31 pve-storage01 ceph-mon[3355]: 2020-05-07 18:58:31.589 7fa51141e280 -1 /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fa51141e280 time 2020-05-07 18:58:31.592455
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: 1211: ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 10 in /var/lib/ceph/mon/ceph-pve-storage01/store.db/099273.sst")
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)

So what can I do now?
Would it suffice to recreate the monitor?

best
Patrik
 
Yeah… I have experienced that a lot as well. Almost any reboot of a Ceph host kills the monitor that had been running on that node. For such a case, I have a little action plan for how to "re-create" the monitor. It generally goes like this:

Code:
# clear the corrupted monitor store (the mon data directory is prefixed with the cluster name, "ceph-" by default)
rm -rf /var/lib/ceph/mon/ceph-<ceph-node-name>/*
mkdir /root/tmp
rm -f /root/tmp/*
# fetch the current monmap and the mon. keyring from the surviving monitors
ceph mon getmap -o /root/tmp/icephmap
ceph auth get mon. -o /root/tmp/icephkey
# rebuild the monitor store and start the monitor
ceph-mon -i <ceph-node-name> --mkfs --monmap /root/tmp/icephmap --keyring /root/tmp/icephkey
ceph-mon -i <ceph-node-name> --public-addr <IP on the ceph network>:6789

Replace <ceph-node-name> with the hostname of the affected Ceph node (the monitor data directory under /var/lib/ceph/mon/ is prefixed with the cluster name, ceph- by default). The IP needed is the one the Ceph monitor uses to communicate with its peers.
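
Note: on Proxmox the monitor is normally managed by systemd, so once the store has been rebuilt you can also start it through the ceph-mon@ template unit instead of running ceph-mon in the foreground (assuming the default unit name):

Code:
systemctl start ceph-mon@<ceph-node-name>.service
systemctl status ceph-mon@<ceph-node-name>.service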

You may also need to re-enable the msgr2 protocol:
Code:
ceph mon enable-msgr2
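
Afterwards it is worth checking that the monitor has rejoined the quorum, for example with:

Code:
ceph mon stat
ceph -s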
 