Ceph-Cluster RAM runs full

Patrik

Hi all,

I have a Ceph cluster running with the following package versions:

Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.6-pve1
ceph-fuse: 14.2.6-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-14
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-2
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

Hardware:
3 nodes, each node has:
1 SSD for the OS
16 GB RAM
3 x 2 TB SSDs (3 OSDs)

The problem I have is that the RAM fills up to about 96% after a while.

[Attached screenshot: Screenshot from 2020-05-06 07-03-38.png]

So, is that normal?
I thought 16 GB of RAM would be enough for 3 OSDs.


thanks
best regards
 
Hi!

You can find the memory recommendations in the reference documentation.
What is the output of free -h?

Best
Dominic
 
Hi!

Thanks for the answer!
The output of free -h looks like this:


Code:
              total        used        free      shared  buff/cache   available
Mem:           15Gi        13Gi       837Mi        88Mi       1.0Gi       1.3Gi
Swap:         8.0Gi       307Mi       7.7Gi


best
patrik
 
You can check exactly which processes are using the memory with htop or top.
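For example, a quick non-interactive way to see the biggest memory consumers (just one option; any process viewer will do):

Code:
ps aux --sort=-%mem | head -n 15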

The documentation linked above also contains a link to the Ceph website which has even more information about memory requirements. Check it out if you have not seen it yet.
 
Using the information from the links:
I thought 16 GB of RAM would be enough for 3 OSDs.
As a (useful) default you need about 12 GB of RAM for the BlueStore backend for your 3 OSDs alone. This does not take into account memory for the managers, monitors, and recovery/rebalancing. Thus, it would be beneficial to have more RAM.
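For reference, the per-OSD share comes from the osd_memory_target option, which defaults to 4 GiB per BlueStore OSD. If adding RAM is not possible right away, you can inspect and lower it at runtime; osd.0 and the 3 GiB value below are only examples:

Code:
ceph config show osd.0 osd_memory_target
ceph config set osd osd_memory_target 3221225472

Keep in mind that a lower target just shrinks the BlueStore caches, so more RAM remains the better fix.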
 
OK, thanks for the information. I have now upgraded each node to 48 GB of RAM.

But now I have a new problem: after restarting the server, one monitor won't start.
All OSDs are online, UP, and IN.

Code:
Started Ceph cluster monitor daemon.
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fa51141e280 time 2020-05-07 18:58:31.592455
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: 1211: ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 10 in /var/lib/ceph/mon/ceph-pve-storage01/store.db/099273.sst")
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x7fa512fa24de]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list*)+0x3ce) [0x55dcfb39f8ee]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  3: (main()+0x111e) [0x55dcfb1408de]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  4: (__libc_start_main()+0xeb) [0x7fa51193409b]
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  5: (_start()+0x2a) [0x55dcfb1712fa]
May  7 18:58:31 pve-storage01 ceph-mon[3355]: *** Caught signal (Aborted) **
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  in thread 7fa51141e280 thread_name:ceph-mon
May  7 18:58:31 pve-storage01 ceph-mon[3355]: 2020-05-07 18:58:31.589 7fa51141e280 -1 /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fa51141e280 time 2020-05-07 18:58:31.592455
May  7 18:58:31 pve-storage01 ceph-mon[3355]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/kv/RocksDBStore.cc: 1211: ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 10 in /var/lib/ceph/mon/ceph-pve-storage01/store.db/099273.sst")
May  7 18:58:31 pve-storage01 ceph-mon[3355]:  ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)

So what can I do now?
Would it suffice to recreate the monitor?

best
Patrik
 
Yeah… I have experienced that a lot as well. Almost any reboot of a Ceph host kills the monitor that had been running on that node. For such a case, I have a little action plan for how to "re-create" the monitor. It generally goes like this:

Code:
# clear the corrupted monitor store (the mon data directory is prefixed with the cluster name, "ceph-" by default)
rm -rf /var/lib/ceph/mon/ceph-<ceph-node-name>/*
mkdir /root/tmp
rm -f /root/tmp/*
# fetch the current monmap and the mon. keyring from the surviving monitors
ceph mon getmap -o /root/tmp/icephmap
ceph auth get mon. -o /root/tmp/icephkey
# rebuild the monitor store and start the monitor
ceph-mon -i <ceph-node-name> --mkfs --monmap /root/tmp/icephmap --keyring /root/tmp/icephkey
ceph-mon -i <ceph-node-name> --public-addr <IP on the ceph network>:6789

Replace <ceph-node-name> with the hostname of the affected Ceph node (the monitor data directory under /var/lib/ceph/mon/ is prefixed with the cluster name, ceph- by default). The IP needed is the one the Ceph monitor uses to communicate with its peers.
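
Note: on Proxmox the monitor is normally managed by systemd, so once the store has been rebuilt you can also start it through the ceph-mon@ template unit instead of running ceph-mon in the foreground (assuming the default unit name):

Code:
systemctl start ceph-mon@<ceph-node-name>.service
systemctl status ceph-mon@<ceph-node-name>.service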

You may also need to re-enable the msgr2 protocol:
Code:
ceph mon enable-msgr2
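
Afterwards it is worth checking that the monitor has rejoined the quorum, for example with:

Code:
ceph mon stat
ceph -s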
 