Proxmox reports: 88% memory allocation, but no VM / CT runs - Is this a memory leak caused by Ceph?

Restarting a single OSD service (ceph-osd@42, for example) will not cause you any problems, since it will be down for only a short time, and if any PGs are moved they will be rebalanced when the OSD comes back up. This is one of the major points of Ceph: OSD failure is a common, expected event. Still, to keep the cluster from reacting during your tests, set the "noout" flag beforehand so no recovery is triggered, and remember to unset it after performing your tests.
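In practice that sequence looks roughly like the following (using OSD 42 from the example above; substitute your own OSD ID):

    ceph osd set noout              # keep the cluster from marking OSDs "out" and triggering recovery
    systemctl restart ceph-osd@42   # restart the single OSD daemon
    ceph -s                         # wait until the OSD is back up and all PGs are active+clean
    ceph osd unset noout            # restore normal recovery behaviour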

At 60 OSDs per node your memory consumption is not strange at all; in my opinion it is wholly expected. Each OSD will use at least 2-3 GB, preferably more (4-5 GB).
I ended up learning this the hard way as well. Ceph OSDs can be configured to use less memory at a performance cost, but they typically use about 3.5 GB each on my systems. Additionally, monitor processes use about 1 GB and managers about 200 MB each on my nodes.
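If you do decide to trade some performance for RAM, the setting to look at is osd_memory_target; the 2 GiB value below is only an illustration (the default is 4 GiB):

    ceph config get osd osd_memory_target               # show the current per-OSD target, in bytes
    ceph config set osd osd_memory_target 2147483648    # lower the target to 2 GiB for all OSDs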

Earlier in this thread, Tapio Lehtonen said that the memory must have been OS caches or buffers, but Proxmox does not display these on its memory usage bar; only process-allocated memory is shown there, not "Free" or "Available" (also note that free -mh showed only about ~30 GB of buffers and caches). This was the first clue to me that one of a few things was happening:
  • Ceph OSDs - Good job Bengt Nolin
  • ZFS ARC - Good guess Vladimir Bulgaru
  • Memory Leak - This should always be the last item as it is the least common.
For anyone else experiencing an issue like this, please do as I say and not as I did: RTFM. :) A few quick checks for each of the suspects above are listed below.
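If you are trying to narrow down which of those it is on your own node, a quick first pass could look like this (this assumes a stock Proxmox VE install with ZFS and Ceph; adjust the tools to your setup):

    free -h                                      # how much is truly used vs. buffers/cache vs. available
    arc_summary | head -n 25                     # current ZFS ARC size (or: cat /proc/spl/kstat/zfs/arcstats)
    ps -eo rss,comm --sort=-rss | head -n 15     # largest resident processes; ceph-osd daemons will be near the top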

Something else that might be helpful for users searching for high memory usage: Windows VMs with the QEMU guest agent installed will report only a fraction of their actual memory usage, even in the Proxmox GUI. Windows, however, uses all available RAM for caches and buffers, so don't be surprised when htop on the host shows the VM actually using all of its allocated memory.

One final note on Ceph: I have found that all of my hyperconverged nodes use closer to 2 GB of buffers+cache per OSD, and since my OSDs are quite small, that works out to about 4 GB of buffers/cache per 1 TB. This is very different from the OP's value of roughly 500 MB per OSD, so your mileage will vary. For some reason, my low-memory systems frequently swap while holding 4-8 GB of RAM in caches and buffers, so I end up purging caches and buffers and emptying swap daily to extend the lifespan of my boot devices. On my high-memory nodes this is never a problem. Thanks, I hope this is helpful; sorry to wake an old thread, but it is still very relevant.
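For anyone who wants to do the same kind of daily purge, a minimal sketch (run as root) would be something like the following; dropping caches is safe but costs some re-read I/O afterwards, and swapoff needs enough free RAM to absorb whatever is currently swapped out:

    sync                                   # flush dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches      # drop page cache, dentries and inodes
    swapoff -a && swapon -a                # move everything out of swap, then re-enable it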
 