pg_num calculation

Mar 1, 2018

We are about to install a three-node Ceph cluster with 5 disks per node, which means 15 OSDs, Size: 3, Min.Size: 2.

The majority of space shall be used by an rbd, a small part is going to be exported as cephfs.

Now, it is up to me to decide about the pg_num of my pools.

The ceph documentation says (commonly used values):
"Between 10 and 50 OSDs set pg_num to 1024"

If I use PGCalc, it tells me to set 512 for rbd and 32 for the cephfs pool given a 95%/5% distribution of the data and 100 Target PGs per OSD.
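For reference, the PGCalc numbers above can be reproduced by hand. The following is only a rough sketch of the calculation as I understand it, not PGCalc's actual code: the function name `suggest_pg_num` is made up for illustration, and the rounding rule (go to the next higher power of two when the lower one is more than 25% below the raw value) is my assumption about how PGCalc rounds.

```python
import math

def suggest_pg_num(target_pgs_per_osd, osd_count, data_fraction, replica_size):
    """Hypothetical helper: approximate a PGCalc-style pg_num suggestion."""
    # Raw value: PGs this pool "deserves" given its share of the data.
    raw = target_pgs_per_osd * osd_count * data_fraction / replica_size
    if raw <= 1:
        return 1
    # Largest power of two that does not exceed the raw value.
    lower = 2 ** math.floor(math.log2(raw))
    # Assumed rule: if that is more than 25% below raw, use the next power of two.
    if lower < raw * 0.75:
        return lower * 2
    return lower

# 15 OSDs, size 3, 100 target PGs per OSD, 95%/5% data split
rbd_pgs = suggest_pg_num(100, 15, 0.95, 3)     # raw value is about 475
cephfs_pgs = suggest_pg_num(100, 15, 0.05, 3)  # raw value is about 25
print(rbd_pgs, cephfs_pgs)
```

With the values from this post, the sketch lands on the same 512 and 32 that PGCalc reports.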

Now, I am a bit lost - there are recommendations from 512 to 1024, which is a wide range. I do not have the experience to decide what I should use. As far as I understand, a higher pg_num can cause a higher load on my systems and I can increase pg_num later - but not decrease it.

My intuition tells me to set it to 512 for the rbd for the moment.
Regarding the cephfs pool: 32 seems to be a rather low number. Should I increase it or will this work?

Dear ceph experts: what would you do if you were in my place?

Thanks in advance,

So, if you install PVE 6 with Ceph Nautilus, you won't have to worry about PG calculations so much -- "the number of placement groups (PGs) per pool can now be decreased at any time, and the cluster can automatically tune the PG count based on cluster utilization or administrator hints."

However, if you install PVE 5 with Ceph Luminous, you can go with an initial PG count of 512 or 1024 depending on whether you plan on adding nodes. We started with 3 nodes and increased to 5 nodes, so now I'm below the "recommended" 100 PGs per OSD. No big deal, it just means I'll have to increase my PG count and re-balance twice when adding more nodes/OSDs (once for the PG/PGP count and then again for the additional OSDs). With a replica size of 3, think of a higher PG count as more chances that at least 2 of the 3 copies of the data are still available when an OSD fails.
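To put numbers on the "PGs per OSD" figure, here is a small sketch of the arithmetic: each PG is stored `size` times, so the average number of PG copies per OSD is pg_num × size / OSD count. The function name `pgs_per_osd` is made up for illustration, and the 25-OSD case is a hypothetical expansion of the original poster's layout to 5 nodes with 5 disks each, not a description of my own cluster.

```python
def pgs_per_osd(pg_num, replica_size, osd_count):
    """Hypothetical helper: average number of PG copies each OSD holds."""
    return pg_num * replica_size / osd_count

# Original poster's cluster: 15 OSDs, size 3
at_512 = pgs_per_osd(512, 3, 15)
at_1024 = pgs_per_osd(1024, 3, 15)

# Hypothetical growth to 5 nodes x 5 disks = 25 OSDs, still at pg_num 512
after_growth = pgs_per_osd(512, 3, 25)

print(at_512, at_1024, after_growth)
```

On 15 OSDs, 512 PGs sits right around the 100-per-OSD target and 1024 is about double that. Growing to 25 OSDs without touching pg_num drops the figure to roughly 61, which is the "below recommended" situation described above.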

In either case, I'd forego the CephFS implementation, even though I don't know your use case. See this post: Proxmox VE Ceph Benchmark 2018/02

If you still insist on CephFS, you'll have to create two pools for it (one for data and another for metadata), not just the one you mentioned, so that is now three RADOS pools -- see Create a CEPH Filesystem

Hope that helps.

Thanks for your response!

I now feel good about the number of PGs since I know that Nautilus does a better job here. I will stick with 512 for the moment.

Regarding CephFS:
We just wanted to use CephFS for ISO images and container templates, so this should be fine. Since we were discussing it anyhow, we have now decided not to set up CephFS, for the sake of simplicity. Our NAS will be able to provide this content without any issues, as it did in the past.

