Number of optimal PGs higher than actual PGs

Mar 15, 2019
Hello Community,

I have the following Ceph question about PGs and OSD capacity:

[Screenshots of the pool and OSD overview attached]

As you can see, the optimal number of PGs for my main pool (Ceph-SSD-Pool-0) is higher than the actual PG count of 193, so as far as I can tell the autoscaler is not doing anything. No target settings are set yet. I'm new to Ceph and would like your opinion on whether this is expected behavior or whether I should set a target ratio.

I'm asking because the free space before adding the latest OSD was around 650 GB, and now it's only 870 GB. I added 1 TB, so I expected well over 1 TB of free space afterwards.
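For reference, a minimal sketch of how the autoscaler recommendation can be inspected and how a target ratio could be set (the pool name is taken from above; treat this as an example, not a prescription):

ceph osd pool autoscale-status                           # shows actual vs. suggested PG counts per pool
ceph osd pool set Ceph-SSD-Pool-0 target_size_ratio 1.0  # optional hint about the pool's expected share of capacity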

Thanks in advance!
 
Hi,
what Proxmox/Ceph does on its own is in most cases a very good default; I would only change settings if you have more specific requirements or performance issues. Consider setting the Ceph pool size to at least 3 with a min_size of 2, otherwise the moment one replica goes down your pool will be read-only for some time. I would also recommend having at least 2 OSDs/disks in every node. You can force a rebalance from the Ceph command line if you want the distribution to be a bit more even: https://docs.ceph.com/en/latest/rados/operations/balancer/
But with an uneven number of OSDs across your hosts, it may simply be that the nodes with only one OSD get more data, because Ceph places roughly 25% of the data on each node, which leads to this behavior.
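A rough sketch of the balancer commands from that page, in case it helps (check the documentation for your Ceph version first):

ceph balancer status        # shows whether the balancer is active and which mode it uses
ceph balancer mode upmap    # upmap mode usually gives the most even distribution
ceph balancer on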
 
Hello itNGO,

I was expecting that. One host still has one big VM on it that needs to be migrated either to Ceph or to other storage, but I wanted an extra 1 TB so that it wouldn't get into trouble when I added the last OSD. I think it would exceed the 85% near-full warning threshold.

One host will be fitted with 2 OSDs, and one host doesn't have an extra SSD in it, so I have to buy another one. Maybe that will indeed help.
 
Did you enable compression on your pools? It might help you get the space you need without adding more disks.
We have had good success with compression on Ceph; roughly a 2:1 ratio can be expected. However, you need to rewrite your data after enabling compression.
 
Isn't a pool size of 3 with min_size 2 going to replicate the VMs one more time? That means I will run out of space if I do that.
Yes, it's just a recommendation. Currently you have 2 copies of your data, but you have also configured Ceph to require a minimum of 2 copies. So if a node goes down you may lose a copy, and Ceph will put the pool into read-only mode. Most VMs will not like that, or at least not tolerate it for very long.

On the other side, setting the minimum to only 1 copy will work, but what if that last disk has an error? You will lose all your data. So min_size 2 with 3 copies available is the recommendation. You can do with less, but your risk of losing data is much higher.
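For reference, if you later have the capacity for it, a 3/2 setup would be set roughly like this (using the pool name from the screenshots as an example):

ceph osd pool set Ceph-SSD-Pool-0 size 3
ceph osd pool set Ceph-SSD-Pool-0 min_size 2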
 
Unfortunately I did not enable compression, so I have to migrate them again... fun ;). Are this guide and the first two commands sufficient to enable it? https://docs.ceph.com/en/nautilus/rados/configuration/bluestore-config-ref/#inline-compression

ceph osd pool set <pool-name> compression_algorithm lz4
ceph osd pool set <pool-name> compression_mode aggressive

-----------------

As for min_size: I have a good understanding of how it works now. Setting the pool size to 3 is not (yet) possible with the big VM still waiting to be migrated.

Setting min_size to 1 just means: you have 2 copies, 1 original that the cluster writes to and 1 replicated copy. If one host is down, the cluster will use the second copy until the first one is back online, with the risk that if the disk holding that second copy fails, you have no VM data left.

Is my understanding correct?
 
And does having it set to 1 give me data loss in the long term (not counting disk failures)? Because this passage in the Proxmox Ceph docs is a bit unclear to me:

Do not set a min_size of 1. A replicated pool with min_size of 1 allows I/O on an object when it has only 1 replica, which could lead to data loss, incomplete PGs or unfound objects.
 
Yes, those two commands are all you need. You should read up on the compression_mode options, but in most cases aggressive is a good choice.

As for the second part: yes, that sounds correct to me.
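In case it is useful, a quick way to double-check the pool settings and to see whether data is actually getting compressed afterwards (recent releases show compression columns in ceph df detail):

ceph osd pool get <pool-name> compression_algorithm
ceph osd pool get <pool-name> compression_mode
ceph df detail              # look at the USED COMPR / UNDER COMPR columns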
 
Regarding min_size 1: yes. When there is only one copy left in a "degraded" scenario, there is no way for Ceph to verify whether the data is free of errors, so this is never a situation you want to be in. Rule of thumb: always have at least 2 copies of your data available. Any situation where you only have one copy of your hot live data should be prevented under all circumstances; it is the last of last resorts when everything else fails.
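In case it helps, a quick way to see whether the cluster is currently in such a degraded state:

ceph -s                     # overall health, including degraded/undersized PG counts
ceph health detail          # lists the affected PGs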
 
OK, clear. I will play with compression, migrate the VMs again, and see how much time it takes. Maybe that will free up enough space to enable a 3/2 size setting. Thank you, itNGO :) I hope to have more OSDs for the single-OSD hosts soon.
 
Is there a way to know when the optimal PG count will become the actual count? I thought it might take some time for a rebalance to start, but it has been showing optimal 256 and actual 128 for a couple of weeks now.

I understand the defaults are most likely fine and only need to be manually adjusted in very specific cases, but what's the point of showing the optimal number and not applying it?
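In case it helps a later reader: as far as I know, whether the recommendation is applied depends on the pool's pg_autoscale_mode; with "warn" or "off" the optimal number is only reported, not acted on. A rough sketch for checking and enabling it (adjust the pool name):

ceph osd pool autoscale-status                       # the AUTOSCALE column shows on/warn/off per pool
ceph osd pool set <pool-name> pg_autoscale_mode on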
 
