Ceph constantly adding PG

Nhoague · Monday at 18:41

Hello!

Quick recap: I have 5 hosts with 5 x 4TB enterprise SSDs (Micron Pro) per host, and Ceph on its own 10Gbps network. Everything has been running great for about 6 months.

Last night one host had a kernel panic, so I rebooted it. Everything came up just fine, the cluster returned to HEALTH_OK within minutes, and the rebalancing began. I think it had already started to rebalance before I got to the host, but once the existing PGs came alive on the rebooted host, the rebalance appeared to go pretty quickly.

Almost 12 hours later, it's still rebalancing. However, I started with 256 PGs, and now it's up to 344.

Watching the logs, I don't see any issues about hardware failure or stuck PGs. Would the status here show me a stuck PG? Is it just doing its thing, and will it keep adding PGs until it is happy? Why was 256 OK for the longest time, but last night it started growing? Did it just not want to / need to before?

One thing to note: the misplaced objects drop to about 427k, then something happens and it's back to 490k and works its way down again.

Much appreciated for your input!

Nhoague · Monday at 19:06

Update: I was able to catch it when the PG count jumped by another one, and this is in the logs.

Appears to me it just wants to make more PGs? <shrug>

Thanks!

alexskysilk · Monday at 19:24

check ceph balancer status. it probably increased the pg count from 256 to 512, and you're seeing it doing its thing in real time.

Nhoague · Monday at 19:31

I see this:

root@PVE01:~# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.000217",
    "last_optimize_started": "Mon Jun 22 11:29:39 2026",
    "mode": "upmap",
    "no_optimization_needed": true,
    "optimize_result": "Too many objects (0.055091 > 0.050000) are misplaced; try again later",
    "plans": []
}
root@PVE01:~#

Ideas? Thank you!
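For what it's worth, the optimize_result line reads like a plain threshold check: the balancer declines to plan anything while too much data is already on the move. A minimal sketch of that comparison, assuming the 0.05 limit is the mgr option target_max_misplaced_ratio at its default (the function name is hypothetical, not part of Ceph):

```python
# Hypothetical helper mirroring the comparison in the balancer message.
# Assumption: 0.05 is the default of the mgr option target_max_misplaced_ratio.
def balancer_may_optimize(misplaced_ratio: float, max_misplaced: float = 0.05) -> bool:
    """Return True when the misplaced fraction is low enough to plan new moves."""
    return misplaced_ratio < max_misplaced


# Value reported in the status output above: 0.055091 is over the limit.
print(balancer_may_optimize(0.055091))  # False
```

So the balancer sitting idle here is expected while the backfill is still running, and is not a fault in itself.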

alexskysilk · Monday at 19:39

sorry wrong feature to check

ceph osd pool autoscale-status

Nhoague · Monday at 19:47

You just gave me some relief! I've been sitting here for hours watching it, wondering how high it's going to go!? You are right:

root@PVE01:~# ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE BULK
.mgr 226.5M 3.0 89424G 0.0000 1.0 1 on False
CEPH01 11037G 3.0 89424G 0.3703 1.0 512 on False
root@PVE01:~#

Interesting that the GUI doesn't show the 512 PGs, it still only shows 256. Maybe when it's done? I dunno?! haha

OK, so at best it is doing its thing and I just gotta trust it!
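The RATIO column appears to be the pool's stored SIZE multiplied by its replication RATE, divided by RAW CAPACITY. A quick check with the numbers from the output above, under that assumption (the function name is made up for illustration, and all sizes are taken in the same "G" unit the tool prints):

```python
def raw_capacity_ratio(stored: float, rate: float, raw_capacity: float) -> float:
    """Fraction of raw cluster capacity a pool consumes once replicas are counted."""
    return stored * rate / raw_capacity


# CEPH01 as reported above: SIZE 11037G, RATE 3.0, RAW CAPACITY 89424G.
raw_used = 11037 * 3.0
ratio = raw_capacity_ratio(11037, 3.0, 89424)

print(f"raw used: {raw_used:.0f}G")  # 33111G, i.e. roughly 33 TB across all replicas
print(f"ratio:    {ratio:.4f}")      # 0.3703, matching the RATIO column
```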

Nhoague · Monday at 19:50

This is easier to read:

guruevi · 2026-06-23T04:08:47+0200

The Ceph autoscaler shows that it is trying to scale to 512, but for a variety of reasons it is not increasing the PG count automatically, probably because you have too few OSDs for it to do that. You can probably increase it to 512 manually or raise the PGs-per-OSD setting.

Nhoague · 2026-06-23T15:52:25+0200

OK, it has reached 512 PGs in the pool; optimal still shows 256. But it just feels like it's cycling. It hasn't progressed, or these haven't gone down?

How can I throttle the rebalance, as it is affecting VM performance a bit? Or restart it so it can get past this?

Now, within minutes, it's showing this:

But, I bet it drops back to 479. Why?

Any input is appreciated! Thank you!

Nhoague · 2026-06-23T16:01:05+0200

Ugh why?!

Is it going back through all of the PGs?

aaron · 2026-06-23T16:21:17+0200

This looks a lot like the autoscaler doing its thing. Was there any config change before all of this?

Check the <NODE> → Ceph → Pool menu to see what is configured there.
Possible reasons could be that one of the "target_*" options was set, or that the pool grew large enough to warrant a change in the number of PGs. If no "target_*" option is set, the autoscaler decides based on the current usage of the pool whether another pg_num would be better.

If the pg_num changes, and in this case increases, the data is split into the new, larger number of PGs. That affects the data distribution of the CRUSH algorithm. Therefore you will see PGs backfilling and waiting to be backfilled, which is part of the process of being moved to another OSD.

But all PGs are active and none are in any reduced state. So from a data availability and redundancy point of view, everything is absolutely fine and Ceph is doing what it is meant to do.
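To make the split a bit more concrete, here is a rough sketch of how an object's hash is mapped to a PG, modelled on what I understand Ceph's ceph_stable_mod helper to do. The pg_for wrapper and the integer "hashes" are assumptions for illustration only, and where each PG then lives on the OSDs is decided by CRUSH, which is not modelled here.

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Map hash x onto b buckets; bmask is (next power of two >= b) - 1."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_for(obj_hash: int, pg_num: int) -> int:
    """Hypothetical wrapper: derive bmask from pg_num and pick the PG."""
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return stable_mod(obj_hash, pg_num, bmask)


# At an intermediate pg_num such as 344, only some of the original
# 256 PGs have handed objects to a child PG yet.
split_parents = {
    pg_for(h, 256) for h in range(512) if pg_for(h, 344) != pg_for(h, 256)
}
print(len(split_parents))  # 88 parents have split (344 - 256)

# Every object stays in its old PG or moves to exactly one child (old + 256).
for h in range(4096):
    old = pg_for(h, 256)
    assert pg_for(h, 344) in (old, old + 256)
    assert pg_for(h, 512) in (old, old + 256)
```

Under this model a pg_num like 344 is simply a cluster that is partway through splitting, which would fit with the count creeping up over many hours rather than jumping straight to 512.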

Nhoague · 2026-06-23T17:23:09+0200

Hello! There were no config changes that I made; however, I had a node kernel panic and had to reboot it. By the time I got to it, probably 45 min later, I suspect the other nodes had started rebalancing? So when this node came back online, is it basically rebuilding as a RAID would, from beginning to end? That reboot or the rebuilding caused the PG count to grow from 256 to 512. Is it fighting itself because optimal is still 256?

Node - Ceph - Pool shows:

If I click on the CEPH01, target is not set.

I do remember when I built this cluster, it was 64, then after restoring a few VMs to it, it went to 128, and both numbers in pools matched.

I do agree, overall data redundancy and availability are good! It is slower, though, which is to be expected.

I just wish ceph would tell me I'm fine and be patient! haha Thank you for your answer, it is relieving!

Nhoague · 2026-06-23T17:58:13+0200

It must have heard us talking about it, it just took off and jumped a few PG within an hour!

Nhoague · 2026-06-23T20:04:53+0200

Thank you everyone who replied! It's full and back to normal!

I keep trying to think of Ceph in terms of a RAID, as in: why did it rebuild 33 TB when only one node was affected? But the PG explanation really made it click that it rebalanced the entire cluster.

Just ceph doing what ceph wants to do. Dang this lil experience proved to be just that. Pretty freaking cool! When I was building this we tested the theoretical things, pulling power, pulling disks, etc. But never really thought of "expanding the PG" haha Makes total sense.

Nhoague · 2026-06-23T20:18:26+0200

On a side note, while geeking out ... Take a look at this:

During rebuild:

Using SNMP to watch my CEPH switches, I've seen them hit 2G before, but never really hammering them. We have Micron 5400 Pro disks, and our performance is great! Just curious what kind of operations yall are doing that saturates a 10G network!?

Normal operation:

Thanks again!

aaron · 2026-06-23T21:09:10+0200

Interesting. Most likely caused by some changes in the space usage of the pool. Even though the ceph osd pool autoscale-status output posted is useless as it is not formatted as code, I suspect that there is no target_ratio configured for the pool.
Given that you have 25 OSDs and one pool (we can ignore the .mgr with its 1 PG), you most likely have around 60 PGs per OSD right now? Visible in the OSD sub menu.
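The "around 60" estimate can be checked with simple arithmetic: every PG is stored once per replica, so the total number of PG replicas is spread over all OSDs. A small sketch using the figures from this thread (25 OSDs, replication 3, 512 PGs in CEPH01 plus 1 in .mgr); the function name is only illustrative, and it assumes a perfectly even distribution:

```python
def pg_replicas_per_osd(pools: list[tuple[int, int]], num_osds: int) -> float:
    """Average PG replicas per OSD for a list of (pg_num, replica_count) pools."""
    total_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_replicas / num_osds


after = [(512, 3), (1, 3)]   # CEPH01 and .mgr after the split
before = [(256, 3), (1, 3)]  # the same pools at the old pg_num

print(round(pg_replicas_per_osd(after, 25), 1))   # 61.6
print(round(pg_replicas_per_osd(before, 25), 1))  # 30.8
```

The real per-OSD counts in the OSD sub menu will vary a little around that average.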

If you set the target_ratio of the CEPH01 pool to any value (it is the only pool, so any value works out to 100%; I would suggest 1.0), you tell the autoscaler that you expect this pool to consume all the space in the cluster. Then it can make a better calculation of how many PGs are best and do one final rebalance without having to rely on the current space usage of the pool.

It is possible that the optimal number of PGs is too close to the current one, and the autoscaler won't change anything by itself. Then you can set the number of PGs on the pool manually to the recommended optimal one.

Having a good number of PGs per OSD can improve overall performance and recovery speed in case an OSD fails.

The autoscaler will also calculate a new optimal pg_num if the number of OSDs changes, or if you add another pool and configure the target_ratios of both according to your expectations.

Nhoague · 2026-06-23T22:47:46+0200

Yea dont judge, haha how do you paste as code? I try clicking the inline code, or the code formatter, and I paste and it looks like caca. <shrug>

Nvmnd ... I figured it out. Bam!

root@PVE01:~# ceph osd pool autoscale-status
POOL      SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr    228.1M                3.0        89424G  0.0000                                  1.0       1              on         False
CEPH01  10974G                3.0        89424G  0.3682                                  1.0     512              on         False
root@PVE01:~#

Yes, correct, ~60 PGs / OSD!

All makes sense, kinda, but I'm definitely not inclined to make any changes now that it's happy!
