Using pg_autoscale in Ceph Nautilus on Proxmox

Benc

New Member
Aug 22, 2019
Greetings everyone!

I have installed Ceph Nautilus on Proxmox using the pveceph repositories.
I then went on and installed the Nautilus dashboard as well, since I intend to use the Ceph cluster for Proxmox and also, via RADOS Gateways, for general storage inside Ceph, giving me a fully hyperconverged setup.

I was wondering if there are any potential downsides to enabling the `pg_autoscale` feature that is new in Ceph Nautilus. I would like to use it so that I do not have to manually adjust my PG counts once I begin loading data into the Ceph cluster. I would assume there is no incompatibility, but I decided to ask first.

Has anyone enabled this feature on their cluster yet? How is it working out for you?
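
(Side note: the autoscaler ships as a manager module in Nautilus. If you only want to look at its recommendations first, something along these lines should show them without resizing anything, as long as no pool has its mode set to "on":)

Code:
# enable the autoscaler manager module (Nautilus)
ceph mgr module enable pg_autoscaler
# show the pg_num the autoscaler would pick per pool
ceph osd pool autoscale-status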
 
I enabled it and it worked as expected. It did reconfigure my pools' pg_num to a lower number, but that was expected since I had over-provisioned them on purpose. When you enable the module, it initially puts all pools into "warn" mode and tells you whether you have over-provisioned them or have too few PGs. I see no harm in enabling it and seeing whether you get warnings, or keeping it in warn mode and reacting when needed. You can also switch it off completely afterwards.
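
For reference, the mode is set per pool; a rough sketch, with <pool> as a placeholder for the pool name:

Code:
# modes: warn (health warnings only), on (adjust pg_num automatically), off (disabled for this pool)
ceph osd pool set <pool> pg_autoscale_mode warn
ceph osd pool set <pool> pg_autoscale_mode on
ceph osd pool set <pool> pg_autoscale_mode off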
 
I was maybe a little quick enabling autoscale on my cluster. I have just started using Ceph, and after autoscaling I now have 256 PGs in an unknown state, since the autoscaler reduced the PG count significantly. How can I tell the cluster that the PG count has gone down?

Code:
root@ce01:~# ceph -s
  cluster:
    id:     b512a8d7-1956-4ef3-aa3e-6f24d08878cf
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive

  services:
    mon: 3 daemons, quorum ce01,ce03,ce02 (age 2m)
    mgr: ce02(active, since 2m), standbys: ce03, ce01
    mds: cephfs:1 {0=ce02=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 40h), 6 in (since 5d)

  data:
    pools:   3 pools, 288 pgs
    objects: 24 objects, 4.8 MiB
    usage:   683 GiB used, 16 TiB / 16 TiB avail
    pgs:     88.889% pgs unknown
             256 unknown
             32  active+clean
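
(For anyone who hits a similar state: a couple of stock commands that help narrow down which PGs are inactive and why; nothing here is specific to this cluster.)

Code:
# show the health warning in detail, including the affected PGs and pools
ceph health detail
# list PGs that are stuck in an inactive state
ceph pg dump_stuck inactive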
 
Try disabling autoscaling and see if that removes the warning.

I had the same issue when my cluster had no data loaded onto it, and the warning went away once autoscale was disabled.
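
In case it is useful, disabling can be done per pool or for the whole module; <pool> is a placeholder:

Code:
# per pool
ceph osd pool set <pool> pg_autoscale_mode off
# or turn the manager module off entirely
ceph mgr module disable pg_autoscaler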
 
I have tried disabling it, no luck.
But the strange thing is that when I use the get pg_num command, it returns the old pg_num and not the new one set by the autoscaler...
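
For what it is worth, my understanding is that Nautilus shrinks pg_num gradually (PG merging), so the value returned by get pg_num can lag behind what the autoscaler has decided; comparing these may show that (<pool> is a placeholder):

Code:
ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num
# 'ls detail' should also show the target values the autoscaler is still working towards
ceph osd pool ls detail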
 
Seems like my data on those pools is lost, damn. OK, I'll read a little more closely before I do something like that again!
 
OK, I got help from the Ceph mailing list. It was not the autoscaler that was the problem. I had created a CRUSH rule that broke the whole thing; after restoring the default CRUSH rule on the pools, the autoscaler worked just fine, and now everything is nice and green except for one thing:
I get this warning:

Code:
health: HEALTH_WARN
        too few PGs per OSD (12 < min 30)

How can I set this value so this warning goes away?
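
If I am reading that warning correctly, the threshold comes from mon_pg_warn_min_per_osd (default 30). The usual options are to give the pools more PGs (or a target size so the autoscaler raises pg_num itself), or, less ideally, to lower the threshold; pool names and numbers below are placeholders:

Code:
# give a pool more PGs directly
ceph osd pool set <pool> pg_num 64
# or tell the autoscaler what fraction of the cluster the pool is expected to use
ceph osd pool set <pool> target_size_ratio 0.5
# or lower the warning threshold itself
ceph config set global mon_pg_warn_min_per_osd 10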
 
