Using pg_autoscale in Ceph Nautilus on Proxmox


New Member
Aug 22, 2019
Greetings everyone!

I have installed Ceph Nautilus on Proxmox using the pveceph repositories.
I then installed the Nautilus dashboard as well, as I intend to use the Ceph cluster both for Proxmox and via additional RADOS Gateways for general storage, for a fully hyperconverged setup.

I was wondering if there are any potential downsides to enabling the `pg_autoscale` feature that is new in Ceph Nautilus. I would like to use it so that I do not have to manually re-balance my PGs once I begin loading data into the Ceph cluster. I assume there is no incompatibility, but decided to ask first.
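For anyone else reading, a minimal sketch of turning the autoscaler on in Nautilus; `mypool` is a placeholder pool name:

```shell
# Enable the pg_autoscaler manager module
# (it is not enabled by default on Nautilus)
ceph mgr module enable pg_autoscaler

# Turn autoscaling on for an existing pool
ceph osd pool set mypool pg_autoscale_mode on

# Optionally make it the default mode for newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode on
```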

Has anyone enabled these features on their clusters yet? How is it working out for you?
I enabled it and it worked as expected. It did reconfigure my pool PGs to a lower number, but that was expected, since I over-provisioned them on purpose. When you enable that plugin, it initially puts all pools into "warn" mode and tells you whether you have over-provisioned them or not provisioned enough. I see no harm in enabling it and seeing if you get warnings, or keeping it in warn mode and reacting when needed. You can also switch it off completely afterwards.
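To see what the autoscaler would do before letting it change anything, the status command together with warn mode is handy; `mypool` is a placeholder pool name:

```shell
# Show current vs. suggested PG counts for every pool
ceph osd pool autoscale-status

# Only warn about mis-sized pools instead of resizing them
ceph osd pool set mypool pg_autoscale_mode warn
```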
I was maybe a little quick enabling autoscale on my cluster. I've just started using Ceph, and after autoscaling I now have 256 PGs in an unknown state, since the autoscaler reduced the PG count significantly. How can I tell the cluster that the PG count is down?

root@ce01:~# ceph -s
  cluster:
    id:     b512a8d7-1956-4ef3-aa3e-6f24d08878cf
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive

  services:
    mon: 3 daemons, quorum ce01,ce03,ce02 (age 2m)
    mgr: ce02(active, since 2m), standbys: ce03, ce01
    mds: cephfs:1 {0=ce02=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 40h), 6 in (since 5d)

  data:
    pools:   3 pools, 288 pgs
    objects: 24 objects, 4.8 MiB
    usage:   683 GiB used, 16 TiB / 16 TiB avail
    pgs:     88.889% pgs unknown
             256 unknown
             32  active+clean
Try disabling autoscaling and see if that removes the warning.

I had the same issue when my cluster had no data loaded onto it, and the warning went away once autoscale was disabled.
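Disabling it is done per pool; `mypool` is a placeholder name:

```shell
ceph osd pool set mypool pg_autoscale_mode off
```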
I have tried disabling it; no luck.
But the strange thing is that when I use the get pg_num command, it answers back with the old pg_num and not the new one set by autoscaling...
Seems like my data on those pools is lost, damn. OK, I'll read a little closer before I do something like that again!
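For what it's worth, on Nautilus a pool being scaled down keeps its current PG count alongside the target the autoscaler is working toward, which may explain the old value coming back; `mypool` is a placeholder name:

```shell
# Current PG count (may still show the old value mid-scaling)
ceph osd pool get mypool pg_num

# Shows pg_num and, while scaling, the pg_num_target for each pool
ceph osd pool ls detail
```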
OK, I got help from the Ceph mailing list. It was not the autoscaler that was the problem. I had created a CRUSH rule that broke the whole thing; after restoring the default CRUSH rule on the pools, the autoscaler worked just fine, and now everything is nice and green except for one thing:
I get this warning:

too few PGs per OSD (12 < min 30)

How can I set this value so this warning goes away?
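In case it helps others: that warning means each OSD carries fewer PGs than the monitor's minimum threshold (`mon_pg_warn_min_per_osd`, default 30). A sketch of two ways to address it, with `mypool` as a placeholder pool name:

```shell
# Option 1: give the pool more PGs so each OSD carries enough
# (with the autoscaler active, it may adjust this again)
ceph osd pool set mypool pg_num 128

# Option 2: lower the warning threshold, or set 0 to disable it
ceph config set global mon_pg_warn_min_per_osd 0
```

On a near-empty cluster, Option 2 is the less invasive choice, since the autoscaler will grow the PG counts on its own once data arrives.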

