If your pool has size = 3, then each osd has (1024 * 3 / 12) = 256 placement groups.
Now you'll have to either:
- add a new node with 4 osds (or add 4 osds to existing nodes), so there will be (1024 * 3 / 16) = 192 PGs per osd (and this is the best way); or
- raise the 'mon pg warn max per osd' option to some higher value (see the example below).
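For example (a rough sketch; 300 is just an illustrative value), in ceph.conf on the monitor nodes:

[global]
    mon pg warn max per osd = 300

or inject it at runtime without restarting the mons:

ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 300'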
Recovery is in progress. Wait until it completes.
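You can watch the progress with something like:

ceph -s
ceph -w

ceph -w keeps following the cluster log; the number of misplaced/degraded objects should keep going down.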
I wonder what the primary cause of this failure is.
Maybe you didn't wait for HEALTH_OK between each step of the upgrade? Or did you upgrade with noout set? When did the status become HEALTH_ERR, after rebooting the last node?
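If you redo it, a simple way to make sure you really waited between steps (just a sketch):

until ceph health | grep -q HEALTH_OK; do sleep 30; done   # block until the cluster settles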
This happens when you use the wrong command to remove an osd:
ceph osd rm osd.16
instead of
ceph osd rm 16
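For reference, the full sequence I'd normally use to remove an osd looks roughly like this (adjust the id and host to your setup):

ceph osd out 16
systemctl stop ceph-osd@16     # on the node hosting the osd
ceph osd crush remove osd.16
ceph auth del osd.16
ceph osd rm 16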
Have you tried:
- rebooting the node containing osd.16 (with the noout flag set)?
- marking osd.16 as lost? (rough commands for both below)
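Roughly (assuming the osd id is 16):

ceph osd set noout
# reboot the node with osd.16 and wait for it to come back, then:
ceph osd unset noout

and, only if the osd really cannot come back:

ceph osd lost 16 --yes-i-really-mean-it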
The cluster is in a normal recovery state.
So where is the bottleneck? Try to find it with the atop utility.
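A few things I'd look at (the 2-second interval is just an example):

atop 2          # per-process cpu/disk/network pressure
ceph osd perf   # per-osd commit/apply latencies
iostat -x 2     # per-device utilisation (from the sysstat package)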
Are the disks SSDs or spinners?
Did you have a problem with flapping osds?
Is your ceph network 1 Gbps?
You don't need to have more than 3 mons. If you need more storage space, you should add a node with OSDs only (without mon).
It is no good for the recovery/backfill process.
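If the network is the suspect, you can measure the real node-to-node throughput with iperf3 (node and interface names here are just placeholders):

iperf3 -s                       # on one ceph node
iperf3 -c <other-ceph-node>     # from another node
ethtool eth0 | grep -i speed    # check the negotiated link speed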