EVERY time we need to restart one of the nodes in our cluster, we see a HORRIFIC impact on disk I/O while Ceph recovers and the PGs re-peer. It virtually consumes all available resources, and we need to know how to prevent this. I'm simply talking about issuing a 'reboot' after applying a new kernel or a similar change.
After the reboot the system reports something like:
PGs
  activating: 8
  activating+degraded: 64
  active+clean: 386
  active+clean+inconsistent: 1
  peering: 53
It sometimes takes HOURS to clear up from this, and the whole time the entire cluster performs horribly. This cannot be the way things are supposed to work. Is something not being handled correctly by the system? The pool has size 3 and min_size 2, and we NEVER reboot more than a single node at a time in the 4-node cluster.
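Are we supposed to be setting maintenance flags before the reboot? A sketch of what I gather from the Ceph docs is something like the following (the flag names are from the documentation; whether this actually prevents the recovery storm in our situation is exactly what I'm asking):

```shell
#!/bin/sh
# Before rebooting a single node for a kernel update:
# tell the cluster not to mark its OSDs "out" (which triggers
# rebalancing/backfill) while they are briefly down.
ceph osd set noout
ceph osd set norebalance

# ... reboot the node, wait for its OSDs to come back "up" ...
# (check with: ceph osd tree / ceph -s)

# Once all OSDs are back up, clear the flags so normal
# recovery behavior resumes.
ceph osd unset norebalance
ceph osd unset noout
```

Is that the intended procedure, or is there something else (recovery throttling settings, for example) that we should have configured?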