Ceph maintenance question

Tmanok

Well-Known Member
Hi everyone,

Quick and simple set of similar questions:
  1. Before restarting a Ceph OSD, should you mark it as "Out"?
  2. Before restarting a monitor, should you do anything?
  3. Before restarting a node, should you do anything?
  4. Before restarting a manager or manager node, should you do anything?
Thanks! I'm just a little concerned that I've been working on production equipment without enough caution lately, and it's making me nervous.

Tmanok
 
Before restarting a Ceph OSD, should you mark it as "Out"?
Marking an OSD as out tells Ceph that this OSD should not be part of the cluster anymore. This in turn causes rebalancing, as Ceph recreates the data located on that OSD on other OSDs in the cluster.
This is only something that you want to do if you need to replace the OSD or want to destroy and recreate it.
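If you really do need to take an OSD out permanently (e.g. to replace the disk), the CLI side is roughly the following; osd.3 is just a placeholder ID here:

Code:
# Tell Ceph to stop placing data on this OSD (this triggers rebalancing)
ceph osd out osd.3

# If you change your mind before it has drained, bring it back in
ceph osd in osd.3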

Normally you don't need to do anything if you just restart a service (OSD, MON, MGR, MDS). OSDs have redundancy: you might see a warning for a short time until the OSD is back up, but as long as there are copies of the data on two other OSDs, restarting one OSD at a time is not a problem. The same goes for monitors, as long as enough of them remain to form a majority (a minimum of three, and usually three are enough). MGR and MDS work with standbys on the remaining nodes, so if you stop the active one, another node will take over and become active.
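As a rough sketch of the "one daemon at a time" approach on a Proxmox/Debian node (assuming the standard systemd unit names, with 3 and node1 as placeholder IDs):

Code:
# Check the cluster state first
ceph -s

# Restart a single OSD daemon, then wait until it shows as "up" again
systemctl restart ceph-osd@3.service
ceph osd tree

# Monitors and managers follow the same pattern, for example:
# systemctl restart ceph-mon@node1.service
# systemctl restart ceph-mgr@node1.service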

If you plan longer maintenance where a node might be down for some time, and you want to avoid the rebalancing of data, you can set the "noout", "norecover", and "norebalance" global OSD flags for that time. "noout" should be enough; the others are just to be on the safe side.
The default timeout is 10 minutes. If an OSD is not back up within that time, Ceph will automatically mark it as out.
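A minimal sketch of setting those flags before the maintenance window (the 10-minute default corresponds to the mon_osd_down_out_interval option, 600 seconds, on recent Ceph releases):

Code:
# Keep OSDs from being marked out and keep data from moving during maintenance
ceph osd set noout
ceph osd set norecover
ceph osd set norebalance

# Optional: check the current down -> out timeout (600 seconds by default)
ceph config get mon mon_osd_down_out_interval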

Once you are done, don't forget to clear these OSD flags again to restore Ceph's self-healing capabilities.
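And the matching cleanup afterwards:

Code:
# Re-enable self-healing once the maintenance is finished
ceph osd unset noout
ceph osd unset norecover
ceph osd unset norebalance

# Verify the flags are gone and the cluster returns to HEALTH_OK
ceph -s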
 
1)

When an OSD process stops, it first goes into the "down" state; then, after 10 minutes (by default), it goes into the "out" state.

When the "out" state is reached, the data is rebalanced across the cluster.

If you are doing a long maintenance (>10 min) and you don't want the data to rebalance, you can set the "noout" flag.
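To watch that happen (or confirm that it isn't happening), something like this shows the state of each OSD:

Code:
# Per-OSD up/down and in/out status
ceph osd tree

# Summary count of OSDs that are up and in
ceph osd stat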

2)
For monitors, you just need to check that you still have monitor quorum (so just restart the monitors one by one, and check that you have quorum between each restart).
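For example, to check quorum after each monitor restart (the exact output format varies between releases):

Code:
# Quick summary of which monitors exist and which are in quorum
ceph mon stat

# More detail, including the current quorum members and leader
ceph quorum_status --format json-pretty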

3) Always check that the cluster is healthy before restarting a node (check that no OSDs, monitors, etc. on other nodes are also down).
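A quick pre-reboot check could look like this:

Code:
# Overall state: should be HEALTH_OK (or only warn about flags you set yourself)
ceph health detail

# Make sure no other OSDs or monitors are already down
ceph -s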

4) The manager is not important for data access, but you can check that another manager daemon is still running in the cluster.
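For the managers, something like this shows whether a standby is available:

Code:
# Shows the active mgr and the number of standbys
ceph mgr stat

# The "mgr:" line here also lists the active and standby daemons
ceph -s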
 