Ceph x daemons have recently crashed

Since the last Ceph update (to the current 17.2.5), we have noticed that every node reboot marks the OSDs from that node as crashed. However, they come back up normally when the server boots.

I checked the Ceph and journalctl logs and did not find anything relevant about the daemons crashing (timeouts, segfaults, etc.).

Is this HEALTH_WARN something normal after a reboot? It seems not, because this is new for us.
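For anyone hitting the same warning: the "daemons have recently crashed" HEALTH_WARN comes from Ceph's crash module, and the crash reports can be inspected and acknowledged with the standard CLI (a minimal sketch using stock Ceph commands):

ceph crash ls            # list recent crash reports
ceph crash info <id>     # show the details (daemon, timestamp, backtrace) of one report
ceph crash archive-all   # acknowledge all reports and clear the warning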
 
I am seeing this issue in my cluster too, on most of my nodes upon reboot, and I also can't find anything related in the logs. Some of the nodes don't have daemons crashing after the update. I'm not sure what the cause is, since everything works fine after the reboot.
 
CEPH OSD FLAGS

noout -- If the mon osd report timeout is exceeded and an OSD has not reported to the monitor, the OSD will get marked out. The noout flag tells the Ceph monitors not to mark any OSDs "out" of the CRUSH map and not to start recovery and rebalance activities, in order to maintain the replica count.

nobackfill -- If you need to take an OSD or node down temporarily (e.g., when upgrading daemons), you can set nobackfill so that Ceph will not backfill while the OSD(s) are down.

norecover -- Ceph will prevent new recovery operations. If you need to replace an OSD disk and don't want the PGs to recover to another OSD while you are hot-swapping disks, you can set norecover to prevent the other OSDs from copying a new set of PGs elsewhere.

norebalance -- Data rebalancing is suspended.

nodown -- Prevents OSDs from getting marked down. Networking issues may interrupt Ceph heartbeat processes, so an OSD may be up but still get marked down. If something (such as a network issue) is causing OSDs to 'flap' (repeatedly getting marked down and then up again), you can force the monitors to stop the flapping by temporarily freezing their states with nodown while you troubleshoot.

pause -- Ceph will stop processing read and write operations, but this does not affect OSD in, out, up, or down statuses. If you need to troubleshoot a running Ceph cluster without clients reading and writing data, you can set pause to prevent client operations.
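To see which of these flags are currently set on a cluster, the OSD map carries a flags line and the health summary also lists any set flags (standard Ceph CLI, shown only as a convenience):

ceph osd dump | grep flags
ceph -s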

Try setting the Ceph flags that fit your requirements before rebooting a node in the cluster. Works like a charm.

# Node maintenance

# stop and wait for scrub and deep-scrub operations

ceph osd set noscrub
ceph osd set nodeep-scrub

ceph status

# set the cluster in maintenance mode with the commands below
# (I had used these to bring the entire cluster down when we were physically migrating the entire setup to a different datacentre)

ceph -s   # check ceph status
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
ceph osd set norebalance
ceph osd set nodown
ceph osd set pause


UNSET FLAGS ONCE ACTIVITY IS COMPLETED.
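For completeness, a minimal sketch of the corresponding unset sequence (ceph osd unset is the standard counterpart of ceph osd set):

ceph osd unset pause
ceph osd unset nodown
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
ceph osd unset nodeep-scrub
ceph osd unset noscrub

ceph -s   # verify the cluster returns to HEALTH_OK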
 
Generally I set the noout flag on the cluster before rebooting a node, to prevent the cluster from having to do a lot of work when the node comes back online. The strangest thing is that the daemons crash even though the cluster still has the noout flag set and the node is back online.

Setting all those flags is not needed for a single node reboot, since noout will prevent backfilling, recovery, and rebalancing while the node is coming back. Once it returns, the cluster will rebalance, backfill, and recover if needed, but because all the OSDs return after the reboot, this process only takes seconds. I don't know if I would set nodown for a node reboot, as the OSDs really are down, so that might cause an issue.
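A minimal sketch of that noout-only workflow for a single node reboot (the reboot step itself is whatever your platform uses; the rest is standard Ceph CLI):

ceph osd set noout
# reboot the node, then wait until all of its OSDs are back up and in
ceph -s
ceph osd unset noout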

Edit: Spelling
 
Are any of you still experiencing this issue?

We recently investigated something that sounds very similar—a segmentation fault in the OSDs.

It happens when:
  • fast_shutdown is enabled (it is enabled by default)
  • BlueStore is using non-rotational drives (i.e. flash storage)
the OSD is being shut down
We saw the HEALTH_WARN in Ceph for all OSDs after the PVE nodes rebooted. All OSDs came back after the reboot and no impact was observed.

Ceph Bug #64373 and related Ceph Backport #66148

Based on the links, it should be fixed now or soon-ish.
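For anyone who wants to check whether the fast_shutdown condition applies to their OSDs: the option is called osd_fast_shutdown upstream, and its value can be read with the standard config commands (a sketch only, not a recommendation to change it):

ceph config get osd osd_fast_shutdown         # cluster-wide default for OSDs
ceph config show osd.0 | grep fast_shutdown   # effective value on a running OSD (osd.0 is just an example)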
 
Sure, on nearly every reboot after patching. But only on some of our nodes that still have traditional hard disks.
I don't know whether the fact that they also have a WAL/DB on a separate SSD is the key.
Would be nice to have that fixed.
 
Interesting. The bug report does not mention the WAL or DB, but it would make sense. Are you running BlueStore? What version of Ceph are you on?
 
Ceph 18.2.4, everything on BlueStore.

Perhaps interesting:
- It only happens on the nodes with rotating disks (SAS 10K RPM) and WAL/DB on (SATA) SSD
- The other nodes are SSD-only (SAS SSD) and are not affected.
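One way to confirm which OSDs combine a rotational data device with a flash DB/WAL device is the OSD metadata (a sketch; the exact JSON field names are from my recollection of the ceph osd metadata output and may differ between releases):

ceph osd metadata 0 | grep -E 'rotational|osd_objectstore'
# a bluestore_bdev_rotational of "1" together with bluefs_db_rotational "0" would
# indicate an HDD data device with its DB/WAL on flash (osd id 0 is just an example)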
 