[SOLVED] ceph - anyone had experience using mon_osd_auto_mark_in=true?

scyto

Aug 8, 2023
I have been having fun with my network; sometimes this results in an OSD being marked down and out.

My understanding is:
  • after some period when the network is back, the OSD should be marked up automatically (I am unclear how long this takes)
  • but it will never be marked in automatically (and I understand why this is a good default if one's network is flapping constantly)
If I increase the mon_osd_report_timeout and osd_heartbeat_grace settings, is there a safe balance that would allow me to use mon_osd_auto_mark_in=true?

Basically I am lazy and want the OSD to start automatically once the network is good again.
 
I think you misunderstand what mon_osd_auto_mark_in does. It only controls whether an OSD is marked in as soon as the OSD process starts, rather than waiting until the storage is good before marking it in and up. The OSD should be marked in and up automatically as soon as the system comes back up, unless there is some other issue. There may be a timeout involved; I'm not sure what the default is, and it may be a ratio of how long the OSD was out, to make sure the network is stable before marking it back in. Either way, you can extend the timeouts for both marking out and marking in.
 
"I think you misunderstand what mon_osd_auto_mark_in does"
Quite probably. I have been using ChatGPT in anger for the first time: I asked how I could get it to bring up the OSD after a network issue was resolved without me having to do anything, and the summary it gave me said this setting would resolve it. Yay, my first AI hallucination I didn't catch.

This is part of the issues I am playing with in SDN; it has a habit of doing interesting things... which result in the network not coming back. (To be clear, the nodes never shut down, they are running at all times; just connectivity breaks for a few minutes, basically longer than 10 minutes.) When it comes back I have to start the OSDs and mark them as in manually.
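
For what it's worth, the "longer than 10 minutes" threshold lines up with mon_osd_down_out_interval, which as far as I know defaults to 600 seconds. Below is a rough Python sketch of that timeline, assuming the stock defaults (osd_heartbeat_grace = 20 s, mon_osd_down_out_interval = 600 s). The helper name osd_state_after_outage is hypothetical and the model is deliberately simplified; it is not how the monitors are actually implemented.

```python
# Simplified model of how Ceph treats an OSD that becomes unreachable.
# The defaults below are assumed upstream defaults; your cluster may
# override them, so check the live values before relying on this.

OSD_HEARTBEAT_GRACE = 20         # seconds without heartbeats before the OSD is reported down
MON_OSD_DOWN_OUT_INTERVAL = 600  # seconds an OSD may stay down before it is marked out


def osd_state_after_outage(outage_seconds,
                           heartbeat_grace=OSD_HEARTBEAT_GRACE,
                           down_out_interval=MON_OSD_DOWN_OUT_INTERVAL):
    """Return the (up/down, in/out) state expected at the end of an outage.

    Hypothetical helper for illustration only: it ignores the noout flag,
    mon_osd_min_in_ratio, and any jitter in heartbeats or reports.
    """
    if outage_seconds < heartbeat_grace:
        # Too short for peers to notice the missing heartbeats.
        return ("up", "in")
    if outage_seconds < heartbeat_grace + down_out_interval:
        # Marked down, but the down-to-out timer has not expired yet.
        return ("down", "in")
    # Down for longer than the interval: the monitors mark it out.
    return ("down", "out")


if __name__ == "__main__":
    for minutes in (0.1, 5, 9, 12):
        print(minutes, "min ->", osd_state_after_outage(minutes * 60))
```

If the model holds, raising the interval above the longest outage you expect (for example with ceph config set mon mon_osd_down_out_interval 1800) should keep the OSD in during the outage, so there is nothing to mark in afterwards. The trade-off is that recovery from a genuinely dead OSD starts later. The live value can be checked with ceph config get mon mon_osd_down_out_interval.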

Thanks for the answer, marking as resolved.