Wondering if anyone else has observed this, or if I missed a memo on how to fix it (or maybe I'm doing something wrong!).
Since updating my homelab and office production server clusters to Ceph Quincy earlier this year, we get "Daemons have recently crashed" health warnings after routine cluster updates and node reboots. The crash reports are for some, but not all, daemons. Out of the 60 OSDs on the production cluster, we'll see 5-20 daemon crashes per round of node reboots.
After I finish rebooting all the nodes of each cluster (after kernel updates), I usually just archive the crash reports and move on with life. It doesn't seem to be impacting anything negatively, other than making the ceph status output look bad.
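In case it helps, by "archive" I just mean the standard ceph crash commands (the crash ID on the info line is a placeholder from the ls output):

```
ceph crash ls                  # list recent and archived crash reports
ceph crash info <crash-id>     # skim one report if it looks interesting
ceph crash archive-all         # archive everything; clears the RECENT_CRASH warning
```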
When I do cluster maintenance (updates/reboots), I just set the noout flag, then reboot the nodes one at a time, waiting for everything on the Ceph side to recover between reboots (concrete commands below). Should I be doing something else?
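For completeness, the whole procedure is roughly this; nothing fancy, and the "wait" step is just me watching ceph -s until PGs are active+clean again:

```
# before any reboots: keep OSDs from being marked out (avoids rebalancing)
ceph osd set noout

# for each node, one at a time:
#   reboot it, wait for it to rejoin, then watch status until
#   all PGs report active+clean before moving to the next node
ceph -s

# once the last node is back and the cluster has recovered
ceph osd unset noout
```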
Thanks!