One node of our Ceph 17.2.6 cluster went down.
We don't know why, but at the same time three other OSDs on other nodes went down as well.
When we try to restart an OSD via the web interface, we only get an error message saying that the service failed to start.
We get these errors in the logs:
Code:
Sep 27 13:38:40 prox4 systemd[1]: ceph-osd@5.service: Start request repeated too quickly.
Sep 27 13:38:40 prox4 systemd[1]: ceph-osd@5.service: Failed with result 'exit-code'.
Sep 27 13:38:40 prox4 systemd[1]: Failed to start ceph-osd@5.service - Ceph object storage daemon osd.5.
So we have to run these commands to start the service:
Bash:
root@prox3:~# systemctl daemon-reload
root@prox3:~# systemctl reset-failed ceph-osd@4
root@prox3:~# systemctl start ceph-osd@4
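As far as I understand the systemctl documentation, `reset-failed` not only clears the failed state but also resets the unit's start rate limit counter, which is what the "Start request repeated too quickly" message is about. `daemon-reload` should only be needed when unit files have changed. A minimal sketch of the two remaining steps as a helper; the function name `osd_force_start` and the `SYSTEMCTL` override variable are my own inventions for illustration, not part of Ceph or Proxmox:

```shell
#!/bin/sh
# Hypothetical helper: clear systemd's failed state for one OSD unit, then start it.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) to preview the commands
# without touching any service.
osd_force_start() {
    osd_id="$1"
    if [ -z "$osd_id" ]; then
        echo "usage: osd_force_start <osd-id>" >&2
        return 1
    fi
    unit="ceph-osd@${osd_id}"
    # reset-failed also resets the unit's start rate limit counter,
    # so the following start is not rejected as "repeated too quickly".
    "${SYSTEMCTL:-systemctl}" reset-failed "$unit" &&
    "${SYSTEMCTL:-systemctl}" start "$unit"
}
```

Calling `osd_force_start 4` as root would then correspond to the last two commands above.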
Maybe the "systemctl daemon-reload" is overkill, but here is my question.
Could you please add these extra commands behind the "Start" button in the OSD GUI? When I press "Start", I don't care how often the system itself has already tried to restart the OSD.
Alternatively, the OSD service file could be changed so that restarts are attempted less frequently, or so that the service can be restarted without any limit.
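To illustrate the second option: systemd lets you override the start limit with a drop-in file instead of editing the shipped unit. Below is a minimal sketch, assuming the usual drop-in location `/etc/systemd/system/ceph-osd@.service.d/` applies to the OSD template unit on these nodes. The file name `override.conf` and the `DROPIN_DIR` variable are my own choices, and I have not checked which limits the packaged unit actually sets. By default the script writes into a scratch directory, so it can be tried without root.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that disables start rate limiting
# for all ceph-osd@ instances.
# DROPIN_DIR defaults to a scratch directory; on a real node it would be
# /etc/systemd/system/ceph-osd@.service.d (assumption, see text above).
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)/ceph-osd@.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Unit]
# An interval of 0 disables systemd's start rate limiting for this unit.
StartLimitIntervalSec=0
EOF

echo "wrote $DROPIN_DIR/override.conf"

# After placing the file in the real drop-in directory, systemd has to
# re-read its unit files and the failed state has to be cleared once:
#   systemctl daemon-reload
#   systemctl reset-failed ceph-osd@4
#   systemctl start ceph-osd@4
```

Disabling the limit completely means a crashing OSD will be restarted indefinitely, so a larger `StartLimitBurst=` might be the gentler choice.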