Hi,
In a 3-node test Ceph cluster, when I power off one node, write operations hang for about 30 seconds. I would like to reduce this to 10-15 seconds if possible (I assume my network is healthy and there is no risk of saturation).
I tried changing these values:
Code:
root@pve01:~# ceph config set osd osd_heartbeat_interval 5
root@pve01:~# ceph config set osd osd_heartbeat_grace 12
But no luck: write operations still resume only after about 30 seconds:
Code:
2024-12-08T15:31:58.249932+0100 mon.pve01 (mon.0) 752 : cluster [INF] osd.2 failed (root=default,host=pve03) (2 reporters from different host after 23.894888 >= grace 20.000000)
2024-12-08T15:31:58.251243+0100 mon.pve01 (mon.0) 753 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2024-12-08T15:31:58.251417+0100 mon.pve01 (mon.0) 754 : cluster [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
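To check which grace value the monitor actually applied when it marked the OSD down, the numbers can be pulled out of the "osd.N failed ... after X >= grace Y" line. This is a minimal bash sketch; parse_osd_failed is a hypothetical helper (not a Ceph tool), and it assumes the log line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): extract the OSD id, the elapsed
# time and the grace value from an "osd.N failed ... after X >= grace Y"
# monitor log line. Assumes the line format shown in the log above.
parse_osd_failed() {
  local line=$1
  local re='(osd\.[0-9]+) failed .* after ([0-9.]+) >= grace ([0-9.]+)'
  if [[ $line =~ $re ]]; then
    printf '%s elapsed=%s grace=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1   # line does not match the expected format
  fi
}

line='2024-12-08T15:31:58.249932+0100 mon.pve01 (mon.0) 752 : cluster [INF] osd.2 failed (root=default,host=pve03) (2 reporters from different host after 23.894888 >= grace 20.000000)'
parse_osd_failed "$line"
```

For the log line above it reports grace=20.000000, which is Ceph's default osd_heartbeat_grace (20 seconds), not the 12 that was set.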
I don't understand why the grace period for OSD_DOWN / OSD_HOST_DOWN is not reduced: the log still shows grace 20.000000 instead of 12.
Thanks