Following setup:
- 3 identical nodes
- 2 OSDs per node, 1 TB per OSD
- 10 Gbit network for Ceph only (both networks)
- mesh
- OVS bridge
- not much load
- MON, MGR and MDS on each node
- size/min_size 3/2
- used space: 1 TB
- all servers use the same chrony config
Sometimes, when one node becomes unavailable, everything keeps working for about 30-60 seconds. After that, the status switches to 0/1 healthy (for volumes, for example) and to "recovering" for about 1-2 minutes, and no I/O is possible during that time. VMs do not keep running but freeze, and the shared storage on CephFS is unavailable as well. After roughly 1-2 minutes everything works again, even if the node is still down.
This does not happen every time, but it drives me crazy and is, from my point of view, not the expected behavior.
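To make clear why I expect I/O to continue, here is a minimal sketch of my understanding of size/min_size. It assumes replicas are spread one per host, so losing one node costs each placement group exactly one replica. The function name `pg_serves_io` is made up for illustration and is not a Ceph API.

```python
def pg_serves_io(size: int, min_size: int, lost_replicas: int) -> bool:
    """Return True if a placement group should still accept I/O.

    Simplified model: a PG serves I/O while the number of surviving
    replicas is at least min_size. Peering and recovery delays are
    deliberately ignored here.
    """
    surviving = size - lost_replicas
    return surviving >= min_size


# Assumption: one replica per host, so one node down means one replica lost.
one_node_down = pg_serves_io(size=3, min_size=2, lost_replicas=1)
two_nodes_down = pg_serves_io(size=3, min_size=2, lost_replicas=2)

print("I/O expected with 1 node down:", one_node_down)
print("I/O expected with 2 nodes down:", two_nodes_down)
```

By this reasoning, a single failed node leaves 2 of 3 replicas, which still meets min_size 2, so I would not expect I/O to be blocked.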
Can someone help me out?
best & thanks