Hello,
I have a three-node Proxmox/Ceph cluster. Each node has two NVMe drives used as Ceph OSDs. Replica is 3. The network is 40 Gb.
The OSDs are 60% full.
Now one OSD has failed. Since I followed all the guidelines (replica 3, min 2, fast network and so on), I expected to have no problems.
But Ceph, on the server that now has one broken disk and only one working disk, started filling the remaining disk with the third replica.
Now that OSD is full and ALL of Ceph is down (it does not accept writes, VMs are blocked and so on).
I think this is unacceptable behaviour: in a fully redundant cluster, one broken disk brings the whole cluster to its knees.
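A rough back-of-the-envelope sketch of the numbers, assuming the default CRUSH rule (one replica per host, so lost data is rebuilt inside the same node) and the default full ratio of 0.95. Neither is stated above, and the function name is only illustrative:

```python
def surviving_osd_utilization(osds_per_host: int, failed_osds: int, utilization: float) -> float:
    """Utilization each surviving OSD on a host would need after the data of
    the failed OSDs is re-replicated onto the surviving OSDs of that host."""
    surviving = osds_per_host - failed_osds
    if surviving <= 0:
        raise ValueError("no surviving OSD on this host")
    # The host's total data stays the same, but it is spread over fewer OSDs.
    return utilization * osds_per_host / surviving


# Assumption: Ceph default full ratio (mon_osd_full_ratio), not changed.
FULL_RATIO = 0.95

# Values from this post: two OSDs per node, one failed, 60% full.
needed = surviving_osd_utilization(osds_per_host=2, failed_osds=1, utilization=0.60)
print(f"needed on surviving OSD: {needed:.0%}, full ratio: {FULL_RATIO:.0%}")
print("fits" if needed < FULL_RATIO else "does not fit")
```

Under these assumptions the remaining disk would need about 120% of its capacity, so it reaches the full ratio before recovery can finish.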
What can I do?
Thanks,
Mario