We have 15 OSD hosts and 22 OSDs. The servers physically have 2 drive bays each, so of course the OSDs are not distributed perfectly evenly: some servers have 1 OSD and some have 2, but we are always adding drives to the system as time and availability allow.
OSD utilization according to the Ceph dashboard is between 45% and 65%, depending on whether the OSD is alone on a host or colocated with another.
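For completeness, the same numbers can be pulled on the CLI with the standard Ceph commands (this is just how I cross-check the dashboard; output omitted):

# Per-OSD utilization and PG counts, grouped by host
ceph osd df tree
# Overall raw vs. per-pool usage
ceph df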
Last week, a server with 2 OSDs had a problem that took both OSDs down. With size=3 and min_size=2, a number of VMs were essentially frozen. After moving the physical drives to other OSD hosts and running ceph-volume lvm activate --all, things got back to normal after a few minutes, but the drive-to-host distribution remains somewhat uneven.
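For reference, the recovery procedure on the hosts that received the transplanted drives was roughly the following (a sketch, not a verbatim transcript; hostnames omitted):

# Scan the attached disks and activate any OSD LVs found on them
ceph-volume lvm activate --all
# Confirm the OSDs rejoined and are up/in
ceph osd tree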
That evening, we increased our replication to size=4, min_size=2. We are also in the process of upgrading from PVE 7.2 to 7.3 and Ceph 17.2.4 to 17.2.5.
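The replication change itself was just a pool setting, roughly as below, with <pool> as a placeholder for our RBD pool name:

# Raise the replica count to 4; writes stay available down to 2 copies
ceph osd pool set <pool> size 4
ceph osd pool set <pool> min_size 2
# Verify the values took effect
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size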
Even with the unbalanced OSD distribution, we now have size=4, which should be overkill for what is not a terribly large cluster. The expectation is that we can update and reboot any host without affecting RBD clients in any way. If we can get there, I'm happy with size=4.
We chose a host with a single OSD to reboot first. This was also a MON host, but we have 7 monitors, all of which were up at the time. Our average IOPS during the day are 3k-10k and settle around 2k at night. The noout flag was set prior to rebooting the host.
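The maintenance prep was just the usual flag, removed again once the host was back (a sketch of the commands):

# Before the reboot: keep OSDs from being marked out and triggering a rebalance
ceph osd set noout
# ...reboot the host...
# After the OSD is back up and in:
ceph osd unset noout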
During the reboot, 1 OSD went down, causing roughly 5% of objects to become degraded, as expected. Even so, I/O across the cluster slowed to distressingly low values, with IOPS showing under 100. A number of Windows VMs hit BSODs and required a reset, even after the reboot completed and the downed OSD was brought back.
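While this was happening, we were watching the cluster with the usual status commands (I don't have the exact output saved, so this is just what was run):

# Overall health, degraded object percentage, client and recovery I/O
ceph -s
# Which PGs are degraded/undersized and why
ceph health detail
# Per-pool client and recovery rates
ceph osd pool stats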
Again, this is a cluster with size=4, min_size=2 and 1 OSD down, behaving as if it were size=2.
Things should remain perfectly stable and functional with only 1 OSD down, and my aim is to achieve the same tolerance for 2 OSDs down.
Someone please tell me what I am missing and doing wrong.