Let's stay with host as the failure domain, as that is the usual use case. You should be able to transfer the explanation to a situation where the failure domain is OSD.
If you have a cluster of 3 nodes with size=3, each host will store one replica. That means if a node goes down, you still have 2 out of 3 replicas available. That is enough to stay operational, but without full redundancy. In the Ceph status page, you should see that all PGs are in the "active+undersized" state.
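If you want to verify this on your own cluster, these are the usual checks. Just a sketch; "mypool" is a placeholder for your actual pool name:

```
# Replication settings of the pool ("mypool" is a placeholder)
ceph osd pool get mypool size       # number of replicas, e.g. 3
ceph osd pool get mypool min_size   # replicas needed to keep serving I/O, e.g. 2

# Overall cluster state while a node is down
ceph -s                             # PGs show up as active+undersized
ceph health detail                  # lists the undersized/degraded PGs
```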
If you have a 4-node cluster, each PG still has 3 replicas, but different PGs place them on different combinations of hosts, so the data ends up spread across all 4 nodes. This is how you can achieve more usable space in a cluster by adding more nodes.
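The placement is governed by the pool's CRUSH rule. As a reference, a default replicated rule usually looks like the following when you decompile the CRUSH map (the file names are arbitrary, and the exact fields vary a bit between Ceph versions):

```
# Export and decompile the CRUSH map (file names are arbitrary):
#   ceph osd getcrushmap -o crushmap.bin
#   crushtool -d crushmap.bin -o crushmap.txt
#
# The default replicated rule in crushmap.txt typically looks like this:
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host   # place each replica on a different host
    step emit
}
```

The "chooseleaf ... type host" step is what makes host the failure domain: CRUSH never puts two replicas of the same PG on one host.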
If one of the 4 nodes is down, you should see roughly 3/4 of the PGs in the "active+undersized" state, and about 1/4 perfectly fine in "active+clean". The latter PGs have all their replicas on the 3 nodes that are still running; since each PG picks 3 of the 4 hosts, only 1 of the 4 possible host combinations avoids the failed node, hence roughly a quarter of the PGs are unaffected.
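A rough way to see those numbers on the cluster itself; this is only a sketch that parses the plain-text output of "ceph pg dump", whose exact columns can differ between versions:

```
# Count how many PGs report each state while the node is down
# (expect roughly 3/4 undersized and 1/4 active+clean in the 4-node example)
ceph pg dump pgs_brief 2>/dev/null | awk 'NR>1 {print $2}' | sort | uniq -c
```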
In such a situation, Ceph will wait 10 minutes for the node to come back. If the OSDs are not back up and running within those 10 minutes, they will be marked "out". The result is that Ceph recreates the lost replicas on the remaining nodes. In the 4-node example, the 3 remaining nodes are still enough to hold 3 replicas while adhering to the failure domain of host, which implies that only one replica per PG can be stored on each host. (In the 3-node example, the PGs simply stay undersized until the failed node is back.)
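Those 10 minutes are the default of the mon_osd_down_out_interval option (600 seconds). If you want to check or change it, e.g. for planned maintenance, it looks roughly like this (the 1800 is just an example value):

```
# Current value of the down -> out timer, in seconds (default 600 = 10 minutes)
ceph config get mon mon_osd_down_out_interval

# Example: extend it to 30 minutes
ceph config set mon mon_osd_down_out_interval 1800

# For planned maintenance you can also prevent OSDs from being marked out at all
ceph osd set noout
ceph osd unset noout   # remove the flag again afterwards
```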