My issue is that on Host 2 and Host 3 the 14 TB HDD gets filled to only 76% while the 4 TB HDD gets filled to 92% of capacity, whereas on Host 1 the balancing results in the 14 TB and the 4 TB HDDs each being filled to 78%/79% of capacity, which I consider the optimal distribution. Why does it work on Host 1 but not on Host 2 and Host 3?
CEPH is never able to distribute data 100% optimally. A discrepancy of +/- 10-15 PGs is the rule rather than the exception, and that applies to you as well.
CEPH distributes PGs, not files. In your case a PG holds around 45 GB of data. In this respect everything is absolutely within limits and the distribution by PG count is fine. If you want a different distribution, you would have to adjust it manually: configure the balancer differently or change the weighting of OSD.7. If you reduce its weight, OSD.8 automatically receives more data, but you should do this slowly so that you don't fill up the other OSD.
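To put the +/- 10-15 PG variance into perspective, here is a rough back-of-the-envelope sketch in Python. The ~45 GB per PG comes from the figure above and the disk sizes from your setup; the variance of exactly 10 PGs is purely illustrative:

```python
# Rough sketch: the same PG variance hits a small OSD much harder than a large one.
# Assumes ~45 GB per PG (see above); the variance of 10 PGs is illustrative.

PG_SIZE_GB = 45
PG_VARIANCE = 10  # the balancer is typically off by around +/- 10-15 PGs

for name, capacity_gb in [("4 TB HDD", 4000), ("14 TB HDD", 14000)]:
    shift_gb = PG_VARIANCE * PG_SIZE_GB
    shift_pct = 100 * shift_gb / capacity_gb
    print(f"{name}: {PG_VARIANCE} PGs ~ {shift_gb} GB ~ {shift_pct:.1f}% of capacity")
```

Ten PGs more or less already move the 4 TB disk by roughly 11 percentage points, but the 14 TB disk by only about 3, which is exactly the kind of asymmetry you are seeing between Host 1 and the other two hosts.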
That means, if I understand it correctly, that the loss of 1 disk per pool would not be catastrophic.
Yes, it would be catastrophic. There is a mistake in your thinking here, and you have to think more abstractly.
CEPH wants to keep its replica count of 3 and distributes your bulk data across three hosts. Each of your nodes must therefore hold a complete copy of the data. You are currently spreading this copy across 2 OSDs (HDDs) per node. If one of them fails, CEPH has to place that OSD's data somewhere else and will therefore try to push the entire fill level of, for example, OSD.1 onto OSD.9. In this scenario you can only fill the two HDDs to 42.5% each, so that either one can hold all the data if the other fails. But you currently have 167.94% of data per node; if an HDD can hold a maximum of 100%, where should the remaining 67.94% go? OSD.9 will run into the full ratio, and CEPH will pull the emergency brake and switch the pool to read-only to protect data integrity.
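To make that concrete, here is a minimal failure simulation in Python, assuming one node with a 14 TB and a 4 TB OSD at the fill levels from the opening post and the default ratios; the OSD names are only placeholders:

```python
# Minimal sketch: can the surviving OSD in a node absorb the failed one's data?
# Capacities and fill levels are the illustrative values from this thread.

NEARFULL = 0.85   # default nearfull_ratio
FULL     = 0.95   # default full_ratio (pool goes read-only)

osds = {
    # name: (capacity in TB, current fill level)
    "osd.1 (14 TB)": (14.0, 0.76),
    "osd.9 (4 TB)":  (4.0,  0.92),
}

for failed, (cap_f, fill_f) in osds.items():
    for survivor, (cap_s, fill_s) in osds.items():
        if survivor == failed:
            continue
        data_to_move = cap_f * fill_f
        new_fill = (cap_s * fill_s + data_to_move) / cap_s
        if new_fill >= FULL:
            verdict = "read-only (full_ratio exceeded)"
        elif new_fill >= NEARFULL:
            verdict = "nearfull"
        else:
            verdict = "ok"
        print(f"{failed} fails -> {survivor} would be at {new_fill:.0%}: {verdict}")
```

In both directions the surviving OSD blows past the full_ratio long before it could absorb its partner's data, which is exactly the standstill described above.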
If you had only one OSD of this type per node, your thinking would be correct: CEPH could no longer produce its replica and would automatically go into degraded+undersized, but there would be no standstill, because according to the CRUSH rule the other OSDs are not allowed to receive this data anyway. With two OSDs per node, you have to think about it differently.
With replica 3 and three CEPH nodes, you basically have to look at one server in isolation and ignore the others. If you think of it this way, you will see that the data from one HDD can never fit onto the other one.
So you have currently pushed your HDD pool so close to the limit that expanding it is the only way to acutely eliminate the risk of a standstill. Theoretically, you could also set noout and swap the current 4 TB drive for a larger one. However, I would recommend adding another HDD so that you get the space problem under control.
For precisely these reasons, I would also advise you to always use disks of the same size and to keep in mind that in your scenario the remaining OSD in the CRUSH rule MUST always be able to absorb the loss of another one.
If you have three 4 TB disks in a node and each is 50% full, then when one fails the other two each take on about 25% more data and end up at 75%. But if you have two 4 TB disks and one 14 TB disk and all three are at 50%, then each 4 TB disk has at most 2 TB free, while the failed 14 TB disk would have to move 3.5 TB onto each of them.
But always remember that a 4 TB disk cannot actually store 4 TB of data: the nearfull_ratio is 85% and the full_ratio is 95% (= read-only for the affected PGs). That's why you can't park 4 TB of data on it, and you should always plan with the 85% limit in mind.
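Here is the same arithmetic as a short sketch, assuming the default 85% nearfull limit and the two layouts from the example above; all numbers are the illustrative ones from that example, not real cluster data:

```python
# Sketch: can the surviving disks absorb a failed disk's data without passing nearfull?
# Uses the two illustrative layouts from the example above, all disks 50% full.

NEARFULL = 0.85

def check(disks_tb, fill):
    for i, failed in enumerate(disks_tb):
        survivors = [d for j, d in enumerate(disks_tb) if j != i]
        to_move = failed * fill                               # data on the failed disk
        headroom = sum(d * (NEARFULL - fill) for d in survivors)  # free space below 85%
        print(f"{failed} TB disk fails: {to_move:.1f} TB to move, "
              f"{headroom:.1f} TB of headroom -> {'ok' if headroom >= to_move else 'does not fit'}")

print("Three 4 TB disks, all 50% full:")
check([4, 4, 4], 0.50)
print("Two 4 TB disks and one 14 TB disk, all 50% full:")
check([4, 4, 14], 0.50)
```

With equal disks the redistribution stays below the 85% limit; with the mixed sizes the 14 TB disk's 7 TB of data simply has nowhere to go.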