3-node cluster (1 Gbit/s NICs for LAN and WAN)

But the traffic is basically real-time; replicas can be an hour apart.

I do not know, but I reply based on the constraints at hand. I suppose you are saying the speed of the NVMe is useless once it has to be synced out across 1 Gbps, but we do not know what the VMs are doing. They might be crunching numbers or syncing a blockchain, and in that case no CEPH is needed; a 1G NIC and occasional ZFS replication would work too. But low-IOPS storage would not. :)
That was basically what I wanted to communicate ... it could work, knowing the write load. A synchronous write once every second is totally fine on 1G, yet bursty writes are not, and neither is replication if a lot changes between runs.
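To put rough numbers on it (ballpark figures, not measurements, assuming roughly 110 MB/s of usable payload on the 1 Gbit/s link):

    1 Gbit/s ≈ 110-115 MB/s after protocol overhead
    a VM that dirties ~5 GB per hour replicates in under a minute per run
    a sustained burst above ~110 MB/s, or an initial full send of a multi-TB dataset, is where the 1G link becomes the bottleneck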
 
The lost mail problem, with this setup, needs to be solved at the application level, I am afraid. Then you do not depend on the ZFS replication frequency. Similar with databases.
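For mail specifically, the usual application-level answer is a secondary MX that keeps queueing while the primary is unreachable, so nothing is lost between replication runs. A hypothetical zone snippet (hostnames are placeholders):

    example.com.    IN  MX  10  mx1.example.com.
    example.com.    IN  MX  20  mx2.example.com.

Same idea with databases: use the DBMS's own replication rather than relying on storage-level sync.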
Or again, if the hard requirement is storage HA, CEPH is the only solution, even if it's slow as hell. It could also be set up in a way that you run a small CEPH pool for the no-data-loss machines and the rest via ZFS replication, yet this would be a complex setup, and I like KISS setups.
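Just to sketch what that mixed setup could look like on stock Proxmox (pool name, VMID and target node are made up, commands from memory, so double-check before use):

    pveceph pool create nodataloss --size 3 --min_size 2
    pvesr create-local-job 100-0 pve2 --schedule "*/15"

The first gives you a small replicated RBD pool for the must-not-lose-data VMs; the second keeps a ZFS-backed VM (here VM 100) replicated to node pve2 every 15 minutes.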
 
Or again, if the hard requirement is storage HA, CEPH is the only solution even if it's slow as hell.

There's no such thing as only ... ever. :)

It could also be set up in a way that you run a small CEPH pool for the no-data-loss machines and the rest via ZFS replication, yet this would be a complex setup, and I like KISS setups.

GlusterFS for the mailserver spool, for the 3-node setup?
 
Yes, could work too ... yet I think it'll be as slow as or even slower than CEPH.

Alright, I am not saying CEPH is bad, but when you say "could", it gives it some sort of "would not be my first choice" feel. In this particular case, actually, it would be.

My gut feeling is that Gluster with a replica 3 volume across the nodes would actually be faster for this application specifically (holding mail). Arguably you do not really care how quickly the SMTP server returns its 250 after the fsync() is done. Reads will be local (in any case).
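A rough sketch of what I mean, with a replica 3 volume so every node holds a full copy and reads stay local (node names and brick paths are placeholders, double-check against the Gluster docs):

    gluster peer probe node2 && gluster peer probe node3
    gluster volume create mailspool replica 3 node1:/bricks/mail node2:/bricks/mail node3:/bricks/mail
    gluster volume start mailspool
    mount -t glusterfs localhost:/mailspool /var/spool/mail

Every write still has to land on all three bricks before fsync() returns, but as said, nobody is timing that 250.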

But then you said it yourself ...

I like KISS setups.

Gluster is arguably the simpler setup (no, I really do not care for a GUI when it's that simple). But in the back of my mind, for the mail, I would probably want to have it geo-replicated outside of the cluster too (if only maybe down the road). Whereas I would not want to be fiddling with getting a CEPH stretch cluster done right for the same.
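Gluster's geo-replication would cover that part more or less out of the box. A hypothetical sketch (the offsite host and secondary volume names are placeholders, and it needs the usual ssh key / geo-rep session setup beforehand):

    gluster volume geo-replication mailspool backupsite::mailspool-dr create push-pem
    gluster volume geo-replication mailspool backupsite::mailspool-dr start

That keeps an asynchronous copy of the spool at the second site without touching the cluster layout at all.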