I'm running small 3-node clusters, with 18-24 OSDs per cluster (1.6TB Intel S3610 SSD or 3.2TB HGST NVMe), fast CPUs (10-12 cores, 3GHz Intel) in each node, and replication x3.
Some are Debian Stretch / Luminous with BlueStore, others are Jessie / Jewel with FileStore.

Network is 2x10Gb per Ceph node (Ceph public and private networks on the same links) and 2x10Gb per Proxmox node (SAN + LAN on the same links, different VLANs).
The Proxmox nodes also have fast CPUs (3GHz), to reduce latency.

I'm also using CephFS and radosgw, on a dedicated cluster, for sharing data between my VMs.

I keep the clusters small because upgrades are simpler, and if I don't have enough storage for a specific VM, we simply move its disk to another cluster with Proxmox.

I know 2 people who have triggered this bug...
Also, I don't know if it has changed, but resyncing a VM volume/file needed to scan all blocks of the source file.
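
For a rough idea of what the cluster sizes above give in usable space with replication x3, here is a minimal sketch. The function name and the `fill_ratio` parameter are hypothetical, added only for illustration; the calculation ignores BlueStore/FileStore overhead and simply divides raw capacity by the replica count.

```python
def usable_tb(osd_count, osd_size_tb, replication=3, fill_ratio=1.0):
    """Estimate usable capacity in TB for a replicated pool.

    fill_ratio is a hypothetical safety margin (e.g. 0.8 to stay
    well below full); 1.0 means no headroom is reserved.
    """
    raw_tb = osd_count * osd_size_tb      # total raw capacity of the cluster
    return raw_tb / replication * fill_ratio


# Smallest and largest configurations mentioned above
small = usable_tb(18, 1.6)   # 18 x 1.6TB SSD
large = usable_tb(24, 3.2)   # 24 x 3.2TB NVMe

print(f"18 x 1.6TB, x3 replication: {small:.1f} TB usable")
print(f"24 x 3.2TB, x3 replication: {large:.1f} TB usable")
```

In practice you would pass a `fill_ratio` below 1.0, since a 3-node cluster needs free space to keep working while a node or some OSDs are down.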