Multi-region Ceph + mixed host HW?

raynearch

New Member
Dec 12, 2022
I'm facing a bit of a challenge with my current project and I'm hoping that someone here might have some wisdom to share.

For context, my end goal is to have a self-hosted S3 service replicated across 3 data centers (1 West coast, 2 East coast). I have 6 storage servers (2 for each DC) that are intended to all mirror each other for maximum redundancy (i.e. we should be able to have 5 out of 6 storage servers go offline all at once and still remain operational). At first, I was planning to use TrueNAS w/ Minio, but that has gone out the window for multiple reasons. I'm currently researching Ceph and SeaweedFS as possible solutions.

If I go the route of Ceph on PVE, my main concern is that I only have x2 storage servers to deploy at each data center, which does not meet the minimum requirements if I'm understanding correctly. However, I do have x3 utility servers at each DC that I was already planning on running as a HA PVE cluster. The storage servers are Supermicro SC847 (36-Bay) loaded with x24 16TB Exos SAS HDDs/ea. The utility servers are Dell R430 SFF.
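To illustrate my worry about the minimums: as far as I can tell, Ceph's default CRUSH rule for replicated pools puts each copy on a different host, so with only two OSD-carrying hosts per DC a size-3 pool can't be fully placed. A quick sketch of that constraint (hostnames are made up, and please correct me if I've got the failure-domain behaviour wrong):

```python
# Sketch: can a replicated pool place all copies when the failure
# domain is "host"? (Default CRUSH behaviour for replicated pools.)
# Hostnames and replica size are assumptions for illustration.

def placeable_copies(osd_hosts: list[str], replica_size: int) -> int:
    """Each replica must land on a different host, so placement is
    capped by the number of hosts that actually carry OSDs."""
    return min(len(osd_hosts), replica_size)

# Per DC: only the two Supermicro boxes hold disks.
osd_hosts_per_dc = ["sm847-a", "sm847-b"]   # hypothetical names
replica_size = 3                            # Ceph's usual default

placed = placeable_copies(osd_hosts_per_dc, replica_size)
print(f"copies placed: {placed}/{replica_size}")
if placed < replica_size:
    print("-> PGs would stay undersized/degraded with host-level failure domain")
```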

I know that creating PVE clusters with mixed hardware is possible (albeit potentially ill-advised for HA), but I do not have any experience with Ceph clusters yet... which is why I'm seeking some advice.

If I were to create a cluster at each data center containing the 2 Supermicros and 3 R430s, could that be a feasible solution? Or would I just be creating problems for myself down the road?
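For scale, here is my back-of-the-envelope capacity math if every object really were mirrored to all six servers, versus a normal 3-replica setup (plain arithmetic on the drive counts above, so correct me if I'm thinking about replicated pools wrong):

```python
# Back-of-the-envelope usable capacity for different replica counts.
# Raw numbers come from the hardware above; everything else is plain arithmetic.

drives_per_server = 24
drive_tb = 16
servers = 6

raw_tb = drives_per_server * drive_tb * servers            # 2304 TB raw
for replicas in (3, 6):                                    # 6 = one copy per server
    usable_tb = raw_tb / replicas
    print(f"replica size {replicas}: ~{usable_tb:.0f} TB usable of {raw_tb} TB raw")
```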
 
Not an expert, but my thoughts:

Surviving 5 of 6 servers down is never possible; you need a valid quorum, which means 50% + 1 of the nodes up.
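Rough majority math, nothing Ceph-specific beyond the fact that the monitors (and corosync in PVE) need more than half of their members up:

```python
# Majority quorum: more than half of the voting members must be up.
# Applies to both Proxmox (corosync) and the Ceph monitors.

def quorum(members: int) -> int:
    return members // 2 + 1

def tolerable_failures(members: int) -> int:
    return members - quorum(members)

for n in (3, 5, 6):
    print(f"{n} members: quorum = {quorum(n)}, can lose at most {tolerable_failures(n)}")
```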

If you really need highly available storage across 3 DC locations, you'd better invest a bit more (then you'll rarely end up in the 5/6 situation anyway).

If I imagine the bandwidth costs between East and West coast (is it really that cheap in the USA?), I can't imagine that a few more servers would be a budget problem.

When I see big storage servers in Ceph (36 spinning disks), I ask myself what happens when such a big machine fails or reboots. In my opinion it's better to have many small machines: more redundancy, better load balancing, fewer problems on failure. Of course it can get more expensive (rack space, switch ports, hardware, energy).
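Just to put a number on the "big machine fails" case, a quick estimate of the re-replication one fully loaded box triggers (the sustained recovery rate is an assumption; real recovery is usually throttled and slower):

```python
# How much data has to be re-replicated if one fully loaded box dies,
# and roughly how long that takes at an assumed sustained recovery rate.

drives = 24
drive_tb = 16
lost_tb = drives * drive_tb                       # 384 TB to rebuild elsewhere

recovery_gbit_s = 10                              # assumption: 10 Gbit/s sustained
recovery_tb_per_day = recovery_gbit_s / 8 * 3600 * 24 / 1000   # TB/day

print(f"data to re-replicate: {lost_tb} TB")
print(f"at {recovery_gbit_s} Gbit/s sustained: ~{lost_tb / recovery_tb_per_day:.1f} days")
```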

But again, if I look at the cost of dark fibre or 100 Gbit links across the country (redundant?), I see no problem financing decent hardware in all locations.
 
The replication traffic cost will kill you. 24x 16 TB means you will have many VMs on those servers. Don't forget: with Ceph, everything is written multiple times, in real time, to every OSD that holds a replica.
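To make the "written multiple times in real time" point concrete, a rough sketch; the layout (size 3, one copy per site) is an assumption, and Ceph only acknowledges a write once every replica has it:

```python
# Rough write amplification if replicas are spread across the three sites.
# Assumption: replica size 3, one copy per site, primary at the local site.

replica_size = 3
local_copies = 1
wan_copies = replica_size - local_copies          # copies that cross the WAN

tb_written_per_day = 1.0                          # assumption, pick your own number
wan_tb_per_day = tb_written_per_day * wan_copies

print(f"every client write is shipped to {wan_copies} remote site(s) before the ack")
print(f"{tb_written_per_day} TB/day of writes -> ~{wan_tb_per_day} TB/day over the WAN,")
print("plus at least one coast-to-coast round trip added to every write's latency")
```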
I agree with the previous answer: better to have many small servers than one big one.
As I understand from your description, you offer services based on location (East, West), like Azure does. Why then do you want to replicate the data to every location?
I would (and I haven't spent much time thinking about the architecture) put a cluster in every location, with each cluster covered by the data center's backup plan. Usually a data center has a backup site; replicate to there. The traffic cost won't kill you, since the backup site is connected to the production site with plenty of bandwidth. This is just a first thought and not a perfect plan, but that's how I would approach it.
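And as a sanity check on the per-site idea, a throwaway estimate of the link to the backup site under asynchronous replication; the data volume and daily change rate are pure assumptions, plug in your own numbers:

```python
# What link does async replication to a backup site roughly need?
# Both numbers below are made-up assumptions; replace them with real ones.

usable_tb_per_site = 300        # assumption
daily_change_rate = 0.02        # assumption: 2% of the data changes per day

changed_tb_per_day = usable_tb_per_site * daily_change_rate
required_gbit_s = changed_tb_per_day * 1000 * 8 / (24 * 3600)

print(f"~{changed_tb_per_day:.0f} TB/day of changed data")
print(f"needs ~{required_gbit_s:.2f} Gbit/s sustained to the backup site")
```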
 
