Max number of hosts recommended for Ceph and PVE hyperconverged

velocity08

Hi Team

I'm investigating using Ceph and PVE hyperconverged. Reading over the documentation, there are some anecdotal numbers for small deployments. What constitutes a small deployment, is it the minimum of 3 hosts?

What's the maximum recommended number of hosts that can be run in a cluster?

I've seen 6 hosts used in PVE performance tests.

What are we safe to go up to, and what sort of overhead will we see for Ceph as we add more hosts to a cluster?

At what point should we consider splitting Ceph off from PVE onto its own storage-only platform?

Cheers
Gerardo
 
Ceph really has no hard limit, and it gets better as it grows, both performance- and reliability-wise.

With the old Corosync in Proxmox, the number that used to float around was 16; with 6.x and Corosync 3.x that limit seems to have been removed or raised a lot. Do you have an exact number of total servers (both compute and Ceph) you're looking to have as an end goal?

It also depends a lot on your network: the more hosts you have, the more important a high-quality, low-latency sync network becomes.
 
I don't have a specific number in mind; I'm looking into best fit and costing compared to other solutions.

We are using vSAN on our VMware deployment and are wondering what a good equivalent for Proxmox would be.

I have been looking at LINBIT DRBD and other solutions like TrueNAS, which would use the iSCSI/ZFS plugin, but that has some limitations: only one initiator (no multipath), and it is scale-up rather than scale-out, so it may not be suitable for this specific project.

I also have some concerns about Ceph at scale after reading about a major outage at DigitalOcean back in 2018 that knocked them out for days.

https://status.digitalocean.com/incidents/8sk3mbgp6jgl

Any thoughts on the above?

Cheers
G
 
Every piece of software is going to have bugs, and a lot has been learnt and changed since 2018. They were hit by a few things in rapid succession, and may have had their RAM set lower than suggested on the OSD nodes, so they were hit by the OOM killer during recovery, which would have delayed things even further.

Ceph is used by many large enterprise companies, a lot of money and extra resources have been put into it in recent years, and I would say it is a much more stable product than it may have been in the past.

As previously said, your node limit is set by Corosync in Proxmox; Ceph has no such limit, as it uses its own internal messaging system with the MONs.

If you expect your Ceph environment to grow well above the old 16-node limit, then it is probably worth keeping your Ceph environment separate. The new web GUI in Ceph does most of what the Proxmox GUI allows for Ceph, so you won't really miss any big features by not running everything on Proxmox.
 
SG90, thanks for your input. I guess I'm coming from poor experiences with distributed storage in the past, and it's made me hesitant.

Normally we run hosts with 256 GB of RAM and a dedicated dual 10 GbE storage network.

vSAN only uses 1 x 10 GbE port but can use LACP across multiple ports for throughput.

Do you recommend any specific config based on experience?

How are you finding real-world performance in your environment?

I'm willing to explore this a little further with a new 4-node cluster based on 28- or 32-core EPYC CPUs and NVMe drives, more than likely Supermicro.

Can you run raw storage, or what format are you running for VM disks?

Looking forward to more of your feedback and input :)

Cheers
G
 
On the network side, people will always push towards 25 Gbps+ now due to the lower latency; however, if you already have a big investment in 10 Gbps, then dual 10 Gbps should do, depending on how dense you're looking to go storage-wise on each node. You say NVMe, so is your plan to be fully NVMe based, or will you have a mix of NVMe / SAS / SATA?

On the storage front, Proxmox can create VM disks as individual RBDs on Ceph. This will give you the best performance compared to going via something like iSCSI or CephFS, as it cuts out the overhead those layers add to every I/O request.

I run a fully SATA-based Ceph environment outside of Proxmox, which I have attached to a Proxmox cluster via KRBD, and I have zero issues or performance problems relative to what I would expect from a SATA-based system. I am also using 10 Gbps connectivity.

But yes, with Ceph the more RAM you can give it the better, especially with BlueStore and during recovery, along with a decent CPU (clock speed over more cores).
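
To put a rough number on the RAM point, here is a back-of-the-envelope sketch. The 4 GiB figure is the BlueStore default for osd_memory_target; the recovery headroom and OS overhead values are just assumptions you would tune for your own environment:

```python
# Rough per-node RAM budget for Ceph OSD hosts (back-of-the-envelope only).
# osd_memory_target defaults to 4 GiB per OSD with BlueStore; the other
# figures are assumptions, not official sizing guidance.

OSD_MEMORY_TARGET_GIB = 4      # BlueStore default per OSD
RECOVERY_HEADROOM = 1.5        # assumed multiplier for recovery/backfill spikes
OS_AND_SERVICES_GIB = 16       # assumed OS, MON/MGR and other service overhead

def ram_budget_gib(osds_per_node: int) -> float:
    """Estimate how much RAM to reserve for Ceph on one node."""
    return osds_per_node * OSD_MEMORY_TARGET_GIB * RECOVERY_HEADROOM + OS_AND_SERVICES_GIB

for osds in (4, 8, 12):
    print(f"{osds} OSDs per node -> reserve roughly {ram_budget_gib(osds):.0f} GiB")
```

On a 256 GB host that still leaves plenty for VMs in a hyperconverged setup; the point is simply that the guests and Ceph should not be fighting over the same memory in the middle of a rebuild.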
 
I'm only mentioning 10 GbE (SFP+ fibre) as we already have an existing investment in switches, but we could easily purchase some larger-capacity switches.

The current 10 GbE SFP+ links are 20 Gbps bidirectional, and with LACP we can bond the two ports of a dual 10 GbE card and get 40 Gbps.

Can Ceph work in this way?

Just food for thought.

The new Supermicro servers are all NVMe-based for the CPU type we are looking at, so we would need to change CPU types to accommodate SATA drives.

SATA would work out cheaper for capacity, and with multiple SATA drives per host, each on a 6 Gbps backplane, there should be plenty of throughput, especially if there are NVMe cache disks in front; a rough sketch of that arithmetic is below.
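
As a quick sanity check on that assumption, here is the rough maths I'm working from; the per-drive throughput and drive count are purely illustrative:

```python
# Quick sanity check: aggregate SATA throughput per host vs. network bandwidth.
# The per-drive figure is illustrative (roughly a SATA SSD's sequential rate on
# a 6 Gbps link); real numbers depend heavily on drive model and workload.

DRIVE_MBPS = 550            # assumed sequential MB/s per SATA drive
DRIVES_PER_HOST = 8         # assumed drive count per host
NIC_GBPS = 2 * 10           # dual 10 GbE, one direction

disk_gbps = DRIVES_PER_HOST * DRIVE_MBPS * 8 / 1000
print(f"Disks:   ~{disk_gbps:.0f} Gbps aggregate sequential")
print(f"Network: ~{NIC_GBPS} Gbps")
print("Likely bottleneck:", "network" if disk_gbps > NIC_GBPS else "disks")
```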

Just curious how many cache disks and OSD drives are needed as a minimum to kick off a Ceph cluster, or am I not understanding it correctly?

From my first round of reading I can see we will need a minimum of 3 replicas, 1 per host, for the OSDs. Is this correct?

I'll need to do a little more reading; it's still early days at the moment.

Cheers
G
 
It is not advised to use the built-in cache tiering functions within Ceph; they are slowly being deprecated and have yet to be replaced.

You'll need to decide what you need based on your storage requirements. You can have multiple pools of storage, so you could have an NVMe pool which uses the NVMe drives and a SATA pool that uses the SATA disks. When you create a disk within Proxmox, it is then up to you to decide which pool that disk should be placed on.
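
Roughly, that split is done with device-class CRUSH rules. Something along these lines, where the pool names and PG counts are only placeholders and the device classes ("nvme", "hdd") need to match what your OSDs actually report in ceph osd tree:

```python
# Sketch: two RBD pools split by device class via CRUSH rules.
# Pool names and PG counts are placeholders; run from an admin/MON node.
import subprocess

def ceph(*args: str) -> None:
    subprocess.run(["ceph", *args], check=True)

# CRUSH rules that only select OSDs of a given device class (host failure domain).
# Note: NVMe drives are often auto-classed as "ssd"; adjust the class names to match.
ceph("osd", "crush", "rule", "create-replicated", "nvme_rule", "default", "host", "nvme")
ceph("osd", "crush", "rule", "create-replicated", "sata_rule", "default", "host", "hdd")

# One replicated pool per tier, each bound to its rule.
ceph("osd", "pool", "create", "nvme_pool", "128", "128", "replicated", "nvme_rule")
ceph("osd", "pool", "create", "sata_pool", "128", "128", "replicated", "sata_rule")

# Tag them as RBD pools so they can be used for VM disks.
ceph("osd", "pool", "application", "enable", "nvme_pool", "rbd")
ceph("osd", "pool", "application", "enable", "sata_pool", "rbd")
```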

The minimum to get going with a 3-way replica is 3 servers; however, if you were to lose one host you would have no redundancy headroom left. Ideally the minimum you want to start with is 4: then if one server goes down, the remaining 3 can still rebuild and bring the 3-way replica back fully.
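
To put numbers on that, here is a rough usable-capacity sketch; the node count, drive sizes and the 85% full ratio are assumptions, not official guidance:

```python
# Rough usable-capacity sketch for a replicated Ceph cluster.
# All inputs are illustrative assumptions, not official sizing guidance.

NODES = 4
OSDS_PER_NODE = 8
OSD_TIB = 3.84              # assumed drive size in TiB
REPLICAS = 3                # size=3 pool, one copy per host
SAFE_FULL_RATIO = 0.85      # stay well below the near-full warnings

raw = NODES * OSDS_PER_NODE * OSD_TIB

# Naive usable space with 3 copies of everything.
usable = raw * SAFE_FULL_RATIO / REPLICAS

# If the cluster must also re-heal to 3 copies after losing a whole node,
# plan usable capacity as if that node's disks were not there at all.
usable_after_node_loss = (NODES - 1) * OSDS_PER_NODE * OSD_TIB * SAFE_FULL_RATIO / REPLICAS

print(f"Raw capacity:                         {raw:6.1f} TiB")
print(f"Usable (3x replica):                  {usable:6.1f} TiB")
print(f"Usable if 1 node loss must self-heal: {usable_after_node_loss:6.1f} TiB")
```

With only 3 nodes and a host failure domain there is no spare host to rebuild the third copy onto at all, which is exactly why 4 is the comfortable starting point.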
 
Yes, thank you, agreed: a minimum of 4 nodes :)
Unfortunately, from my current research, the new hyperconverged chassis pretty much take only one type of storage.

I can see where a separate Ceph cluster would benefit from multiple storage tiers, or where even older servers that can take multiple SAS or SATA drives would be able to meet the multi-tier demand.

OK, back to more research. Thank you.

Cheers
G
 