Ceph with 1 server, then adding a second and third

TiME

New Member
Apr 25, 2014
I'd like to know if it's possible to start a Ceph cluster with just 1 node, then add a second one later and finally a third one? I know it wouldn't have any redundancy, but running 3 nodes from the start isn't suitable in my case. I think the problem would be editing the pool config and pg_num later on (or should I start with a pg_num of 150 for 3 nodes even though I'd only have one in the beginning?). Could anyone tell me if this is possible and if so, how? The wiki only describes the setup with a 3 node cluster, but not how to add a fourth one etc.
 
Makes no sense, but you can start with one node - just do all the steps on that one node.
 
Tom, thanks for your reply. I know it doesn't make much sense, but the reason I want to do this is that 3 nodes would stay idle for too long and just consume power and money, so I only want to start adding nodes when the first one isn't idle anymore, until we finally reach 3 nodes and can enable full HA. What kind of pool settings would I use in such a scenario and/or how can I adjust them later? Or would you recommend using local storage only until we reach 3 nodes, then hotplug new HDDs, add OSDs there with the journal on SSDs, migrate the images to Ceph, format the old and now empty local storage HDDs and add them as OSDs too? The wiki doesn't explain such details unfortunately.
 
@TiME, I think there is a bit of a misunderstanding in the way you have understood CEPH. There is no option to "turn on" the HA ability of CEPH - by design it is HA itself. The moment you bring 2 OSDs online, that's when HA begins. If you meant that you are going to wait for node 1 to be completely full before you add the 2nd node, I would say that is a very bad idea. Although HA begins when you bring two OSDs online even on the same node, if that node dies it will obviously bring your CEPH storage to a halt. With 2 nodes, if one node fails you have the other node to fall back on. If you are in the learning phase, it is not a big issue to start with a one node CEPH. But if you are planning to put it into production without a minimum of 2 nodes, you will face major issues and the users won't be happy.
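The replica count itself is just a pool property, so it can be raised as more hosts are added. A rough sketch (assuming the default "rbd" pool name - adjust to whatever pool you actually use):

Code:
# run from any node with the admin keyring; "rbd" is only an assumed pool name
ceph osd pool set rbd size 2       # keep 2 copies of every object
ceph osd pool set rbd min_size 1   # still serve I/O while only 1 copy is available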

If you are using the latest Proxmox CEPH server 3.2, you still need a minimum of 2 nodes to give Proxmox quorum. Now that Proxmox and CEPH can coexist on the same hardware, a 3 node setup is really the way to go even from a cost point of view. If cost really is a big issue, just start with an i7 CPU platform: instead of spending all the money on 1 single server class machine, buy 3 desktop class i7 machines and give both Proxmox and CEPH a chance to do what they do best.
 
Thanks for your reply.
But if you are planning to put it into production without a minimum of 2 nodes, you will face major issues and the users won't be happy.
Apart from the non-existent HA and redundancy, why exactly would it be so bad to start with Ceph on just 1 server? I would add a second one as soon as the first one reaches ~30-50%. Unfortunately my question about how to adjust the pool settings when more servers are added to the cluster hasn't been answered yet. Whether I start with 1 server, with 2 servers or with 3 servers - how can I adjust the pool settings, or do I even have to? The Proxmox wiki and other sources always instruct you to calculate the pg_num in relation to the number of OSDs being used. Is that really the case, or can I use the default Ceph pool (not sure what it's called right now) with 2 or 4 servers without adjusting pg_num or anything? And yes, I'm completely new to both Ceph and Proxmox, even though I've been working in the industry for the last 5 years, just not with storage like Ceph and HA yet, so it's possible that I really don't understand some of the basic concepts.

My initial idea to cut costs was:
- 2x SSD drives, one for Proxmox and the monitors and one for the Ceph journal
- 2x SATA III drives for OSDs

From the beginning (i.e. with just one server) I would use Ceph as storage, to avoid moving VMs from local storage to Ceph later. When this setup reaches about 30% usage, I would add a second server with the same specs, adjust the pg_num and pool settings to get a copy of the same data on both servers and so have redundancy. Then, as soon as this setup has a few more VMs, I would finally add a third server, adjust the pool settings again and enable fencing and HA in Proxmox, and after that probably add a 4th or 5th server or just build a second cluster with 3 servers. Would this make any sense or have a major downside apart from the fact that the setup isn't redundant in the beginning? Is it possible or even necessary to adjust the pool settings with a changing/increasing number of nodes?
 
Hi,
ceph is a distributed object store - with only one node it's hard to distribute...
OK - you can tell ceph to replicate on an OSD basis instead of a host basis, but normally nobody wants that! If you later extend your cluster with more OSD nodes and forget to switch back to host-based redundancy, you will always have a SPOF (single point of failure).
And because of quorum, ceph makes no sense with less than 3 nodes! Even with 2 nodes your storage - and in this case your VMs as well - can be stopped by a problem on one node.
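For reference, the usual way to get OSD-level instead of host-level placement on a single box is the crush chooseleaf setting, roughly like this in ceph.conf before the OSDs are created (a sketch, not tested here):

Code:
# ceph.conf (on Proxmox: /etc/pve/ceph.conf)
[global]
    # 0 = osd, 1 = host: place replicas across OSDs instead of across hosts
    osd crush chooseleaf type = 0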

Storage migration with pve is quite easy - simply use local storage for now (redundancy with hardware raid) and later migrate the VM disks on the fly to ceph.
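The move itself is roughly one command per disk once the ceph storage is defined in Proxmox (VM id 100, disk virtio0 and the storage name "ceph-storage" are only placeholders - check "qm help move_disk" for the exact options):

Code:
# move disk virtio0 of VM 100 to the storage named "ceph-storage",
# deleting the old local copy afterwards
qm move_disk 100 virtio0 ceph-storage -delete 1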

Udo
 
Thank you for your reply, Udo. So you're suggesting that I should use local storage until I hit 3 nodes. I could leave one SSD and one HDD idle per server, use one SSD for Proxmox and one HDD as local storage, and later, when I have 3 active nodes, use the idle SSD for the Ceph journal and the idle HDD as an OSD, then move the VMs from local storage to Ceph, zap the HDDs that were used as local storage before and add them as OSDs too. That would have been the other approach I was thinking about. However, my question about the Ceph pool configuration remains unanswered and it isn't described in your wiki: what if I want to add a 4th or 5th server to the cluster? Do I have to adjust the pool settings (pg_num, etc.)?
 

Hi,
I guess by the time you upgrade to 3 nodes you won't need an SSD for the journal, because with the upcoming ceph version firefly there is no need for a journal disk.

Because "your Wiki": the wiki is for all! And PGs has to adjust to the numer of osds - don't has to do directly with numbers of servers.


Udo
 
So it would be better to wait until the firefly Ceph release is integrated into Proxmox, so I don't end up with an SSD that I don't need anymore? Of course, once all my questions are answered, I could extend the Ceph Server wiki page to better elaborate on different scenarios / cluster setups, but my question on how to adjust the Ceph pool settings and the pg_num of an existing pool remains. With a new server I have to add new OSDs, and with new OSDs I would have to adjust the pool settings. So how do I change them on an existing pool?
 
How many spindles are you planning to use for Ceph? Surely more than one? You can start with one spindle per OSD, with the OSDs running under Proxmox.
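On a Proxmox CEPH server node that is usually one command per spindle, along the lines of (device names are just examples):

Code:
pveceph createosd /dev/sdb   # one OSD per disk
pveceph createosd /dev/sdc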
 
with the upcoming ceph version firefly there is no need for a journal disk.

I am getting ready to build a CEPH cluster and have been doing some research on journals in firefly.
My understanding (which may be flawed) is that with firefly journals are not needed but can still be used to help with some performance aspects.

It seems like an SSD journal + mechanical disk would still perform better in firefly than just a mechanical disk with no journal.
But if someone had a RAID card with a BBU write cache, then using no journal would perform best.
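For the SSD journal case, pveceph lets you point the journal at a separate device when the OSD is created - roughly like this (device names are assumptions, check the man page for the exact option):

Code:
# data on the spinner, journal partition carved out of the SSD
pveceph createosd /dev/sdb -journal_dev /dev/sda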
 
@TiME, it is somewhat unclear how many nodes in total (Proxmox and CEPH) you are going to have in your network. You said you wanted to start with 1 CEPH node - does that mean you have planned other nodes for Proxmox usage only? With Proxmox VE 3.2 both Proxmox and CEPH can coexist on the same hardware. So if you start with 3 nodes, that gives you quorum and redundancy for both Proxmox and CEPH. Based on this, your node setup will look like this:
Node 1
======
Platform : AMD / Intel
SSD: 1 (For Proxmox Installation)
HDD: 2 (For CEPH OSD+Journal)

Node 2
======
Platform : AMD / Intel
SSD: 1 (For Proxmox Installation)
HDD: 2 (For CEPH OSD+Journal)

Node 3
======
Platform : AMD / Intel
SSD: 1 (For Proxmox Installation)
HDD: 2 (For CEPH OSD+Journal)

No matter how many nodes you want to set up for CEPH, you still need more than one node for Proxmox, unless you were planning to have a 1 node Proxmox cluster. With this setup you let both Proxmox and CEPH do what they do best without spending lots of money.
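On Proxmox VE 3.2 the CEPH side of such a 3 node setup boils down to a handful of commands - a rough sketch (the cluster network 10.10.10.0/24 and the device names are only examples):

Code:
pveceph install                       # on every node: install the ceph packages
pveceph init --network 10.10.10.0/24  # once, on the first node
pveceph createmon                     # on each of the 3 nodes
pveceph createosd /dev/sdb            # on each node, one per OSD disk
pveceph createosd /dev/sdc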
 
How much RAM should be assigned for CEPH?

Let's say we have a node configured with 16GB RAM for the VMs - would adding 4GB more RAM be sufficient for supporting a CEPH node as well?
 
On a day to day basis CEPH does not use a large amount of memory. On a node with 10 OSDs I have not seen it go beyond 1.5GB of RAM. But when CEPH goes into rebalancing mode due to an OSD failure, pg change etc., that's when it uses a larger amount of RAM. Still, I did not see it go above 6GB. For a small CEPH node I believe 4GB would be sufficient.
 
Hi,
4GB looks too low to me. My OSD hosts use approx. 7GB, plus 24GB cached (13 OSDs).
You should have approx. 1GB of RAM for each OSD. Normally only 512MB is used, but for recovery you need 1GB for each TB!

See here: http://ceph.com/docs/master/start/hardware-recommendations

Udo
 
"A general rule of thumb is ~1GB of RAM for 1TB of storage space." So assigning 4 GB RAM means 4 TB of data. Given the fact that more disks is better than few disks this would mean the optimal for a PVE+CEPH node given 4 GB RAM for CEPH:2 OS disk raid 1 and 4 * 1 TB disks for CEPH.A 6 bay back plane in each server.
 
Here is a screenshot of one of the CEPH nodes with 10 OSDs on a day to day basis. No VMs are running on this node.
[Attachment: ceph-ram.png]
The node is in a production environment with a 5 days a week workload, used by about 40 users, along with several email servers and database servers running 24x7.

The 1GB RAM per TB rule is very true, especially in a cluster where both RBD and CephFS (MDS) are present. More RAM is always better, no question about that. But 4GB could handle a small CEPH setup, say 5 OSDs per node, if somebody must do it with 4GB of RAM.

The node in the image came from a separate CEPH cluster from before CEPH was included in Proxmox. It had 4GB and 6 OSDs for months. It's only after I converted it to Proxmox+CEPH that I added the extra 4GB, since now it also has to do Proxmox duties.
 
