Wrapping my head around Ceph - a couple of questions

nethfel

Ok, I'm going to be setting up an experimental group at my work for learning/using Ceph within Proxmox. I'm planning on having a dedicated Proxmox cluster to act as the Ceph storage system (and a secondary cluster to act as the VM hosts). Most of the tutorials cover the basic setup of a Proxmox Ceph cluster but not much else.

My questions so I can understand how to test this are:

1) I understand that when I set up the Ceph cluster, I'll have a certain number of OSDs, PGs and replicas (not counting the CRUSH map - I'll only have one rack for storage, so I won't be using multiple levels in the CRUSH map - or the management nodes, which all three nodes will be, etc.). For this test network there will be 3 nodes, each with 2 HDDs, so 6 HDDs in total - 6 OSDs with a replication of 3 - so by the formula (6*100)/3 = 200 PGs - ok, no problem. My question now is: if I need to add OSDs to increase my total capacity (say I add 2 more nodes, each containing 2 more OSDs), then once I create the OSDs and they show up in my manager, do I need to change the pool settings? Can I even change the pool settings? I mean, it should now be (10*100)/3 = 334 PGs (333.33, but I remember reading somewhere to round up if you don't end up with a whole number)...? Basically, I'm confused about the proper process for adding storage to an existing Proxmox VE managed Ceph cluster.

2) The reverse of number 1 - let's say an HDD fails completely and I need to remove it from the equation. I'm sure there is a proper procedure for this through Proxmox, or it may instead need to be handled on the command line (which is fine), but I'm not sure what procedure I need to follow.

3) And finally (I think ;) ), let's say the Proxmox Ceph cluster nodes need to be updated. I know that to maintain quorum I should only update and restart one node at a time. I'm assuming that after a restart of a single node the cluster will be in an unhealthy state (since VMs will have kept running and data will still have been written to the other two nodes / the rest of the OSDs). When I bring the first updated system back online, will the cluster heal itself, or will I need to force it to repair manually? What kind of time are we talking about before the restarted node is fully functional? (Does it just sync the changed data, or will it do a complete rebuild, where I'd need to estimate the time from the amount of data and the network speed? I found the formula once, but can't find the link all of a sudden.)

I'm sorry for what must seem like fairly basic questions, but most tutorials I've come across on Proxmox VE management of a Ceph cluster only really describe the initial creation of the cluster. I'm still working my way through the actual ceph.com documentation (i.e. I'm very new to Ceph), but I'd like to begin some initial testing on real hardware to get a better feel for it.
 
A total of 6 OSDs is not enough - this will make you unhappy. I suggest a minimum of 12 OSDs, and it is better to use only SSDs for such small setups.
 
A total of 6 OSDs is not enough - this will make you unhappy. I suggest a minimum of 12 OSDs, and it is better to use only SSDs for such small setups.

This is not a production setup - only a testing setup, so lower performance isn't going to break my heart right now. Once I have the testing environment going and understand it better, I will go beg for a budget to get a bigger configuration for production, but I don't want to start down that path until I understand my base questions above.
 
Ok, I'm going to be setting up an experimental group at my work for learning/using Ceph within Proxmox. I'm planning on having a dedicated Proxmox cluster to act as the Ceph storage system (and a secondary cluster to act as the VM hosts). Most of the tutorials cover the basic setup of a Proxmox Ceph cluster but not much else.

My questions so I can understand how to test this are:

1) I understand that when I set up the Ceph cluster, I'll have a certain number of OSDs, PGs and replicas (not counting the CRUSH map - I'll only have one rack for storage, so I won't be using multiple levels in the CRUSH map - or the management nodes, which all three nodes will be, etc.). For this test network there will be 3 nodes, each with 2 HDDs, so 6 HDDs in total - 6 OSDs with a replication of 3 - so by the formula (6*100)/3 = 200 PGs - ok, no problem. My question now is: if I need to add OSDs to increase my total capacity (say I add 2 more nodes, each containing 2 more OSDs), then once I create the OSDs and they show up in my manager, do I need to change the pool settings? Can I even change the pool settings? I mean, it should now be (10*100)/3 = 334 PGs (333.33, but I remember reading somewhere to round up if you don't end up with a whole number)...?
Hi,
the number of PGs in a pool, and the total number of PGs per OSD, is a little bit flexible.
First, the PG count of a pool should be a power of 2 (64, 128, 256, ...), so with 6 OSDs and replica 3 you should start with 128 PGs. If you have multiple pools, you should never go above 300 PGs per OSD in total (the new warning level in Ceph 0.90) - much less is better.
You can change the pool settings, but only to increase the PG count! If you need a smaller pool, you must create a new one and migrate the data.
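
If it helps, here is a small back-of-envelope sketch of that sizing in Python - the ~100 PGs-per-OSD target is the rule of thumb from your question, and the helper names are only illustrative, nothing from Ceph or Proxmox:

Code:
import math

def raw_pg_target(num_osds, replicas, per_osd=100):
    """(OSDs * 100) / replicas, as in the question."""
    return num_osds * per_osd / replicas

def nearest_powers_of_two(value):
    """The power-of-two choices just below and above the raw target."""
    lower = 2 ** math.floor(math.log2(value))
    upper = 2 ** math.ceil(math.log2(value))
    return lower, upper

def pg_copies_per_osd(pool_pg_counts, replicas, num_osds):
    """Approximate PG copies landing on each OSD across all pools."""
    return sum(pool_pg_counts) * replicas / num_osds

print(raw_pg_target(6, 3))             # 200.0 -> pick 128 or 256
print(raw_pg_target(10, 3))            # ~333.3 -> pick 256 or 512
print(nearest_powers_of_two(200))      # (128, 256)
print(pg_copies_per_osd([128], 3, 6))  # 64.0 PG copies per OSD

With a single pool of 128 PGs on 6 OSDs you end up around 64 PG copies per OSD, comfortably below the warning level.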
2) The reverse of number 1 - let's say an HDD fails completely and I need to remove it from the equation. I'm sure there is a proper procedure for this through Proxmox, or it may instead need to be handled on the command line (which is fine), but I'm not sure what procedure I need to follow.
First, if an OSD fails, Ceph automatically moves all its data to other OSDs (so you need enough free space).
After that, you can remove the failed OSD and create a new one (it gets the same number) - sorry, I'm not using pve-ceph, so I only know the way with "pure" Ceph. But I assume you can simply remove the down-and-out OSD and create a new one in the GUI?!
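
For reference, the "pure Ceph" sequence for a dead OSD looks roughly like this (the standard manual removal steps run from an admin node, not the Proxmox GUI way; the ceph-osd@N service name is an assumption about how the daemon is managed on your host):

Code:
import subprocess

def remove_dead_osd(osd_id):
    # Standard upstream removal steps for an OSD that has failed.
    cmds = [
        ["ceph", "osd", "out", str(osd_id)],                  # stop mapping data to it
        # ... wait here until "ceph health" is OK again (backfill finished) ...
        ["systemctl", "stop", "ceph-osd@{}".format(osd_id)],  # stop the daemon if it is still running
        ["ceph", "osd", "crush", "remove", "osd.{}".format(osd_id)],  # drop it from the CRUSH map
        ["ceph", "auth", "del", "osd.{}".format(osd_id)],     # remove its auth key
        ["ceph", "osd", "rm", str(osd_id)],                   # remove the OSD entry itself
    ]
    for cmd in cmds:
        subprocess.run(cmd, check=True)

# remove_dead_osd(4)   # e.g. osd.4 has failed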
3) And finally (I think ;) ), let's say the Proxmox Ceph cluster nodes need to be updated. I know that to maintain quorum I should only update and restart one node at a time. I'm assuming that after a restart of a single node the cluster will be in an unhealthy state (since VMs will have kept running and data will still have been written to the other two nodes / the rest of the OSDs). When I bring the first updated system back online, will the cluster heal itself, or will I need to force it to repair manually? What kind of time are we talking about before the restarted node is fully functional? (Does it just sync the changed data, or will it do a complete rebuild, where I'd need to estimate the time from the amount of data and the network speed? I found the formula once, but can't find the link all of a sudden.)
The update process is a little different depending on the version (whether the OSDs or the MONs are updated first). Normally you update one node while everything keeps running. If the OSDs should be updated first: simply restart the OSDs on that node. Once everything is healthy again (this happens automatically), do the same on the next node, and so on. After that, restart the first MON, check that everything is healthy again, and do the same on the other nodes.
The cluster keeps working during the whole process.
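
If you want to script the "wait until healthy before touching the next node" part, a minimal sketch could just poll ceph health (an illustrative helper, not a Proxmox tool):

Code:
import subprocess
import time

def wait_for_health_ok(poll_seconds=30):
    # Block until "ceph health" reports HEALTH_OK again.
    while True:
        out = subprocess.run(["ceph", "health"],
                             capture_output=True, text=True).stdout.strip()
        if out.startswith("HEALTH_OK"):
            return
        print("still recovering:", out)
        time.sleep(poll_seconds)

# After restarting the OSDs (or a MON) on one node:
# wait_for_health_ok()
# ...then move on to the next node.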
My Ceph cluster has been running for over a year now without any real interruption (and with several updates, MON changes, OSD reformats and node extensions).
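
As for the time question: a very rough estimate (my own back-of-envelope assumption, not an official Ceph formula) is however much data actually has to be re-synced divided by the slower of network and disk throughput, for example:

Code:
def recovery_hours(data_to_move_tb, net_gbit_s, disk_mb_s):
    # Convert network speed to MB/s and take the slower of network and disk.
    net_mb_s = net_gbit_s * 1000 / 8
    bottleneck = min(net_mb_s, disk_mb_s)
    seconds = data_to_move_tb * 1000000 / bottleneck   # TB -> MB
    return seconds / 3600

# Made-up example: 1 TB to re-sync, 10 Gbit network, ~100 MB/s per-disk throughput
print(round(recovery_hours(1.0, 10, 100), 1))   # ~2.8 hours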

Udo