Ok, I'm going to be setting up an experimental group at work for learning/using Ceph within Proxmox. I'm planning on having a dedicated Proxmox cluster act as the Ceph storage system (and a secondary cluster to act as the VM hosts). Most of the tutorials cover the basic setup of a Proxmox Ceph cluster, but not much else.
My questions so I can understand how to test this are:
1) I understand that when I set up the Ceph cluster, I'll have a certain number of OSDs, PGs, and replicas (not counting the CRUSH map (I'll only have one rack for storage, so I won't be using multiple levels in the CRUSH map), the management nodes, which all three nodes will be, etc.). For this test network there will be 3 nodes, each with 2 HDDs, for a total of 6 HDDs = 6 OSDs with a replication of 3, so by the formula (6*100)/3 = 200 PGs - ok, no problem. My question is: if I need to add OSDs to increase my total capacity (say I add 2 more nodes, each containing 2 more OSDs), once I create the OSDs and they show up in my manager, do I need to change the pool settings? Can I even change the pool settings? I mean, it should now be (10*100)/3 = 334 PGs (333.33, but I remember reading somewhere to round up if you don't end up with a whole number)...? Basically, I'm confused about the proper process for adding storage to an existing Proxmox VE managed Ceph cluster (rough math below).
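For reference, here's the back-of-envelope math I'm working from - just the (OSDs * 100) / replicas rule of thumb, rounded up to a whole number (I think I also read that some guides round up to the next power of two, but I may be misremembering, so don't take that part as gospel):

```python
import math

def pg_count(osds, replicas=3, per_osd_target=100):
    """Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replica count, rounded up."""
    return math.ceil(osds * per_osd_target / replicas)

print(pg_count(6))   # 3 nodes x 2 HDDs            -> 200
print(pg_count(10))  # after adding 2 nodes x 2 HDDs -> 334 (rounded up from 333.33)
```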
2) The reverse of number 1 - let's say an HDD fails completely and I need to remove it from the equation. I'm sure there is a proper procedure for this through Proxmox, though it may instead need to be handled on the command line (which is fine), but I'm not sure what procedure I need to follow.
3) And finally (I think), let's say the Proxmox Ceph cluster nodes need to be updated. I know that to maintain quorum I should only allow one node to update and restart at a time. I'm assuming that after a single node restarts, the cluster will be in an unhealthy state (since VMs will have been running and data will still have been written to the other two nodes / the rest of the OSDs). When I bring the first updated system back online, will the cluster self-heal, or will I need to force a repair manually? What kind of time are we talking about before the restarted node is fully functional? Does it just sync the changed data, or will it do a complete rebuild, where I'd need to estimate the time based on the amount of data and the network speed? (I found a formula for this once, but can't find the link all of a sudden - rough sketch of what I mean below.)
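In case it helps frame question 3, this is the kind of back-of-envelope estimate I had in mind (my own rough assumption, not the formula from the link I lost): recovery time roughly equals the amount of data that has to be re-replicated divided by the effective network/disk throughput.

```python
def recovery_time_hours(data_to_move_gib, throughput_mib_s):
    """Very rough estimate: time = data to re-replicate / effective throughput.
    Ignores Ceph recovery throttling, disk contention, concurrent client I/O, etc."""
    seconds = (data_to_move_gib * 1024) / throughput_mib_s
    return seconds / 3600

# e.g. 500 GiB of changed data over a link sustaining ~100 MiB/s
print(f"{recovery_time_hours(500, 100):.1f} h")  # ~1.4 h
```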
I'm sorry for what must seem like somewhat basic questions, but most tutorials I've come across on Proxmox VE management of a Ceph cluster only really describe the initial creation of the cluster. I'm still working my way through the actual ceph.com documentation (i.e., I'm very new to Ceph), but I'd like to begin some initial testing on real hardware to get a better feel for it.