Recently I combined two separate Proxmox clusters into one. Before the merge, each cluster had its own three-node Ceph cluster with 10 OSDs per node. Earlier this week I finally added the second cluster's three nodes and their OSDs to the converged cluster. All nodes are running Proxmox 8.1.11 (I see 8.2 is now available) with Ceph Reef.
For Ceph, that means the cluster has doubled in capacity, from 3 nodes/30 OSDs to 6 nodes/60 OSDs. The rebalancing triggered by that operation is still running, but the percentage of misplaced objects is now below 10%.
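For completeness, this is roughly how I've been keeping an eye on the rebalance; nothing here is specific to my setup, just the standard status commands:
Code:
# overall health plus the "objects misplaced (x.xxx%)" line in the data section
root@amazon:~# ceph status

# per-pool recovery/backfill activity
root@amazon:~# ceph osd pool stats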
In the meantime, I'm working on better understanding placement groups (PGs). After a lot of reading and recommendations I have a slightly better grasp of the basics, but I still have some questions.
After the initial merge, this is what I had from autoscale-status:
Code:
root@amazon:~# ceph osd pool autoscale-status
POOL                 SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
lab-vm-pool          14442G               3.0   235.5T        0.1797                                 1.0   1024                warn       False
lab-cephfs_data      80902M               3.0   235.5T        0.0010                                 1.0   128                 off        False
lab-cephfs_metadata  209.3M               3.0   235.5T        0.0000                                 1.0   32                  warn       False
.mgr                 217.3M               3.0   235.5T        0.0000                                 1.0   1                   on         False
The warning I got in this state was that my PG count of 1024 was too high and should be 256. However, I had read enough to know that the autoscaler was basing that recommendation on current usage and on the lack of "guidance" about how big the pools are expected to get (by the way, all pools are 3/2).
The two main pools are lab-vm-pool, which holds the VM images and should take the majority of the space, and lab-cephfs_data, which mainly holds ISOs and other ancillary data. As far as I can tell, the other two pools are auto-generated and amount to "storage noise", so to speak.
I then decided to set target sizes/ratios for the two pools. Ideally I want lab-vm-pool to get 90% of the available space, with everything else in the remaining 10%. To start, though, I set the lab-cephfs_data pool's target size to 1000G and tried to set the lab-vm-pool target ratio to 90%. I couldn't figure out whether I was supposed to enter 90.0 or 0.9 in that field; I never found any concrete guidance, and the few sources I did find used both scales. In the end, I edited the lab-vm-pool pool from the GUI and entered 0.9. I then got this:
Code:
root@amazon:~# ceph osd pool autoscale-status
POOL                 SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
lab-vm-pool          14467G               3.0   235.5T        0.9876  0.9000        0.9876           1.0   1024                warn       False
lab-cephfs_data      80902M  1000G        3.0   235.5T        0.0124                                 1.0   128                 off        False
lab-cephfs_metadata  209.3M               3.0   235.5T        0.0000                                 1.0   32                  warn       False
.mgr                 217.3M               3.0   235.5T        0.0000                                 1.0   1                   on         False
Instantly, the autoscaler flipped from warning about too many PGs (1024 vs. 256) to too few (1024 vs. 2048). I know that's based on projected future capacity, and I don't really feel the need to adjust it right now.
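For reference, I believe these are the CLI equivalents of what I set through the GUI (pool names are from my cluster; the 0.9 is exactly the value I'm unsure about):
Code:
# give the CephFS data pool a fixed expected size
root@amazon:~# ceph osd pool set lab-cephfs_data target_size_bytes 1000G

# give the VM pool the target ratio I entered in the GUI
root@amazon:~# ceph osd pool set lab-vm-pool target_size_ratio 0.9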
In the end, should I just remove all the ratio guidance and go back to leaving it blank? This storage cluster has reached a fairly static state in that disk usage is not really growing (when it does, it's usually because a VM needs a larger disk image). Our problem in the past has been performance, not capacity (spreading operations across more devices was part of the reason for adding more OSDs). Since the expansion, the average has dropped to around 20 PGs per OSD (for completeness, every storage node has two 100Gb links bonded together in a LAG).
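If the answer is to drop the guidance, my assumption (untested on this cluster) is that setting both values back to 0 clears them:
Code:
# assumption on my part: 0 should clear the target guidance again
root@amazon:~# ceph osd pool set lab-vm-pool target_size_ratio 0
root@amazon:~# ceph osd pool set lab-cephfs_data target_size_bytes 0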
So, to recap, here are the two questions in case you got lost along the way:
- For the target ratio, should it be expressed as a fraction (like 0.9) or as a percentage value (like 90)?
- Should I just not worry about any of this ratio mess, turn off the autoscaler warnings (see the sketch below), and go about my business?
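For the second question, this is what I understand per-pool "turning off" would look like, assuming I've read the docs correctly ("warn" only raises health warnings, "off" disables the autoscaler for that pool entirely):
Code:
# silence the autoscaler for a single pool (modes: on | warn | off)
root@amazon:~# ceph osd pool set lab-vm-pool pg_autoscale_mode off
root@amazon:~# ceph osd pool get lab-vm-pool pg_autoscale_mode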