Ceph and mismatched disks

arubenstein

New Member
Jul 17, 2023
I've been doing some research on Ceph with mismatched underlying hard drives. I've read (and I think it is fairly obvious) that you need to have symmetry at least node to node. So...

Let's assume that I have a cluster of three nodes, and each node today has three 15 TB SSDs. Time goes on, we need some more space/performance, and I happen to have a bunch of 2 TB SSDs lying around. So I stick three of those 2 TB drives into each node; now each node has (3) 15 TB and (3) 2 TB drives.

Is this just really a bad idea, or just keep on trucking?
 
Hi @arubenstein, in that case I would create a second device class and another CRUSH rule that uses this class, for example ssd2 (the existing 15 TB OSDs stay in the default ssd class). You would then have two pools, if that's OK for you.

  • ceph osd crush rule create-replicated replicated-ssd1 default host ssd
  • ceph osd crush rule create-replicated replicated-ssd2 default host ssd2

    Then you need to assign the specific rule to each pool by editing the pool via the UI.
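For reference, the same thing can be done from the shell. This is only a sketch: the OSD ID osd.9 and the pool names pool-ssd / pool-ssd2 are placeholders, and the class name ssd2 is the example from above.

Code:
# Tag a new 2 TB OSD with the custom device class.
# The auto-assigned class has to be cleared before a new one can be set.
ceph osd crush rm-device-class osd.9
ceph osd crush set-device-class ssd2 osd.9

# List the classes and check which OSDs carry the new one
ceph osd crush class ls
ceph osd crush class ls-osd ssd2

# Assign the CRUSH rules to the two pools
ceph osd pool set pool-ssd crush_rule replicated-ssd1
ceph osd pool set pool-ssd2 crush_rule replicated-ssd2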
 
Actually, my intent (and need) is to expand the existing pool.
Ceph would accept those 2 TB disks and assign each OSD a CRUSH weight according to its size. But you need to be careful: always make sure that if one device on a host fails (for example a 15 TB SSD), the remaining disks on that host have enough free capacity to recover the data that was on the failed disk (for example 75% of 15 TB). Why? Because once a single OSD reaches 95% usage, Ceph blocks write I/O for the whole cluster, due to the OSD full ratio (mon_osd_full_ratio, default 0.95).
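If it helps, these are the standard commands to see where you stand; nothing custom is assumed here:

Code:
# Show the configured ratios (defaults: nearfull 0.85, backfillfull 0.90, full 0.95)
ceph osd dump | grep ratio

# Per-OSD fill level (%USE column), to see how close each OSD is to those ratios
ceph osd df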

After creating the OSDs, Ceph will automatically use them. Make sure that they have similar performance, because the slowest disk in a pool will determine the Ceph write performance. The general recommendation is to not mix sizes, because it increases complexity.
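On Proxmox the new disks can be added per node roughly like this (the device path /dev/sdd is just a placeholder), and afterwards you can verify that the CRUSH weights follow the disk sizes:

Code:
# Create an OSD on each new 2 TB SSD, on every node
pveceph osd create /dev/sdd

# Verify the new OSDs and their CRUSH weights; since weights are expressed in TiB,
# a 15 TB SSD should show up around 13.6 and a 2 TB SSD around 1.8
ceph osd df tree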
 
In principle this is not a problem, but it is still not optimal.

But you definitely have to keep an eye on the fill levels of the individual OSDs. You may have to reweight individual OSDs so that there are no problems. When an OSD reaches the full ratio, your Ceph cluster effectively becomes read-only; keep that in mind. Also note that if your Ceph cluster is already well filled, the OSDs that are already quite full may temporarily receive even more data while the additional OSDs are being backfilled.

You should therefore always plan ahead and add capacity early.
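A sketch of the two usual ways to keep the OSD fill levels even, assuming a reasonably recent Ceph with the balancer module available (osd.7 is a placeholder ID):

Code:
# Let the upmap balancer even out per-OSD utilization automatically
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# Or temporarily nudge a single over-full OSD by hand
ceph osd reweight osd.7 0.9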
 
I concur with all the points above. I'd always make sure there is enough spare capacity per node for the largest disk to fail. I think that is a matter of normal cluster health/maintenance.
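As a rough back-of-the-envelope check for the layout in this thread (the numbers are only the example from above; adjust them to your own cluster and fill level):

Code:
# per-node raw            = 3*15 TB + 3*2 TB = 51 TB
# after losing one 15 TB  = 36 TB remaining on that node
# usable below nearfull   = 36 TB * 0.85 = ~30.6 TB, which has to hold the node's
#                           existing data plus the failed OSD's share
awk 'BEGIN { raw = 3*15 + 3*2; left = raw - 15; printf "%.1f TB usable below nearfull\n", left * 0.85 }'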
 
